API Usage Guide
The ManyLatents programmatic API enables workflow integration and in-memory data chaining.
Quick Start
from manylatents.api import run
# Single algorithm run
result = run(
data='swissroll',
algorithms={'latent': {'_target_': 'manylatents.algorithms.latent.pca.PCAModule', 'n_components': 10}}
)
embeddings = result['embeddings'] # numpy array
scores = result['scores'] # dict of metrics
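Continuing the quick-start example, you can sanity-check the result directly; the expected embedding width (10) follows from the n_components setting above:
print(embeddings.shape)          # expected: (n_samples, 10) for the PCA config above
for name, value in scores.items():
    print(f"{name}: {value}")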
Chained Workflows
Chain multiple algorithms by passing the output of one as input to another:
from manylatents.api import run
# Step 1: Initial dimensionality reduction
result1 = run(
data='swissroll',
algorithms={'latent': {'_target_': 'manylatents.algorithms.latent.pca.PCAModule', 'n_components': 50}}
)
# Step 2: Chain to another algorithm
result2 = run(
input_data=result1['embeddings'],
algorithms={'latent': {'_target_': 'manylatents.algorithms.latent.umap.UMAPModule', 'n_components': 2}}
)
final_embeddings = result2['embeddings']
Note: Embeddings are automatically converted to numpy arrays by the evaluation system.
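Continuing the chained example, you can verify that both stages returned numpy arrays and that the dimensionality narrows as configured (50 components, then 2):
import numpy as np
assert isinstance(result1['embeddings'], np.ndarray)
assert isinstance(result2['embeddings'], np.ndarray)
print(result1['embeddings'].shape)  # (n_samples, 50)
print(result2['embeddings'].shape)  # (n_samples, 2)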
Available Algorithms
Dimensionality Reduction
- PCA: manylatents.algorithms.latent.pca.PCAModule
- t-SNE: manylatents.algorithms.latent.tsne.TSNEModule
- UMAP: manylatents.algorithms.latent.umap.UMAPModule
- PHATE: manylatents.algorithms.latent.phate.PHATEModule
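Any of these modules can be selected by passing its import path as _target_. A minimal sketch using the t-SNE module (the n_components argument is assumed to follow the same convention as the PCA and UMAP examples in this guide):
from manylatents.api import run

result = run(
    data='swissroll',
    algorithms={'latent': {'_target_': 'manylatents.algorithms.latent.tsne.TSNEModule', 'n_components': 2}}
)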
API Reference
run(input_data=None, **overrides)
Execute a dimensionality reduction algorithm.
Parameters:
- input_data (np.ndarray, optional): In-memory data array of shape (n_samples, n_features). If provided, this data is used instead of loading from a dataset.
- **overrides: Configuration overrides (e.g., data='swissroll', algorithms={...})
Returns:
Dictionary with keys:
- embeddings: Computed embeddings (numpy array)
- label: Labels from dataset (if available)
- metadata: Run metadata dictionary
- scores: Evaluation metrics (if enabled)
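Because label and scores only appear when the dataset has labels and evaluation is enabled, reading them with dict.get is a safe pattern:
from manylatents.api import run

result = run(data='swissroll', algorithms={'latent': 'pca'})
embeddings = result['embeddings']
labels = result.get('label')        # None if the dataset provides no labels
metadata = result['metadata']
scores = result.get('scores', {})   # empty if evaluation is disabled
print(metadata)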
Examples:
# Using a built-in dataset
result = run(data='swissroll', algorithms={'latent': 'pca'})
# Using in-memory data
import numpy as np
my_data = np.random.randn(1000, 100).astype(np.float32)
result = run(input_data=my_data, algorithms={'latent': 'pca'})
# Chaining algorithms
result2 = run(input_data=result['embeddings'], algorithms={'latent': 'umap'})
Advanced Usage
Custom Configuration
result = run(
data='swissroll',
algorithms={
'latent': {
'_target_': 'manylatents.algorithms.latent.umap.UMAPModule',
'n_components': 2,
'n_neighbors': 15,
'min_dist': 0.1
}
}
)
Disabling Features for Speed
# Disable W&B logging
result = run(data='swissroll', algorithms={'latent': 'pca'}, debug=True)
# Skip evaluation metrics
result = run(data='swissroll', algorithms={'latent': 'pca'}, metrics=None)
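To confirm the speedup on your setup, a simple timing comparison with the standard library works; the exact effect of debug=True and metrics=None depends on your configuration:
import time
from manylatents.api import run

start = time.perf_counter()
run(data='swissroll', algorithms={'latent': 'pca'})                              # full run
full = time.perf_counter() - start

start = time.perf_counter()
run(data='swissroll', algorithms={'latent': 'pca'}, debug=True, metrics=None)    # stripped-down run
fast = time.perf_counter() - start

print(f"full: {full:.2f}s, stripped down: {fast:.2f}s")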
Data Format Requirements
- Input: numpy.ndarray with dtype float32 or float64
- Shape: (n_samples, n_features)
- Output: numpy array (tensor conversion is handled automatically)
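For example, to bring an arbitrary array into the expected format before passing it as input_data:
import numpy as np
from manylatents.api import run

raw = np.random.rand(500, 8, 8)                                       # e.g. 500 samples of 8x8 features
prepared = np.asarray(raw, dtype=np.float32).reshape(len(raw), -1)    # (500, 64), float32
result = run(input_data=prepared, algorithms={'latent': 'pca'})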
Multi-Step Example
from manylatents.api import run
# Progressive dimensionality reduction
steps = [
('PCA 100D', 'manylatents.algorithms.latent.pca.PCAModule', 100),
('PCA 50D', 'manylatents.algorithms.latent.pca.PCAModule', 50),
('UMAP 2D', 'manylatents.algorithms.latent.umap.UMAPModule', 2),
]
# Initial data
current_data = run(
data='swissroll',
algorithms={'latent': {'_target_': steps[0][1], 'n_components': steps[0][2]}}
)
# Chain subsequent steps
for name, target, n_comp in steps[1:]:
    print(f"Running {name}...")
    current_data = run(
        input_data=current_data['embeddings'],
        algorithms={'latent': {'_target_': target, 'n_components': n_comp}}
    )
print(f"Final shape: {current_data['embeddings'].shape}")
Implementation Details
The in-memory data pipeline uses:
- PrecomputedDataModule: Accepts data parameter for numpy arrays
- InMemoryDataset: Wraps arrays in LatentOutputs format
- Compatible with all ManyLatents metrics, callbacks, and visualizations
Troubleshooting
Common Issues
"PrecomputedDataModule requires either a 'path' or 'data' argument"
This error means no data source reached the pipeline. Pass exactly one of data='dataset_name' or input_data=array; supplying neither (or both) is not supported.
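For instance, the commented-out call below would trigger the error because no data source is given (assuming no default dataset is configured); either fix supplies one:
import numpy as np
from manylatents.api import run

my_data = np.random.randn(1000, 100).astype(np.float32)

# Raises the error: neither a dataset name nor in-memory data was provided
# result = run(algorithms={'latent': 'pca'})

# Fix 1: name a built-in dataset
result = run(data='swissroll', algorithms={'latent': 'pca'})

# Fix 2: pass an in-memory array
result = run(input_data=my_data, algorithms={'latent': 'pca'})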