# Algorithms
manyLatents provides two algorithm base classes, and the decision rule is binary: if the algorithm trains with backpropagation, use `LightningModule`; otherwise, use `LatentModule`.
## fit/transform Algorithms
`LatentModule` is the base class for non-neural algorithms. Subclass it, implement `fit()` and `transform()`, and you're done.
```python
from manylatents.algorithms.latent.latent_module_base import LatentModule

class MyAlgorithm(LatentModule):
    def __init__(self, n_components=2, my_param=1.0, **kwargs):
        super().__init__(n_components=n_components, **kwargs)
        self.my_param = my_param

    def fit(self, x, y=None):
        self._is_fitted = True

    def transform(self, x):
        return x[:, :self.n_components]
```
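Once implemented, the module follows the familiar fit/transform flow. A minimal runnable sketch (using a stand-in base class so the snippet works without manylatents installed; the real `LatentModule` provides more machinery):

```python
import numpy as np

# Stand-in for LatentModule so this sketch is self-contained.
class LatentModuleStub:
    def __init__(self, n_components=2, **kwargs):
        self.n_components = n_components
        self._is_fitted = False

class MyAlgorithm(LatentModuleStub):
    def fit(self, x, y=None):
        self._is_fitted = True

    def transform(self, x):
        # Keep the first n_components columns as the "embedding".
        return x[:, :self.n_components]

x = np.random.rand(100, 10)
algo = MyAlgorithm(n_components=2)
algo.fit(x)
emb = algo.transform(x)
print(emb.shape)  # (100, 2)
```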
### Available Algorithms

| Algorithm | Type | Config | Key Params |
|---|---|---|---|
| ArchetypalAnalysisModule | latent | algorithms/latent=aa | method, max_iter |
| ClassifierModule | latent | algorithms/latent=classifier | model, max_iter, init_seed |
| DiffusionMapModule | latent | algorithms/latent=diffusionmap | t, knn, decay, n_pca |
| LeidenModule | latent | algorithms/latent=leiden | resolution, n_neighbors, backend, device |
| MDSModule | latent | algorithms/latent=mds | ndim, seed, how, solver |
| MultiscalePHATEModule | latent | algorithms/latent=multiscale_phate | scale, granularity, landmarks, knn |
| NoOpModule | latent | algorithms/latent=noop | -- |
| PCAModule | latent | algorithms/latent=pca | -- |
| PHATEModule | latent | algorithms/latent=phate | knn, t, gamma, decay |
| PHATEModule | latent | algorithms/latent=phate_torchdr | knn, t, gamma, decay |
| ReebGraphModule | latent | algorithms/latent=reeb_graph | n_bins, overlap, lens, backend |
| ReebGraphModule | latent | algorithms/latent=reeb_graph_density | n_bins, overlap, lens, lens_k |
| ReebGraphModule | latent | algorithms/latent=reeb_graph_diffusion1 | n_bins, overlap, lens, lens_k |
| ReebGraphModule | latent | algorithms/latent=reeb_graph_pca1 | n_bins, overlap, lens, backend |
| DiffusionMapModule | latent | algorithms/latent=spectral_clustering | t, knn, decay, n_pca |
| TSNEModule | latent | algorithms/latent=tsne | perplexity, n_iter_early, n_iter_late, learning_rate |
| TSNEModule | latent | algorithms/latent=tsne_torchdr | perplexity, fit_fraction, backend, device |
| UMAPModule | latent | algorithms/latent=umap | n_neighbors, min_dist, n_epochs, metric |
| UMAPModule | latent | algorithms/latent=umap_torchdr | n_neighbors, min_dist, fit_fraction, backend |
### FoundationEncoder Pattern
Frozen pretrained models also use LatentModule — `fit()` is a no-op and `transform()` wraps the model's forward pass. This is a usage convention, not a separate class. Implementations live in `manylatents-omics/manylatents/dogma/encoders/` (Evo2, ESM3, Orthrus, AlphaGenome).
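The convention can be sketched as follows. This is an illustrative stand-in, not one of the actual encoder implementations: a plain function substitutes for the frozen model's forward pass, and a real implementation would hold a torch model in eval mode with gradients disabled.

```python
import numpy as np

# Dummy "pretrained model" forward pass producing a 4-d embedding.
def pretrained_forward(x):
    return x @ np.ones((x.shape[1], 4)) / x.shape[1]

class FrozenEncoderModule:
    """Sketch of the FoundationEncoder usage convention."""

    def __init__(self, forward_fn):
        self.forward_fn = forward_fn

    def fit(self, x, y=None):
        # No-op: the model is already trained.
        self._is_fitted = True

    def transform(self, x):
        # Wrap the pretrained model's forward pass.
        return self.forward_fn(x)

enc = FrozenEncoderModule(pretrained_forward)
enc.fit(None)
print(enc.transform(np.random.rand(8, 16)).shape)  # (8, 4)
```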
### Optional Methods
If your algorithm uses a kernel-based approach, implement these to enable module-level metrics:
```python
def kernel_matrix(self, ignore_diagonal=False) -> np.ndarray:
    """Raw similarity matrix (N x N)."""
    ...

def affinity_matrix(self, ignore_diagonal=False, use_symmetric=False) -> np.ndarray:
    """Normalized transition matrix."""
    ...
```
This enables metrics like `KernelMatrixSparsity`, `AffinitySpectrum`, and `ConnectedComponents`.
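As an illustration of the interface (not the library's actual implementation), a Gaussian-kernel module might fill in the two methods like this, with `affinity_matrix()` returning a row-stochastic transition matrix:

```python
import numpy as np

class GaussianKernelModule:
    """Sketch of the optional kernel interface using a Gaussian kernel."""

    def __init__(self, bandwidth=1.0):
        self.bandwidth = bandwidth
        self._x = None

    def fit(self, x, y=None):
        self._x = x

    def kernel_matrix(self, ignore_diagonal=False):
        # Raw similarity matrix (N x N) from pairwise squared distances.
        d2 = ((self._x[:, None, :] - self._x[None, :, :]) ** 2).sum(-1)
        k = np.exp(-d2 / (2 * self.bandwidth ** 2))
        if ignore_diagonal:
            np.fill_diagonal(k, 0.0)
        return k

    def affinity_matrix(self, ignore_diagonal=False, use_symmetric=False):
        # Row-normalize the kernel into a transition matrix; the symmetric
        # variant uses D^{-1/2} K D^{-1/2} instead.
        k = self.kernel_matrix(ignore_diagonal=ignore_diagonal)
        d = k.sum(axis=1)
        if use_symmetric:
            inv_sqrt = 1.0 / np.sqrt(d)
            return inv_sqrt[:, None] * k * inv_sqrt[None, :]
        return k / d[:, None]

mod = GaussianKernelModule()
mod.fit(np.random.rand(20, 3))
p = mod.affinity_matrix()
print(np.allclose(p.sum(axis=1), 1.0))  # True
```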
### Running
```bash
uv run python -m manylatents.main algorithms/latent=pca data=swissroll
```
### Adding a New LatentModule
1. Create `manylatents/algorithms/latent/your_algo.py` inheriting from `LatentModule`
2. Implement `fit(x, y=None)` and `transform(x)`
3. Create `manylatents/configs/algorithms/latent/your_algo.yaml` with `_target_`
4. Import in `manylatents/algorithms/latent/__init__.py`
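The config from step 3 might look like the fragment below (illustrative module path and parameters; match them to your actual class and constructor):

```yaml
# manylatents/configs/algorithms/latent/your_algo.yaml
_target_: manylatents.algorithms.latent.your_algo.MyAlgorithm
n_components: 2
my_param: 1.0
```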
## Trainable Algorithms

Neural network algorithms use PyTorch Lightning's `LightningModule` with standard training loops. Implement `setup()`, `training_step()`, `encode()`, and `configure_optimizers()`.
### Available Algorithms

| Algorithm | Type | Config | Key Params |
|---|---|---|---|
| Reconstruction | lightning | algorithms/lightning=aanet_reconstruction | datamodule, network, optimizer, loss |
| Reconstruction | lightning | algorithms/lightning=ae_reconstruction | datamodule, network, optimizer, loss |
| HFTrainerModule | lightning | algorithms/lightning=hf_trainer | config |
| LatentODE | lightning | algorithms/lightning=latent_ode | datamodule, network, optimizer, loss |
### Pattern
All LightningModule algorithms follow the same pattern:
```python
import hydra
from lightning.pytorch import LightningModule  # or: from pytorch_lightning import LightningModule

class MyTrainableAlgorithm(LightningModule):
    def __init__(self, network, optimizer, loss, datamodule=None, init_seed=42):
        super().__init__()
        self.save_hyperparameters(ignore=["datamodule", "network", "loss"])
        self.network_config = network
        self.optimizer_config = optimizer
        self.loss_fn = loss

    def setup(self, stage=None):
        # Deferred network construction: input_dim comes from the datamodule
        input_dim = self.trainer.datamodule.data_dim
        self.network = hydra.utils.instantiate(self.network_config, input_dim=input_dim)

    def training_step(self, batch, batch_idx):
        outputs = self.network(batch)
        return self.loss_fn(outputs, batch)

    def encode(self, x):
        return self.network.encode(x)

    def configure_optimizers(self):
        return hydra.utils.instantiate(self.optimizer_config, params=self.parameters())
```
Key conventions:
- `save_hyperparameters(ignore=["datamodule", "network", "loss"])` — Lightning can't serialize nn.Modules
- `setup()` defers network construction until `input_dim` is known from the datamodule
- `encode()` extracts embeddings for evaluation after training
- Use the project's `MSELoss` from `manylatents.algorithms.lightning.losses.mse`, not `torch.nn.MSELoss`
### Latent ODE
The `LatentODE` algorithm integrates neural ODEs for learning continuous-time dynamics in latent space:
```bash
uv run python -m manylatents.main \
algorithms/lightning=latent_ode \
data=swissroll \
trainer.max_epochs=10
```
Configuration supports custom integration times and ODE solver options via `torchdiffeq`.
### Running
```bash
uv run python -m manylatents.main \
algorithms/lightning=ae_reconstruction \
data=swissroll \
trainer.max_epochs=10
# Fast dev run for testing
uv run python -m manylatents.main \
algorithms/lightning=ae_reconstruction \
data=swissroll \
trainer.fast_dev_run=true
```
### Adding a New LightningModule
1. Create `manylatents/algorithms/lightning/your_algo.py` inheriting from `LightningModule`
2. Implement `setup()`, `training_step()`, `encode()`, `configure_optimizers()`
3. Use `self.save_hyperparameters(ignore=["datamodule", "network", "loss"])`
4. Create config in `manylatents/configs/algorithms/lightning/your_algo.yaml`
5. Test with `trainer.fast_dev_run=true`
## Network Architectures

Networks are `nn.Module` classes used by `LightningModule` algorithms. They define the architecture; the `LightningModule` wraps the training logic.
### Available Networks

| Network | Class | Config | Description |
|---|---|---|---|
| Autoencoder | Autoencoder | algorithms/lightning/network=autoencoder | Symmetric encoder-decoder with configurable layers |
| AANet | AANet | algorithms/lightning/network=aanet | Archetypal analysis network |
| LatentODENetwork | LatentODENetwork | (configured via latent_ode.yaml) | ODE function for continuous dynamics |
### Autoencoder Config

```yaml
# configs/algorithms/lightning/network/autoencoder.yaml
_target_: manylatents.algorithms.lightning.networks.autoencoder.Autoencoder
input_dim: ???  # Set by setup() from datamodule
hidden_dims: [128, 64]
latent_dim: 2
activation: relu
```
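The symmetric layout implied by `hidden_dims` can be sketched as simple dimension bookkeeping (illustrative only; the real `Autoencoder` is an `nn.Module`, and `symmetric_dims` is a hypothetical helper, not part of the library):

```python
def symmetric_dims(input_dim, hidden_dims, latent_dim):
    """Layer widths for a symmetric encoder-decoder."""
    encoder = [input_dim, *hidden_dims, latent_dim]
    decoder = encoder[::-1]  # mirror the encoder
    return encoder, decoder

enc, dec = symmetric_dims(784, [128, 64], 2)
print(enc)  # [784, 128, 64, 2]
print(dec)  # [2, 64, 128, 784]
```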
## Loss Functions

Use the project's loss functions, not PyTorch's directly.

| Loss | Class | Config | Description |
|---|---|---|---|
| MSELoss | MSELoss | algorithms/lightning/loss=default | Reconstruction loss |
| GeometricLoss | GeometricLoss | algorithms/lightning/loss=ae_dim | Dimensionality-aware loss |
| GeometricLoss | GeometricLoss | algorithms/lightning/loss=ae_neighbors | Neighborhood-preserving loss |
| GeometricLoss | GeometricLoss | algorithms/lightning/loss=ae_shape | Shape-preserving loss |
The project's `MSELoss` (from `manylatents.algorithms.lightning.losses.mse`) accepts `(outputs, targets, **kwargs)`, unlike `torch.nn.MSELoss`. Always use the project's version.
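The signature difference can be sketched like this (a stand-in, not the actual implementation; the point is that extra keyword arguments from the training step are tolerated, where `torch.nn.MSELoss` would raise a `TypeError`):

```python
import numpy as np

class MSELossSketch:
    """Stand-in mirroring the project signature (outputs, targets, **kwargs)."""

    def __call__(self, outputs, targets, **kwargs):
        # Extra kwargs (e.g. batch metadata) are accepted and ignored here.
        return float(np.mean((np.asarray(outputs) - np.asarray(targets)) ** 2))

loss = MSELossSketch()
print(loss([1.0, 2.0], [1.0, 4.0], batch_idx=0))  # 2.0
```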
## Optimizer Config

```yaml
# configs/algorithms/lightning/optimizer/adam.yaml
_target_: torch.optim.Adam
_partial_: true
lr: 0.001
```
The `_partial_: true` flag creates a partial that receives `params=` from `configure_optimizers()`.
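Conceptually, `_partial_: true` makes `hydra.utils.instantiate` return something like a `functools.partial` instead of a constructed object. Illustrated here without Hydra, using a hypothetical stand-in optimizer class:

```python
from functools import partial

class AdamStandIn:
    """Stand-in for torch.optim.Adam with the same (params, lr) shape."""
    def __init__(self, params, lr=0.001):
        self.params, self.lr = list(params), lr

# Roughly what instantiate() returns when _partial_: true is set:
make_optimizer = partial(AdamStandIn, lr=0.001)

# configure_optimizers() later supplies the module parameters:
opt = make_optimizer(params=[1, 2, 3])
print(opt.lr, opt.params)  # 0.001 [1, 2, 3]
```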
## Composing a Full Config

```yaml
# configs/algorithms/lightning/ae_reconstruction.yaml
_target_: manylatents.algorithms.lightning.reconstruction.Reconstruction
_recursive_: false
datamodule: ${data}
network:
  _target_: manylatents.algorithms.lightning.networks.autoencoder.Autoencoder
  input_dim: null  # inferred from data in setup()
  hidden_dims: [512, 256, 128]
  latent_dim: 50
  activation: relu
  batchnorm: true
  dropout: 0.1
optimizer:
  _target_: torch.optim.Adam
  _partial_: true
  lr: 0.001
loss:
  _target_: manylatents.algorithms.lightning.losses.mse.MSELoss
```

`_recursive_: false` prevents Hydra from eagerly instantiating nested configs; the Reconstruction module handles deferred instantiation in `setup()` once `input_dim` is known from the datamodule.