# Algorithms

manyLatents provides two algorithm base classes. The decision rule is binary: if the algorithm trains with backprop, use `LightningModule`; if not, use `LatentModule`.

## fit/transform Algorithms

`LatentModule` is the base class for non-neural algorithms. Subclass it, implement `fit()` and `transform()`, and you're done.

```python
from manylatents.algorithms.latent.latent_module_base import LatentModule

class MyAlgorithm(LatentModule):
    def __init__(self, n_components=2, my_param=1.0, **kwargs):
        super().__init__(n_components=n_components, **kwargs)
        self.my_param = my_param

    def fit(self, x, y=None):
        self._is_fitted = True

    def transform(self, x):
        return x[:, :self.n_components]
```
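The fit/transform contract in action (sketched with a plain stand-in class so the snippet runs without the package installed; the real base class also handles `n_components` bookkeeping):

```python
import numpy as np

class SliceAlgorithm:  # stand-in mirroring the LatentModule fit/transform contract
    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, x, y=None):
        self._is_fitted = True  # nothing to learn for a pure column slice

    def transform(self, x):
        return x[:, :self.n_components]

algo = SliceAlgorithm(n_components=2)
algo.fit(np.zeros((100, 5)))
emb = algo.transform(np.zeros((10, 5)))
print(emb.shape)  # (10, 2)
```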

### Available Algorithms

| algorithm | type | config | key params |
|---|---|---|---|
| ArchetypalAnalysisModule | latent | `algorithms/latent=aa` | method, max_iter |
| ClassifierModule | latent | `algorithms/latent=classifier` | model, max_iter, init_seed |
| DiffusionMapModule | latent | `algorithms/latent=diffusionmap` | t, knn, decay, n_pca |
| LeidenModule | latent | `algorithms/latent=leiden` | resolution, n_neighbors, backend, device |
| MDSModule | latent | `algorithms/latent=mds` | ndim, seed, how, solver |
| MultiscalePHATEModule | latent | `algorithms/latent=multiscale_phate` | scale, granularity, landmarks, knn |
| NoOpModule | latent | `algorithms/latent=noop` | -- |
| PCAModule | latent | `algorithms/latent=pca` | -- |
| PHATEModule | latent | `algorithms/latent=phate` | knn, t, gamma, decay |
| PHATEModule | latent | `algorithms/latent=phate_torchdr` | knn, t, gamma, decay |
| ReebGraphModule | latent | `algorithms/latent=reeb_graph` | n_bins, overlap, lens, backend |
| ReebGraphModule | latent | `algorithms/latent=reeb_graph_density` | n_bins, overlap, lens, lens_k |
| ReebGraphModule | latent | `algorithms/latent=reeb_graph_diffusion1` | n_bins, overlap, lens, lens_k |
| ReebGraphModule | latent | `algorithms/latent=reeb_graph_pca1` | n_bins, overlap, lens, backend |
| DiffusionMapModule | latent | `algorithms/latent=spectral_clustering` | t, knn, decay, n_pca |
| TSNEModule | latent | `algorithms/latent=tsne` | perplexity, n_iter_early, n_iter_late, learning_rate |
| TSNEModule | latent | `algorithms/latent=tsne_torchdr` | perplexity, fit_fraction, backend, device |
| UMAPModule | latent | `algorithms/latent=umap` | n_neighbors, min_dist, n_epochs, metric |
| UMAPModule | latent | `algorithms/latent=umap_torchdr` | n_neighbors, min_dist, fit_fraction, backend |

### FoundationEncoder Pattern

Frozen pretrained models also use LatentModule — `fit()` is a no-op and `transform()` wraps the model's forward pass. This is a usage convention, not a separate class. Implementations live in `manylatents-omics/manylatents/dogma/encoders/` (Evo2, ESM3, Orthrus, AlphaGenome).
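The convention can be sketched as follows (a hypothetical stand-alone example, not code from the repository; in the real codebase you would subclass `LatentModule` and wrap an actual pretrained model such as ESM3):

```python
import numpy as np

class FrozenEncoderModule:  # in practice: class FrozenEncoderModule(LatentModule)
    """Wraps a frozen pretrained model: fit() is a no-op, transform() runs inference."""

    def __init__(self, model, n_components=2):
        self.model = model              # frozen, pretrained callable
        self.n_components = n_components

    def fit(self, x, y=None):
        self._is_fitted = True          # nothing to train; weights stay frozen

    def transform(self, x):
        return self.model(x)            # forward pass only

# toy stand-in for a pretrained encoder: projects onto the first two columns
encoder = FrozenEncoderModule(model=lambda x: x[:, :2])
encoder.fit(np.zeros((4, 8)))
print(encoder.transform(np.zeros((4, 8))).shape)  # (4, 2)
```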

### Optional Methods

If your algorithm uses a kernel-based approach, implement these to enable module-level metrics:

```python
def kernel_matrix(self, ignore_diagonal=False) -> np.ndarray:
    """Raw similarity matrix (N x N)."""
    ...

def affinity_matrix(self, ignore_diagonal=False, use_symmetric=False) -> np.ndarray:
    """Normalized transition matrix."""
    ...
```

This enables metrics like `KernelMatrixSparsity`, `AffinitySpectrum`, and `ConnectedComponents`.
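To illustrate the relationship between the two matrices, here is a generic construction: a Gaussian kernel row-normalized into a transition matrix. This is one common convention, not necessarily the exact kernel or normalization a given manyLatents algorithm uses:

```python
import numpy as np

def gaussian_kernel(x, sigma=1.0):
    # pairwise squared distances -> Gaussian similarities (N x N)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def row_normalize(k):
    # row-stochastic transition matrix: each row sums to 1
    return k / k.sum(axis=1, keepdims=True)

x = np.linspace(0, 1, 60).reshape(20, 3)
K = gaussian_kernel(x)      # what kernel_matrix() would return
P = row_normalize(K)        # what affinity_matrix() would return
print(P.sum(axis=1))        # each row sums to 1
```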

### Running

```bash
uv run python -m manylatents.main algorithms/latent=pca data=swissroll
```

### Adding a New LatentModule

1. Create `manylatents/algorithms/latent/your_algo.py` inheriting from `LatentModule`
2. Implement `fit(x, y=None)` and `transform(x)`
3. Create `manylatents/configs/algorithms/latent/your_algo.yaml` with `_target_`
4. Import in `manylatents/algorithms/latent/__init__.py`
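Step 3's config file is a small Hydra node whose `_target_` points at your class; constructor arguments become overridable fields. A sketch for the `MyAlgorithm` example above (field names are illustrative):

```yaml
# manylatents/configs/algorithms/latent/your_algo.yaml
_target_: manylatents.algorithms.latent.your_algo.MyAlgorithm
n_components: 2
my_param: 1.0
```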

## Trainable Algorithms

Neural network algorithms use PyTorch Lightning's `LightningModule` with training loops. Implement `setup()`, `training_step()`, `encode()`, and `configure_optimizers()`.

### Available Algorithms

| algorithm | type | config | key params |
|---|---|---|---|
| Reconstruction | lightning | `algorithms/lightning=aanet_reconstruction` | datamodule, network, optimizer, loss |
| Reconstruction | lightning | `algorithms/lightning=ae_reconstruction` | datamodule, network, optimizer, loss |
| HFTrainerModule | lightning | `algorithms/lightning=hf_trainer` | config |
| LatentODE | lightning | `algorithms/lightning=latent_ode` | datamodule, network, optimizer, loss |

### Pattern

All LightningModule algorithms follow the same pattern:

```python
import hydra
from lightning import LightningModule  # or: from pytorch_lightning import LightningModule

class MyTrainableAlgorithm(LightningModule):
    def __init__(self, network, optimizer, loss, datamodule=None, init_seed=42):
        super().__init__()
        self.save_hyperparameters(ignore=["datamodule", "network", "loss"])
        self.network_config = network
        self.optimizer_config = optimizer
        self.loss_fn = loss

    def setup(self, stage=None):
        # Deferred network construction — input_dim from datamodule
        input_dim = self.trainer.datamodule.data_dim
        self.network = hydra.utils.instantiate(self.network_config, input_dim=input_dim)

    def training_step(self, batch, batch_idx):
        outputs = self.network(batch)
        return self.loss_fn(outputs, batch)

    def encode(self, x):
        return self.network.encode(x)

    def configure_optimizers(self):
        return hydra.utils.instantiate(self.optimizer_config, params=self.parameters())
```

Key conventions:

- `save_hyperparameters(ignore=["datamodule", "network", "loss"])` — Lightning can't serialize nn.Modules
- `setup()` defers network construction until `input_dim` is known from the datamodule
- `encode()` extracts embeddings for evaluation after training
- Use the project's `MSELoss` from `manylatents.algorithms.lightning.losses.mse`, not `torch.nn.MSELoss`

### Latent ODE

The `LatentODE` algorithm integrates neural ODEs for learning continuous-time dynamics in latent space:

```bash
uv run python -m manylatents.main \
  algorithms/lightning=latent_ode \
  data=swissroll \
  trainer.max_epochs=10
```

Configuration supports custom integration times and ODE solver options via `torchdiffeq`.

### Running

```bash
uv run python -m manylatents.main \
  algorithms/lightning=ae_reconstruction \
  data=swissroll \
  trainer.max_epochs=10

# Fast dev run for testing
uv run python -m manylatents.main \
  algorithms/lightning=ae_reconstruction \
  data=swissroll \
  trainer.fast_dev_run=true
```

### Adding a New LightningModule

1. Create `manylatents/algorithms/lightning/your_algo.py` inheriting from `LightningModule`
2. Implement `setup()`, `training_step()`, `encode()`, `configure_optimizers()`
3. Use `self.save_hyperparameters(ignore=["datamodule", "network", "loss"])`
4. Create config in `manylatents/configs/algorithms/lightning/your_algo.yaml`
5. Test with `trainer.fast_dev_run=true`

## Network Architectures

Networks are `nn.Module` classes used by `LightningModule` algorithms. They define the architecture; the `LightningModule` wraps the training logic.

### Available Networks

| Network | Class | Config | Description |
|---|---|---|---|
| Autoencoder | Autoencoder | `algorithms/lightning/network=autoencoder` | Symmetric encoder-decoder with configurable layers |
| AANet | AANet | `algorithms/lightning/network=aanet` | Archetypal analysis network |
| LatentODENetwork | LatentODENetwork | configured via `latent_ode.yaml` | ODE function for continuous dynamics |

### Autoencoder Config

```yaml
# configs/algorithms/lightning/network/autoencoder.yaml
_target_: manylatents.algorithms.lightning.networks.autoencoder.Autoencoder
input_dim: ???  # Set by setup() from datamodule
hidden_dims: [128, 64]
latent_dim: 2
activation: relu
```

## Loss Functions

Use the project's loss functions, not PyTorch's directly.

| Loss | Class | Config | Description |
|---|---|---|---|
| MSELoss | MSELoss | `algorithms/lightning/loss=default` | Reconstruction loss |
| GeometricLoss | GeometricLoss | `algorithms/lightning/loss=ae_dim` | Dimensionality-aware loss |
| GeometricLoss | GeometricLoss | `algorithms/lightning/loss=ae_neighbors` | Neighborhood-preserving loss |
| GeometricLoss | GeometricLoss | `algorithms/lightning/loss=ae_shape` | Shape-preserving loss |

The project's `MSELoss` (from `manylatents.algorithms.lightning.losses.mse`) accepts `(outputs, targets, **kwargs)`, unlike `torch.nn.MSELoss`. Always use the project's version.
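The signature difference can be illustrated with a minimal stand-in (this is not the project's implementation, just the calling convention it supports):

```python
class ProjectStyleMSELoss:
    """Accepts extra keyword arguments, unlike torch.nn.MSELoss."""

    def __call__(self, outputs, targets, **kwargs):
        # extra context (e.g. a batch index) can be passed through and ignored safely
        return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

loss_fn = ProjectStyleMSELoss()
print(loss_fn([1.0, 2.0], [1.0, 0.0], batch_idx=3))  # 2.0
```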

## Optimizer Config

```yaml
# configs/algorithms/lightning/optimizer/adam.yaml
_target_: torch.optim.Adam
_partial_: true
lr: 0.001
```

The `_partial_: true` flag creates a partial that receives `params=` from `configure_optimizers()`.
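Semantically, `_partial_: true` makes `hydra.utils.instantiate` return something close to a `functools.partial` rather than a constructed object. A sketch with a dummy optimizer factory standing in for `torch.optim.Adam`:

```python
from functools import partial

def make_optimizer(params, lr=0.001):  # stand-in for torch.optim.Adam
    return {"params": list(params), "lr": lr}

# roughly what instantiate(cfg) yields when the config sets `_partial_: true`
opt_factory = partial(make_optimizer, lr=0.001)

# configure_optimizers() then supplies the missing argument:
optimizer = opt_factory(params=[0.5, 1.5])
print(optimizer["lr"])  # 0.001
```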

## Composing a Full Config

```yaml
# configs/algorithms/lightning/ae_reconstruction.yaml
_target_: manylatents.algorithms.lightning.reconstruction.Reconstruction
_recursive_: false
datamodule: ${data}
network:
  _target_: manylatents.algorithms.lightning.networks.autoencoder.Autoencoder
  input_dim: null  # inferred from data in setup()
  hidden_dims: [512, 256, 128]
  latent_dim: 50
  activation: relu
  batchnorm: true
  dropout: 0.1
optimizer:
  _target_: torch.optim.Adam
  _partial_: true
  lr: 0.001
loss:
  _target_: manylatents.algorithms.lightning.losses.mse.MSELoss
```

`_recursive_: false` prevents Hydra from eagerly instantiating nested configs; the `Reconstruction` module handles deferred instantiation in `setup()` once `input_dim` is known from the datamodule.