# Contributing to manyLatents
Guidelines for adding new components and ensuring they integrate correctly with the framework.
## Testing Philosophy
All contributions must pass automated tests on every pull request. CI validates:
- **Unit tests** — `pytest tests/` runs the full test suite
- **CLI smoke tests** — default LatentModule and LightningModule paths work
- **Component discovery** — if your PR touches algorithms, metrics, or data configs, CI auto-discovers all configs in that group and runs each one end-to-end
- **Docs build** — `mkdocs build --strict` passes
### How CI detects what to test
CI uses path-based filtering. If your PR modifies files under a component directory, the corresponding discovery test runs automatically:
| Files changed | CI runs |
|---|---|
| `manylatents/algorithms/latent/**` or `configs/algorithms/latent/**` | Discovers and smoke-tests every LatentModule config |
| `manylatents/algorithms/lightning/**` or `configs/algorithms/lightning/**` | Discovers and smoke-tests every LightningModule config |
| `manylatents/metrics/**` or `configs/metrics/**` | Discovers and smoke-tests every metric config |
This means: if you add a new config YAML and its `_target_` doesn't resolve, or the algorithm crashes on synthetic data, CI will catch it.
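Conceptually, a discovery test globs the config group and runs each config end-to-end. A rough sketch (the real tests live under `tests/` and may be structured differently):

```python
import subprocess
from pathlib import Path

import pytest

METRIC_CONFIGS = sorted(Path("manylatents/configs/metrics/embedding").glob("*.yaml"))


@pytest.mark.parametrize("cfg", METRIC_CONFIGS, ids=lambda p: p.stem)
def test_metric_config_smoke(cfg):
    # Run the CLI end-to-end with the discovered metric config on synthetic data.
    subprocess.run(
        [
            "manylatents",
            "algorithms/latent=pca",
            "data=swissroll",
            f"metrics/embedding={cfg.stem}",
            "logger=none",
        ],
        check=True,
    )
```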
## Adding New Metrics
Every new metric follows 4 steps: wrapper → register → config → smoke test.
### Step 1: Write the Wrapper
Follow the `Metric` protocol — `embeddings` first, then `dataset`, `module`, your params, and `cache`:
```python
# manylatents/metrics/your_metric.py
from typing import Optional

import numpy as np

from manylatents.metrics.registry import register_metric
from manylatents.utils.metrics import compute_knn


@register_metric(
    aliases=["your_metric"],
    default_params={"k": 25},
    description="Short description of what this metric measures",
)
def YourMetric(
    embeddings: np.ndarray,
    dataset=None,
    module=None,
    k: int = 25,
    cache: Optional[dict] = None,
) -> float:
    # Use compute_knn with cache for shared kNN computation
    dists, indices = compute_knn(embeddings, k=k, cache=cache)
    score = ...  # your computation
    return float(score)
```
Return types: `float`, `tuple[float, np.ndarray]` (scalar + per-sample), or `dict[str, Any]` (structured).
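For example, a per-sample variant might look like this (a hedged sketch; `YourPerSampleMetric` is a hypothetical name and the `@register_metric` decorator is omitted for brevity):

```python
import numpy as np


def YourPerSampleMetric(embeddings, dataset=None, module=None, k=25, cache=None):
    # Stand-in per-point score; a real metric would compute something meaningful here.
    per_sample = np.linalg.norm(embeddings, axis=1)
    # Scalar summary plus the per-sample array.
    return float(per_sample.mean()), per_sample
```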
Evaluation context determines the config directory:
- Only needs original data? → `metrics/dataset/`
- Compares original vs. reduced? → `metrics/embedding/`
- Needs algorithm internals (graph, kernel)? → `metrics/module/`
### Step 2: Register It
The `@register_metric` decorator (shown above) adds your metric to the dynamic registry with aliases, default params, and a description. This powers `list_metrics()`, auto-generated docs tables, and programmatic discovery.

Import your metric in `manylatents/metrics/__init__.py` so the decorator fires at import time.
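To confirm the decorator fired, you can query the registry from a Python shell (a sketch; the exact import location and return type of `list_metrics` are assumptions based on the description above):

```python
from manylatents.metrics.registry import list_metrics

# "your_metric" should appear among the registered aliases.
assert "your_metric" in list_metrics()
```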
### Step 3: Create the Config
```yaml
# manylatents/configs/metrics/embedding/your_metric.yaml
your_metric:
  _target_: manylatents.metrics.your_metric.YourMetric
  _partial_: True
  k: 25
```
Configs are nested under the metric name with `_partial_: True` so Hydra binds the params at config time and the engine calls it with `embeddings`, `dataset`, `module`, and `cache` at runtime.
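Roughly, this is what the evaluation engine does with such a config (an illustrative sketch, not the actual call site):

```python
import numpy as np
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.create({
    "_target_": "manylatents.metrics.your_metric.YourMetric",
    "_partial_": True,
    "k": 25,
})
metric_fn = instantiate(cfg)  # functools.partial(YourMetric, k=25)
score = metric_fn(np.random.rand(200, 2), dataset=None, module=None, cache={})
```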
### Step 4: E2E Smoke Test
```bash
# Verify your metric runs end-to-end
manylatents algorithms/latent=pca data=swissroll \
  metrics/embedding=your_metric logger=none
```
CI will auto-discover your new config and test it if files under `manylatents/metrics/` or `configs/metrics/` are changed.
## Adding New Algorithms
### LatentModule (non-neural)
For algorithms without a training loop — the `fit`/`transform` pattern:
```python
# manylatents/algorithms/latent/your_algorithm.py
import numpy as np

from manylatents.algorithms.latent.latent_module_base import LatentModule


class YourAlgorithmModule(LatentModule):
    """Your dimensionality reduction algorithm."""

    def __init__(self, n_components: int = 2, **kwargs):
        super().__init__()
        self.n_components = n_components

    def fit(self, X: np.ndarray) -> None:
        """Fit the model on training data."""
        pass

    def transform(self, X: np.ndarray) -> np.ndarray:
        """Transform data to low-dimensional space."""
        return self._compute_embeddings(X)
```
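A quick local sanity check might look like this (illustrative; it assumes `fit` and `transform` are fully implemented):

```python
import numpy as np

module = YourAlgorithmModule(n_components=2)
X = np.random.rand(100, 10)
module.fit(X)
Z = module.transform(X)
assert Z.shape == (100, 2)
```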
### LightningModule (neural)
For algorithms that train with backprop:
```python
# manylatents/algorithms/lightning/your_network.py
from lightning import LightningModule


class YourNetwork(LightningModule):
    def __init__(self, **kwargs):  # your network, loss, and hyperparameters
        super().__init__()
        self.save_hyperparameters(ignore=["datamodule", "network", "loss"])

    def training_step(self, batch, batch_idx):
        # Compute and return the training loss for this batch.
        ...

    def encode(self, x):
        # Return the latent embeddings for x.
        ...

    def configure_optimizers(self):
        return ...
```
### Create Config
```yaml
# manylatents/configs/algorithms/latent/your_algorithm.yaml
_target_: manylatents.algorithms.latent.your_algorithm.YourAlgorithmModule
n_components: 2
```
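A LightningModule config follows the same pattern under `configs/algorithms/lightning/` (a minimal sketch; any fields beyond `_target_` depend on your network's constructor):

```yaml
# manylatents/configs/algorithms/lightning/your_network.yaml
_target_: manylatents.algorithms.lightning.your_network.YourNetwork
```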
### Test Locally
```bash
# LatentModule
manylatents algorithms/latent=your_algorithm data=swissroll \
  metrics=noop logger=none

# LightningModule
manylatents algorithms/lightning=your_network data=swissroll \
  trainer.fast_dev_run=true metrics=noop logger=none
```
CI will auto-discover your new config and test it when you open a PR.
## Adding New Datasets
```python
# manylatents/data/your_dataset.py
from typing import Optional

from lightning import LightningDataModule
from torch.utils.data import DataLoader


class YourDataModule(LightningDataModule):
    def __init__(self, batch_size: int = 32, **kwargs):
        super().__init__()
        self.batch_size = batch_size

    def setup(self, stage: Optional[str] = None):
        self.train_dataset = ...
        self.test_dataset = ...

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size)
```
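A quick local check might look like this (illustrative; it assumes `setup()` builds real datasets):

```python
dm = YourDataModule(batch_size=16)
dm.setup("fit")
batch = next(iter(dm.train_dataloader()))
```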
```yaml
# manylatents/configs/data/your_dataset.yaml
_target_: manylatents.data.your_dataset.YourDataModule
batch_size: 32
```
```bash
manylatents data=your_dataset algorithms/latent=pca \
  metrics=noop logger=none
```
## CI Pipeline
Three jobs run on every PR:
**test** (Python 3.11 + 3.12 matrix):

- `pytest tests/` — full unit test suite
- CLI smoke test: LatentModule path (`experiment=single_algorithm`)
- CLI smoke test: LightningModule path (`algorithms/lightning=ae_reconstruction`, `trainer.fast_dev_run=true`)
- If `algorithms/latent/` changed → discovers and tests all LatentModule configs
- If `algorithms/lightning/` changed → discovers and tests all LightningModule configs
- If `metrics/` changed → discovers and tests all metric configs
**docs**:

- `scripts/check_docs_coverage.py` — verifies all `_target_` paths in configs are importable (see the sketch below)
- `mkdocs build --strict` — verifies the docs site builds cleanly
**publish** (tags only):

- Builds sdist + wheel, publishes to PyPI via trusted publishing
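The `_target_` check is conceptually simple: resolve each dotted path and confirm it imports. A minimal sketch (not the actual script):

```python
import importlib


def target_is_importable(target: str) -> bool:
    """Return True if a dotted _target_ path resolves to an importable attribute."""
    module_path, _, attr = target.rpartition(".")
    try:
        return hasattr(importlib.import_module(module_path), attr)
    except ImportError:
        return False
```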
## Local Pre-submission Checklist
```bash
# Run unit tests
pytest tests/ -x -q

# Smoke test your component
manylatents algorithms/latent=your_algo data=swissroll metrics=noop logger=none

# Docs build (optional)
mkdocs build --strict
```
## Optional Dependencies
Some features require optional extras. If your contribution uses an optional dependency:
- Add it to the appropriate `[project.optional-dependencies]` group in `pyproject.toml`
- Use lazy imports (`try/except ImportError`) so the core package doesn't break without it (see the sketch below)
- Add `pytest.importorskip()` or `@pytest.mark.skipif` to tests that need the dep
Current extras: `hf` (transformers), `torchdr` (TorchDR + faiss), `jax` (JAX + diffrax + optax + ott-jax), `docs`.
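A lazy import guard might look like this (illustrative; `torchdr` is only used as an example extra, and the helper name is hypothetical):

```python
# Library code: the core package still imports when the extra is missing.
try:
    import torchdr
except ImportError:
    torchdr = None


def _require_torchdr():
    if torchdr is None:
        raise ImportError("This feature requires the 'torchdr' extra.")
```

In tests, `pytest.importorskip("torchdr")` at the top of the module gives the equivalent skip behavior.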
## Questions?
- Metrics reference
- API usage
- Cache protocol
- Issues: open a GitHub issue