Extensions
manyLatents uses a modular extension system that allows domain-specific functionality to be installed as separate packages. Extensions integrate seamlessly through Python's namespace package system and Hydra's config composition.
Available Extensions
manylatents-omics
The genomics, population genetics, and single-cell extension for manyLatents.
Repository: github.com/latent-reasoning-works/manylatents-omics
Adds three submodules:
- manylatents.dogma — Foundation model encoders (Evo2, ESM3, Orthrus, AlphaGenome) and fusion algorithms
- manylatents.popgen — Population genetics data modules and metrics (GeographicPreservation, AdmixturePreservation)
- manylatents.singlecell — Single-cell AnnData data modules
Installing
Quick Install (Recommended)
uv add git+https://github.com/latent-reasoning-works/manylatents-omics.git
Using Git Submodules
For contributors working on both core and extensions:
git submodule add https://github.com/latent-reasoning-works/manylatents-omics.git extensions/manylatents-omics
uv add git+file://extensions/manylatents-omics
Development Workflow
Working FROM the manylatents-omics repo (recommended for omics development):
cd manylatents-omics
uv sync # Pulls manylatents from git automatically
uv run python -m manylatents.main experiment=single_algorithm
Working FROM the manylatents repo (core development only):
cd manylatents
uv sync
uv run python -m manylatents.main experiment=single_algorithm
Extensions are auto-discovered via entry-points. Use manylatents.main regardless of which repo you're working from.
Using Extensions in Code
Once installed, extension features are available through the manylatents namespace:
# Core imports (always available)
from manylatents.data import SwissRoll
from manylatents.algorithms.latent import PCAModule
# Extension imports (available when manylatents-omics is installed)
from manylatents.popgen.data import HGDPDataset
from manylatents.popgen.metrics import GeographicPreservation
from manylatents.dogma.encoders import Evo2Encoder
from manylatents.singlecell.data import AnnDataModule
Using Extensions with Hydra
Extensions integrate directly with Hydra configs:
uv run python -m manylatents.main \
data=hgdp_1kgp \
algorithms/latent=pca \
metrics=genetic_metrics \
logger=wandb
Checking What's Installed
import pkgutil
import manylatents
for importer, modname, ispkg in pkgutil.iter_modules(manylatents.__path__):
print(f"- {modname}")
Troubleshooting
Hydra Config Discovery Error
Problem: ConfigAttributeError: Key 'experiment' is not in struct
Cause: Hydra SearchPathPlugin not being discovered.
Solutions:
- Ensure the extension is installed (not just cloned):
uv add git+https://github.com/latent-reasoning-works/manylatents-omics.git
- If developing both packages, work from the omics repo
- Verify plugin discovery:
from hydra.core.plugins import Plugins
from hydra.plugins.search_path_plugin import SearchPathPlugin

plugins = list(Plugins.instance().discover(SearchPathPlugin))
print([p.__name__ for p in plugins])
Extension Not Found
Problem: ModuleNotFoundError: No module named 'manylatents.omics'
Solution: Install the extension:
uv add git+https://github.com/latent-reasoning-works/manylatents-omics.git
Import Conflicts
Problem: Namespace package not merging correctly.
Solution: Ensure both packages have the namespace declaration in manylatents/__init__.py:
__path__ = __import__('pkgutil').extend_path(__path__, __name__)
Design Philosophy
manyLatents is built around a simple idea: every interface between stages is a file with a known schema. This matters because the agents and scripts that compose manyLatents into larger workflows are stateless — they don't remember what happened in the last call. If the output of one step doesn't fully describe itself, the next step can't use it.
This constraint shapes everything:
- LatentOutputs is a dict[str, Any], not a dataclass. When a new metric injects a custom field, every downstream consumer still works without schema migration.
- Metrics are registered via Hydra configs with _target_ and _partial_: True. Parameters are bound at config time, not at call time, so the evaluation engine doesn't need to know what parameters each metric takes.
- Algorithms are either LatentModule (fit/transform) or LightningModule subclasses (training loops). The execution engine dispatches on type, not on name.
The result is a system where you can add a new algorithm, metric, dataset, or entire domain extension without touching core code.
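A minimal sketch of why dict-typed outputs avoid schema migration. All names here (the step and consumer functions) are illustrative, not the actual manyLatents API:

```python
# Hypothetical sketch: dict-typed step outputs tolerate new fields.
# Function names are illustrative, not the real manyLatents API.

def run_algorithm() -> dict:
    # A step emits a plain dict describing its result.
    return {"embeddings": [[0.1, 0.2], [0.3, 0.4]]}

def attach_metric(outputs: dict) -> dict:
    # A new metric injects a custom field into the same dict...
    outputs["my_custom_score"] = 0.87
    return outputs

def downstream_consumer(outputs: dict) -> int:
    # ...and existing consumers keep working: they read only the
    # keys they know about and ignore everything else.
    return len(outputs["embeddings"])

outputs = attach_metric(run_algorithm())
n = downstream_consumer(outputs)
```

Had the output been a frozen dataclass, adding `my_custom_score` would have required a schema change in every consumer.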
Two Execution Modes
CLI (uv run python -m manylatents.main) executes a single step: one algorithm + metrics on one dataset. This is the primary user-facing interface and what SLURM jobs invoke.
Python API (manylatents.api.run()) is the programmatic interface designed for agent-driven workflows. It accepts input_data to chain the output of one call into the next, and supports pipeline configs for sequential steps within a single process.
from manylatents.api import run
# Single step
result = run(
data='swissroll',
algorithms={'latent': {
'_target_': 'manylatents.algorithms.latent.pca.PCAModule',
'n_components': 50
}}
)
# Chaining: feed output of one step into the next
result2 = run(
input_data=result['embeddings'],
algorithms={'latent': {
'_target_': 'manylatents.algorithms.latent.phate.PHATEModule',
'n_components': 2
}}
)
Namespace Extension via pkgutil
The core package's __init__.py contains one line:
__path__ = __import__('pkgutil').extend_path(__path__, __name__)
This tells Python: "if another installed package also defines a manylatents directory, merge its contents into mine." The rule is simple: core never imports from extensions; extensions import from core.
Extensions also register a Hydra SearchPathPlugin so their configs are discovered automatically:
from hydra.plugins.search_path_plugin import SearchPathPlugin

class OmicsSearchPathPlugin(SearchPathPlugin):
def manipulate_search_path(self, search_path):
search_path.append(
provider="manylatents-omics",
path="pkg://manylatents.dogma.configs",
)
Four Extension Axes
1. Algorithms
Two base classes, binary decision rule:
- LatentModule — fit/transform for non-neural algorithms (PCA, UMAP, PHATE, etc.). The FoundationEncoder pattern is a LatentModule where fit() is a no-op and transform() wraps a pretrained model.
- LightningModule subclasses — trainable neural networks with Lightning training loops (autoencoders, Latent ODEs).
Optional methods kernel_matrix() and affinity_matrix() enable module-level metrics like KernelMatrixSparsity and AffinitySpectrum.
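A standalone sketch of the optional kernel hook, assuming a Gaussian kernel; the class below is hypothetical and only mimics the shape of the real LatentModule interface:

```python
import math

# Hypothetical sketch of the optional kernel_matrix() hook, assuming a
# Gaussian kernel. The real LatentModule signature may differ.
class ToyKernelModule:
    def __init__(self, bandwidth: float = 1.0):
        self.bandwidth = bandwidth
        self._x = None

    def fit(self, x):
        self._x = x  # store training points for later kernel evaluation

    def kernel_matrix(self):
        # Pairwise Gaussian kernel: K[i][j] = exp(-||xi - xj||^2 / (2 h^2))
        h2 = 2.0 * self.bandwidth ** 2
        def k(a, b):
            d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
            return math.exp(-d2 / h2)
        return [[k(a, b) for b in self._x] for a in self._x]

m = ToyKernelModule()
m.fit([[0.0, 0.0], [1.0, 0.0]])
K = m.kernel_matrix()
# Diagonal entries are 1.0; the off-diagonal entry is exp(-0.5)
```

A module-level metric like KernelMatrixSparsity would consume the matrix returned here rather than the embeddings themselves.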
2. Metrics
Metrics follow the Metric protocol with three evaluation contexts:
| Context | embeddings | dataset | module | Use case |
|---|---|---|---|---|
| embedding | Low-dim output | Source dataset | - | Trustworthiness, continuity |
| dataset | - | Source dataset | - | Stratification, admixture |
| module | - | Source dataset | Fitted LatentModule | Affinity spectrum, kernel sparsity |
List-valued parameters in configs expand via Cartesian product through flatten_and_unroll_metrics(). Metrics sharing kNN graphs use a shared cache.
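A hedged sketch of the list-expansion behavior, in the spirit of flatten_and_unroll_metrics(); the real function's name and signature come from the source text, but this implementation is illustrative:

```python
from itertools import product

# Illustrative sketch of list-valued parameter expansion, modeled on
# flatten_and_unroll_metrics(); the real signature may differ.
def unroll(metric_cfg: dict) -> list:
    keys = list(metric_cfg)
    # Treat scalars as single-element lists so product() handles both.
    values = [v if isinstance(v, list) else [v] for v in metric_cfg.values()]
    # Cartesian product over all list-valued parameters.
    return [dict(zip(keys, combo)) for combo in product(*values)]

cfgs = unroll({"n_neighbors": [5, 10], "metric": "euclidean"})
# -> [{'n_neighbors': 5, 'metric': 'euclidean'},
#     {'n_neighbors': 10, 'metric': 'euclidean'}]
```

Each expanded dict becomes one metric evaluation, which is why metrics sharing kNN graphs benefit from the shared cache.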
3. Data Modules
Data modules provide get_data() and are auto-discovered at import time. Synthetic datasets generate on-the-fly; file-based datasets load from disk. For LightningModule algorithms, they also implement LightningDataModule.
4. Domain Extensions
A domain extension is a separate installable package that adds algorithms, metrics, and data modules to the manylatents namespace. See the Development tab for how to create one.
Hydra Configuration
Every extensible component has a corresponding Hydra config group:
configs/
algorithms/
latent/ # LatentModule configs
lightning/ # LightningModule configs
loss/ # Loss function configs
network/ # Network architecture configs
optimizer/ # Optimizer configs
data/ # Dataset configs
metrics/
embedding/ # Embedding-level metric configs
dataset/ # Dataset-level metric configs
module/ # Module-level metric configs
sampling/ # Metric sampling strategies
callbacks/embedding/
experiment/ # Experiment preset configs
trainer/ # Lightning trainer configs
logger/ # Logger configs (none, wandb)
cluster/ # SLURM cluster configs (via Shop)
launcher/ # Job launcher configs (via Shop)
Scope Boundaries
manyLatents owns single-step execution and the Python API for composable workflows. It does NOT own:
- Multi-step orchestration — manyAgents calls manylatents.api.run() to compose steps
- RL / reward-driven training — Geomancer
- Cluster job submission — Shop provides Hydra launcher plugins
Creating an Extension
This guide documents how to create extension packages for manyLatents, following the patterns established by manylatents-omics.
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ Your Application │
├─────────────────────────────────────────────────────────────┤
│ manylatents-yourextension │ manylatents-omics │
│ (your namespace package) │ (popgen, dogma, singlecell)│
├─────────────────────────────────────────────────────────────┤
│ shop (optional) │
│ (shared SLURM launchers, logging utils) │
├─────────────────────────────────────────────────────────────┤
│ manylatents (core) │
│ (LatentModule, metrics, data, experiment runner) │
└─────────────────────────────────────────────────────────────┘
Package Structure
manylatents-yourextension/
├── pyproject.toml
├── README.md
├── CLAUDE.md # AI assistant instructions
├── manylatents/
│ ├── __init__.py # Namespace package declaration (CRITICAL)
│ ├── yourext_plugin.py # Hydra SearchPathPlugin
│ └── yourext/
│ ├── __init__.py
│ ├── algorithms/
│ │ ├── __init__.py
│ │ └── your_algorithm.py
│ ├── data/
│ │ ├── __init__.py
│ │ └── your_dataset.py
│ ├── metrics/
│ │ ├── __init__.py
│ │ └── your_metric.py
│ └── configs/
│ ├── __init__.py # Empty, required for pkg://
│ ├── data/
│ │ └── your_data.yaml
│ ├── algorithms/
│ │ └── latent/
│ │ └── your_algo.yaml
│ ├── metrics/
│ │ └── dataset/
│ │ └── your_metric.yaml
│ └── experiment/
│ └── your_experiment.yaml
└── tests/
├── __init__.py
├── test_imports.py
└── test_config_e2e.py
Critical File: manylatents/__init__.py
This file MUST contain the namespace package declaration:
__path__ = __import__('pkgutil').extend_path(__path__, __name__)
Without this, Python won't merge your package with core manyLatents.
Hydra Config Integration
SearchPathPlugin
Create manylatents/yourext_plugin.py:
from hydra.core.config_search_path import ConfigSearchPath
from hydra.plugins.search_path_plugin import SearchPathPlugin
class YourExtSearchPathPlugin(SearchPathPlugin):
def manipulate_search_path(self, search_path: ConfigSearchPath) -> None:
search_path.append(
provider="manylatents",
path="pkg://manylatents.configs",
)
# Higher priority for YOUR configs
search_path.prepend(
provider="manylatents-yourext",
path="pkg://manylatents.yourext.configs",
)
Use prepend() if your configs should override core configs with the same name, append() if core should take precedence.
Entry Point Registration (Recommended)
Core auto-discovers extensions via the manylatents.extensions entry-point group.
Add this to your pyproject.toml:
[project.entry-points."manylatents.extensions"]
yourext = "manylatents.yourext_plugin:YourExtSearchPathPlugin"
When your extension is installed (uv pip install -e ., uv add, or pip install),
core's manylatents.main will automatically discover and register your plugin at startup.
No manual registration code or custom entry script is needed.
pyproject.toml
[project]
name = "manylatents-yourext"
version = "0.1.0"
requires-python = ">=3.10, <3.13"
dependencies = [
"manylatents",
]
[project.entry-points."manylatents.extensions"]
yourext = "manylatents.yourext_plugin:YourExtSearchPathPlugin"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
packages = ["manylatents"] # CRITICAL: Package the manylatents/ directory
[tool.uv]
managed = true
[tool.uv.sources]
manylatents = { git = "https://github.com/latent-reasoning-works/manylatents.git" }
Component Types
Custom LatentModule (Algorithm)
import torch
from torch import Tensor
from manylatents.algorithms.latent.latent_module_base import LatentModule
class YourAlgorithm(LatentModule):
def __init__(self, n_components=2, your_param=1.0, **kwargs):
super().__init__(n_components=n_components, **kwargs)
self.your_param = your_param
def fit(self, x: Tensor) -> None:
x_np = x.detach().cpu().numpy()
# Your fitting logic
self._is_fitted = True
def transform(self, x: Tensor) -> Tensor:
if not self._is_fitted:
raise RuntimeError("Model not fitted. Call fit() first.")
x_np = x.detach().cpu().numpy()
embedding = ... # Your embedding computation
return torch.tensor(embedding, device=x.device, dtype=x.dtype)
Config: manylatents/yourext/configs/algorithms/latent/your_algo.yaml
_target_: manylatents.yourext.algorithms.YourAlgorithm
n_components: 2
your_param: 1.0
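To see how the `_target_` string in a config like this maps to a class, here is an illustrative sketch of dotted-path resolution. It uses a stdlib target so it is self-contained; Hydra's own instantiate() does this plus recursive config handling:

```python
from importlib import import_module

# Illustrative sketch of how a Hydra-style _target_ string resolves to a
# Python object: split the dotted path, import the module, fetch the attr.
def resolve_target(target: str):
    module_path, _, attr = target.rpartition(".")
    return getattr(import_module(module_path), attr)

# A stdlib target keeps the sketch runnable; in your config this would be
# e.g. "manylatents.yourext.algorithms.YourAlgorithm".
cls = resolve_target("collections.OrderedDict")
obj = cls(a=1)
```

This is why the config resolution tests below only need to check that every `_target_` is an importable dotted path.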
Custom Dataset
import numpy as np
class YourDataset:
def __init__(self, data_path: str, n_samples=None):
self.data_path = data_path
self._data = np.load(data_path)
if n_samples:
self._data = self._data[:n_samples]
def get_data(self) -> np.ndarray:
return self._data
@property
def data(self) -> np.ndarray:
return self._data
Config: manylatents/yourext/configs/data/your_data.yaml
_target_: manylatents.yourext.data.YourDataset
data_path: ${paths.data_dir}/your_data.npy
n_samples: null
Custom Metric
import numpy as np
from typing import Optional
from manylatents.algorithms.latent.latent_module_base import LatentModule
def YourMetric(
embeddings: np.ndarray,
dataset: object,
module: Optional[LatentModule] = None,
threshold: float = 0.5,
return_per_sample: bool = False,
) -> float | tuple[float, np.ndarray]:
scores = ... # Your metric computation
if return_per_sample:
return float(np.mean(scores)), scores
return float(np.mean(scores))
Config: manylatents/yourext/configs/metrics/dataset/your_metric.yaml
_target_: manylatents.yourext.metrics.YourMetric
_partial_: true # CRITICAL: deferred parameter binding
threshold: 0.5
return_per_sample: false
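A sketch of what `_partial_: true` amounts to at runtime: Hydra binds the config parameters first, and the evaluation engine supplies the runtime arguments later. The metric body below is a toy stand-in, not the real engine code:

```python
from functools import partial

# Toy metric with the (embeddings, dataset, module, **params) shape;
# the computation here is illustrative only.
def your_metric(embeddings, dataset, module=None, threshold=0.5):
    flat = [v for row in embeddings for v in row]
    # Fraction of embedding values above the threshold.
    return sum(v > threshold for v in flat) / len(flat)

# Config time: bind parameters (what `_partial_: true` produces).
bound_metric = partial(your_metric, threshold=0.2)

# Call time: the engine passes only the runtime inputs.
score = bound_metric(embeddings=[[0.1, 0.3], [0.5, 0.05]], dataset=None)
# 0.3 and 0.5 exceed 0.2, so score = 2/4 = 0.5
```

Because the binding happens at config time, the evaluation engine can invoke every metric with the same positional contract regardless of its parameters.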
Experiment Config
manylatents/yourext/configs/experiment/your_experiment.yaml:
# @package _global_
name: your_experiment
project: your_project
defaults:
- override /algorithms/latent: your_algo
- override /data: your_data
- override /callbacks/embedding: default
- override /metrics: default
seed: 42
CI Requirements
Import Tests
def test_namespace_package():
import manylatents.yourext
from manylatents.yourext.algorithms import YourAlgorithm
from manylatents.data import SwissRoll # Core still works
assert YourAlgorithm is not None
def test_algorithm_interface():
from manylatents.yourext.algorithms import YourAlgorithm
import torch
algo = YourAlgorithm(n_components=2)
X = torch.randn(100, 50)
embedding = algo.fit_transform(X)
assert embedding.shape == (100, 2)
Config Resolution Tests
from omegaconf import OmegaConf
from pathlib import Path
CONFIGS_DIR = Path(__file__).parent.parent / "manylatents" / "yourext" / "configs"
def test_all_targets_importable():
for config_file in CONFIGS_DIR.rglob("*.yaml"):
cfg = OmegaConf.load(config_file)
if hasattr(cfg, "_target_"):
assert cfg._target_.startswith("manylatents")
GitHub Actions Workflow
name: CI
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.10", "3.11", "3.12"]
steps:
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v5
- run: uv sync
- run: uv run pytest tests/ -v
- run: |
uv run python -c "
import manylatents
assert len(manylatents.__path__) >= 2
print('Namespace package OK')
"
Testing Checklist
Namespace Package
- manylatents/__init__.py has the extend_path line
- import manylatents.yourext works
- Core manylatents still importable
Hydra Config
- SearchPathPlugin registered via entry-point in pyproject.toml
- Verify auto-discovery: python -c "from manylatents.extensions import discover_extensions; discover_extensions()"
- All configs have valid _target_ paths
- Metrics use _partial_: true
- Experiment configs use # @package _global_
Interface Compliance
- LatentModule subclasses implement fit() and transform()
- Datasets have a get_data() method
- Metrics accept (embeddings, dataset, module, **kwargs)
CI
- Import tests pass on Python 3.10-3.12
- Config resolution tests pass
- Core tests still pass with extension installed
Quick Reference
# Core entry point with auto-discovery
uv run python -m manylatents.main experiment=your_experiment
Troubleshooting Auto-Discovery
Check installed entry-points:
from importlib.metadata import entry_points
eps = entry_points(group="manylatents.extensions")
print([(ep.name, ep.value) for ep in eps])
Verify plugin registration:
from manylatents.extensions import discover_extensions
discover_extensions()
from hydra.core.plugins import Plugins
from hydra.plugins.search_path_plugin import SearchPathPlugin
print([p.__name__ for p in Plugins.instance().discover(SearchPathPlugin)])
Extension not showing up?
- Ensure the package is installed (not just cloned): uv pip install -e .
- Check pyproject.toml has [project.entry-points."manylatents.extensions"]
- Reinstall after editing pyproject.toml — entry-points come from package metadata