manylatents-omics

Biological extensions for manylatents: population genetics, single-cell omics, and foundation model encoders for DNA, RNA, and protein sequences.


Installation

Install the base package:

uv add manylatents-omics

Enable domain-specific extras depending on your use case:

# Population genetics (manifold-genetics CSV pipeline)
uv add "manylatents-omics[popgen]"

# Single-cell omics (AnnData / scanpy)
uv add "manylatents-omics[singlecell]"

# Foundation model encoders (ESM3, Orthrus, Evo2 -- requires GPU)
uv add "manylatents-omics[dogma]"

For foundation model encoders on CUDA, install the wheelnext build of uv to get prebuilt GPU wheels, then sync the dogma extra:

curl -LsSf https://astral.sh/uv/install.sh | INSTALLER_DOWNLOAD_URL=https://wheelnext.astral.sh sh
uv sync --extra dogma --index-strategy unsafe-best-match
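
After installing an extra, it can help to confirm that its dependencies are importable from the active environment. The sketch below is a minimal check that uses only the standard library. The import names anndata and scanpy follow from the [singlecell] description above; torch for [dogma] is an assumption, and missing_modules is a hypothetical helper, not part of manylatents-omics.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Modules to probe per extra. These lists are assumptions for illustration;
# consult the package metadata for the real dependency set.
EXPECTED = {
    "singlecell": ["anndata", "scanpy"],
    "dogma": ["torch"],  # assumed: the GPU encoders are likely PyTorch-based
}

if __name__ == "__main__":
    for extra, modules in EXPECTED.items():
        missing = missing_modules(modules)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"[{extra}] {status}")
```

find_spec only locates a module without importing it, so the check stays fast even when heavy GPU libraries are installed.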

Quick Start

Omics configs are auto-discovered when the package is installed:

# Single-cell: UMAP on PBMC 3k
python -m manylatents.main data=pbmc_3k algorithms/latent=umap

# Population genetics: HGDP dataset
python -m manylatents.main data=hgdp algorithms/latent=phate

# Foundation model encoding: ClinVar DNA
python -m manylatents.main experiment=clinvar/encode_dna

Note

No separate registration step is needed. Once manylatents-omics is installed, run python -m manylatents.main as usual and the omics data configs (data=pbmc_3k, data=hgdp, etc.) are available automatically.
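
The Quick Start commands share one shape: the manylatents.main entry point followed by key=value overrides. For scripted runs, that shape can be assembled programmatically. The following is a minimal sketch; build_command is a hypothetical helper, not part of the package, and it reuses only the override names shown in this section.

```python
import shlex
import sys


def build_command(overrides):
    """Build an argv list for `python -m manylatents.main` from a dict of overrides."""
    argv = [sys.executable, "-m", "manylatents.main"]
    argv += [f"{key}={value}" for key, value in overrides.items()]
    return argv


# The same three runs as in the Quick Start section.
runs = [
    {"data": "pbmc_3k", "algorithms/latent": "umap"},
    {"data": "hgdp", "algorithms/latent": "phate"},
    {"experiment": "clinvar/encode_dna"},
]

for run in runs:
    # Print the command rather than executing it; pass the argv list to
    # subprocess.run(..., check=True) to actually launch a run.
    print(shlex.join(build_command(run)))
```

Using sys.executable keeps the runs inside the interpreter that has manylatents-omics installed, which matters when several virtual environments exist.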


Modules

manylatents-omics is organized into three domain modules:

| Module | Domain | Data Format | Extra |
|---|---|---|---|
| PopGen | Population genetics | manifold-genetics CSVs | [popgen] |
| Single-Cell | Single-cell omics | AnnData .h5ad | [singlecell] |
| Dogma | DNA / RNA / Protein | FASTA sequences | [dogma] |

PopGen provides the ManifoldGeneticsDataModule for loading PCA, admixture, and geographic data from the manifold-genetics pipeline, along with domain-specific metrics such as geographic and admixture preservation.

Single-Cell provides AnnDataModule for loading scRNA-seq, scATAC-seq, and CITE-seq datasets stored in the AnnData .h5ad format. It ships with configs for the PBMC 3k, 10k, and 68k datasets and for Embryoid Body.

Dogma provides pretrained foundation model encoders (ESM3, Evo2, Orthrus, AlphaGenome) that transform biological sequences into dense embeddings, plus the ClinVar pipeline for multi-modal geometric analysis.
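
As a rough illustration of the module table, the extra to install can often be read off the input file's extension. This is a heuristic sketch only: extra_for_path and install_command are hypothetical helpers, the FASTA suffix list is an assumption, and a .csv file is relevant to [popgen] only if it comes from the manifold-genetics pipeline.

```python
from pathlib import Path

# Suffix -> extra, following the Data Format column of the module table.
# The FASTA suffixes are common conventions, assumed here for illustration.
SUFFIX_TO_EXTRA = {
    ".csv": "popgen",
    ".h5ad": "singlecell",
    ".fasta": "dogma",
    ".fa": "dogma",
}


def extra_for_path(path):
    """Return the extra matching a data file's suffix, or None if unknown."""
    return SUFFIX_TO_EXTRA.get(Path(path).suffix.lower())


def install_command(path):
    """Return the `uv add` command for the matching extra, or None."""
    extra = extra_for_path(path)
    if extra is None:
        return None
    return f'uv add "manylatents-omics[{extra}]"'
```

For example, install_command("pbmc_3k.h5ad") yields the [singlecell] install line from the Installation section, while an unrecognized suffix returns None rather than guessing.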


Parent Project

manylatents-omics extends the core manylatents library for dimensionality reduction and geometric analysis. Refer to the parent documentation for details on algorithms, metrics, and the experiment framework.