Metrics
The evaluation system for manyLatents: metrics for measuring embedding quality, dataset properties, and algorithm internals. All metric configs live in a flat `configs/metrics/` directory. Each config declares its evaluation target via the `at` field.
Pipeline Execution Model
Metrics and sampling operate on named pipeline outputs — a dict built up as `run_experiment()` progresses. Understanding when each output becomes available is key to understanding what `at` and sampling can target.
```
run_experiment()
│
├─ datamodule.setup()
│    outputs["dataset"] = ds.data                         ← dataset available
│
├─ [sampling.dataset] ── subsample input before fit       ← POSITION 1
│
├─ algorithm.fit(train_tensor)
├─ algorithm.transform(test_tensor) → embeddings
│    outputs["embedding"] = embeddings                    ← embedding available
│    outputs["module"]    = algorithm                     ← module available
│    outputs["affinity"]  = algorithm.extra_outputs()     ← extras available
│    outputs["kernel"]    = ...  (algorithm-dependent)
│    outputs["adjacency"] = ...
│
├─ evaluate()  [evaluate.py]
│    ├─ [sampling.embedding] ── subsample before metrics  ← POSITION 2
│    ├─ prewarm_cache() ── kNN/eigenvalues per "on" value
│    └─ for each metric:
│         ├─ read `at` field → resolve from outputs dict
│         └─ metric_fn(embeddings=..., dataset=..., module=..., cache=...)
│
└─ callbacks  (receive full unsampled data + scores)
```
Output availability
| Output | Available after | Source | Always present |
|---|---|---|---|
| `dataset` | `datamodule.setup()` | `ds.data` | Yes |
| `embedding` | `algorithm.transform()` | Embedding array | Yes |
| `module` | `algorithm.fit()` | Fitted `LatentModule` | Yes (for LatentModules) |
| `affinity` | `algorithm.fit()` | `module.extra_outputs()` | No — algorithm-dependent |
| `kernel` | `algorithm.fit()` | `module.extra_outputs()` | No — algorithm-dependent |
| `adjacency` | `algorithm.fit()` | `module.extra_outputs()` | No — algorithm-dependent |
New outputs can be added by having a `LatentModule` return them from `extra_outputs()`. Metrics can immediately target them via `at: "<key>"` — no code changes needed in the evaluation pipeline.
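The routing described above can be sketched in a few lines. This is an illustrative stand-in, not the library's actual code: `resolve_target` and the custom `"geodesic"` key are hypothetical names used only to show how an `at` value is looked up in the outputs dict.

```python
import numpy as np

# Simulated outputs dict as built by run_experiment(); "geodesic" stands in
# for a custom key an algorithm might return from extra_outputs().
outputs = {
    "embedding": np.zeros((100, 2)),
    "geodesic": np.zeros((100, 100)),
}

def resolve_target(metric_cfg: dict, outputs: dict) -> np.ndarray:
    """Look up the array a metric evaluates, based on its `at` field."""
    key = metric_cfg.get("at", "embedding")
    if key not in outputs:
        raise KeyError(f"metric targets '{key}', but no output with that name exists")
    return outputs[key]

arr = resolve_target({"at": "geodesic"}, outputs)  # the custom key, no code changes
```

Because resolution is just a dict lookup, any new key a module emits is immediately targetable.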
Sampling positions
Sampling has two categories with different infrastructure:
- Pre-fit (`sampling.dataset`): fixed integration point in `run_experiment()`, BEFORE `fit()`. Reduces what the algorithm sees. This is inherently positional — it changes the algorithm's input, not just what metrics evaluate on.
- Post-fit (any other key): dynamic loop in `evaluate()` over the `outputs` dict. Any array-valued output can be sampled. If `sampling.embedding` is configured, the dataset is auto-sliced to matching indices for cross-space metrics.
Post-fit sampling uses the same dynamic resolution as metric routing — it iterates the sampling config, matches keys against the outputs dict, and applies the sampler to any matching array. New outputs from `extra_outputs()` are automatically sampleable.

The `get_indices()` method on samplers accepts `**kwargs` for future extensibility — complex samplers (e.g., diffusion condensation) may need access to the kNN cache, the outputs dict, or the fitted module to build their sampling operator.
Metric Selection
Select metrics on the CLI with `metrics=<name>`:

```shell
# Single metric
manylatents algorithms/latent=pca data=swissroll metrics=trustworthiness

# Bundle (composes multiple metrics)
manylatents algorithms/latent=pca data=swissroll metrics=standard
```
Embedding Metrics
Evaluate the quality of low-dimensional embeddings by comparing the high-dimensional input to the low-dimensional output. Config: `at: embedding`.
| metric | config | defaults | description |
|---|---|---|---|
| AlignmentScore | `metrics=alignment_score` | k=20, method=jaccard | Compute composite per-variant alignment score. |
| Anisotropy | `metrics=anisotropy` | -- | Anisotropy of embedding space |
| AUC | `metrics=auc` | -- | Area Under ROC Curve for binary classification |
| CKA | `metrics=cka` | kernel=linear | Centered Kernel Alignment with linear kernel |
| Continuity | `metrics=continuity` | return_per_sample=True | Continuity of embedding (preservation of original neighborhoods) |
| CrossModalJaccard | `metrics=cross_modal_jaccard` | k=20, metric=euclidean | Cross-modal k-NN neighborhood Jaccard overlap |
| DiffusionCondensation | `metrics=diffusion_condensation` | scale=1.025, granularity=0.1, knn=5, decay=40, n_pca=50, output_mode=stable | Diffusion condensation score |
| DiffusionCurvature | `metrics=diffusion_curvature` | t=3, percentile=5 | Diffusion curvature of embedding manifold |
| DiffusionSpectralEntropy | `metrics=diffusion_spectral_entropy` | t=1, gaussian_kernel_sigma=10 | Diffusion spectral entropy (eigenvalue count at diffusion time t) |
| DiffusionSpectralEntropy | `metrics=dse_dense` | output_mode=eigenvalue_count, t_high=[10, 50, 100, 200, 500], numerical_floor=1e-06, kernel=dense, gaussian_kernel_sigma=10 | Diffusion spectral entropy (eigenvalue count at diffusion time t) |
| DiffusionSpectralEntropy | `metrics=dse_knn` | output_mode=eigenvalue_count, t_high=[10, 50, 100, 200, 500], numerical_floor=1e-06, kernel=knn, k=${neighborhood_size}, alpha=1.0 | Diffusion spectral entropy (eigenvalue count at diffusion time t) |
| DiffusionSpectralEntropy | `metrics=dse_t_sweep` | output_mode=eigenvalue_count_sweep, t_high=[10, 50, 100, 200, 500], numerical_floor=1e-06, gaussian_kernel_sigma=10 | Diffusion spectral entropy (eigenvalue count at diffusion time t) |
| LocalSpectralAnalysis | `metrics=eigenvalue_effective_rank` | n_neighbors=20, output_mode=full | Local spectral analysis of kNN neighborhoods. |
| FractalDimension | `metrics=fractal_dimension` | n_box_sizes=10 | Correlation fractal dimension of embedding |
| KNNPreservation | `metrics=knn_preservation` | n_neighbors=10, metric=euclidean | k-nearest neighbor preservation between original and embedded spaces |
| LocalIntrinsicDimensionality | `metrics=local_intrinsic_dimensionality` | k=20 | Mean local intrinsic dimensionality of the embedding |
| LogLogConsistency | `metrics=loglog_consistency` | k=200, k_min=5, k_steps=20 | Per-point log-log power law consistency diagnostic. |
| MagnitudeDimension | `metrics=magnitude_dimension` | n_ts=50, log_scale=False, scale_finding=convergence, target_prop=0.95, metric=euclidean, p=2, n_neighbors=12, method=cholesky, one_point_property=True, perturb_singularities=True, positive_magnitude=False, exact=False | Magnitude-based effective dimensionality |
| MismatchRatio | `metrics=mismatch_ratio` | k=200, k_min=5, k_steps=20, r2_threshold=0.95 | Per-point mismatch ratio v = k_eff / k_star |
| NoOp | `metrics=noop` | -- | No-op metric for verifying pipeline integration |
| OutlierScore | `metrics=outlier_score` | k=20, return_scores=False | Outlier scores using Local Outlier Factor |
| LocalSpectralAnalysis | `metrics=participation_ratio` | n_neighbors=25, return_per_sample=True | Local spectral analysis of kNN neighborhoods. |
| PearsonCorrelation | `metrics=pearson_correlation` | return_per_sample=False, num_dists=100 | Pearson correlation between pairwise distances |
| PersistentHomology | `metrics=persistent_homology` | homology_dim=1, persistence_threshold=0.1, max_N=2000, random_seed=0 | Count of loops/cycles (H1 Betti number) |
| PersistentHomology | `metrics=persistent_homology_beta0` | homology_dim=0, persistence_threshold=3.0 | Count of connected components (H0 Betti number) |
| RankAgreement | `metrics=rank_agreement` | k=20, metric_fn=lid | Rank-based agreement of LID/PR across modalities |
| ShepardResidual | `metrics=shepard_residual` | k=15 | Per-point Shepard residual on local-rank neighborhoods |
| SilhouetteScore | `metrics=silhouette` | metric=euclidean | Silhouette score for cluster separation in embedding |
| TangentSpaceApproximation | `metrics=tangent_space` | n_neighbors=25, variance_threshold=0.95, return_per_sample=True | Tangent space alignment between original and embedded spaces |
| Trustworthiness | `metrics=trustworthiness` | n_neighbors=5, metric=euclidean | Compute the trustworthiness of an embedding. |
| Trustworthiness | `metrics=trustworthiness_k` | n_neighbors=[15, 25, 50, 100, 250], metric=euclidean | Compute the trustworthiness of an embedding. |
Module Metrics
Evaluate algorithm-specific internal components. These metrics require a fitted module exposing `affinity()` or `kernel()`. Config: `at: module`.
| metric | config | defaults | description |
|---|---|---|---|
| AffinitySpectrum | `metrics=affinity_spectrum` | -- | Top-k eigenvalues of the affinity matrix |
| ConnectedComponents | `metrics=connected_components` | -- | Number of connected components in the kNN graph |
| DatasetTopologyDescriptor | `metrics=dataset_topology_descriptor` | -- | Topological descriptor of the dataset structure |
| DiffusionMapCorrelation | `metrics=diffusion_map_correlation` | dm_components=2, alpha=1.0, correlation_type=pearson | Correlation between diffusion map and embedding distances |
| EffectiveNeighborhoodSize | `metrics=effective_neighborhood_size` | -- | Per-point effective neighborhood size from method's internal graph weights |
| KernelMatrixDensity | `metrics=kernel_matrix_density` | threshold=1e-10 | Density of the kernel/affinity matrix |
| KernelMatrixSparsity | `metrics=kernel_matrix_sparsity` | threshold=1e-10 | Sparsity of the kernel/affinity matrix |
| SpectralDecayRate | `metrics=spectral_decay_rate` | top_k=20 | Fit exponential decay to the eigenvalue spectrum. |
| SpectralGapRatio | `metrics=spectral_gap_ratio` | -- | Ratio of first to second eigenvalue of the diffusion operator |
Dataset Metrics
Evaluate properties of the original high-dimensional data, independent of the DR algorithm. Config: `at: dataset`.

| metric | config | defaults | description |
|---|---|---|---|
| GroundTruthPreservation | `metrics=admixture_laplacian` | -- | Admixture Laplacian preservation score |
| GeodesicDistanceCorrelation | `metrics=geodesic_distance_correlation` | correlation_type=spearman | Correlation between geodesic and embedded distances |
| kmeans_stratification | `metrics=stratification` | random_state=${seed} | K-means stratification score for population structure |
Metric Protocol
All metrics must match the `Metric` protocol (`manylatents/metrics/metric.py`):

```python
def __call__(
    self,
    embeddings: np.ndarray,
    dataset=None,
    module=None,
    cache=None,
) -> float | tuple[float, np.ndarray] | dict[str, Any]: ...
```
Return Types
| Type | Use case | Example |
|---|---|---|
| `float` | Simple scalar | Trustworthiness: `0.95` |
| `tuple[float, ndarray]` | Scalar + per-sample | Continuity with `return_per_sample=True` |
| `dict[str, Any]` | Structured output | Persistent homology: `{'beta_0': ..., 'beta_1': ...}` |
Configuration
Metrics use Hydra's `_partial_: true` for deferred parameter binding:

```yaml
# configs/metrics/trustworthiness.yaml
trustworthiness:
  _target_: manylatents.metrics.trustworthiness.Trustworthiness
  _partial_: true
  n_neighbors: 5
  at: embedding
```
Multi-Scale Expansion
List-valued parameters expand via Cartesian product through `flatten_and_unroll_metrics()`:

```yaml
n_neighbors: [5, 10, 20]  # Produces 3 separate evaluations
```

Naming convention: `trustworthiness__n_neighbors_5`, `trustworthiness__n_neighbors_10`, etc.
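The expansion can be mimicked in a few lines. This is a sketch mirroring the naming convention only; the real implementation is `flatten_and_unroll_metrics()` in the library, and `unroll` here is an invented helper:

```python
from itertools import product

def unroll(name: str, params: dict) -> dict:
    """Expand list-valued params into one named variant per combination."""
    list_keys = [k for k, v in params.items() if isinstance(v, list)]
    if not list_keys:
        return {name: dict(params)}
    variants = {}
    for combo in product(*(params[k] for k in list_keys)):
        suffix = "__".join(f"{k}_{v}" for k, v in zip(list_keys, combo))
        variants[f"{name}__{suffix}"] = {**params, **dict(zip(list_keys, combo))}
    return variants

expanded = unroll("trustworthiness", {"n_neighbors": [5, 10, 20], "metric": "euclidean"})
```

Each variant keeps the non-list parameters unchanged, so only the swept parameter differs between evaluations.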
Shared kNN Cache
Metrics that need kNN graphs share a cache computed once with the maximum `k` across all metrics, avoiding redundant computation.
Writing a New Metric
```python
import numpy as np


def YourMetric(
    embeddings: np.ndarray,
    dataset=None,
    module=None,
    k: int = 10,
    cache=None,
) -> float:
    score = 0.0  # your computation here
    return score
```
Choosing the Right Context
Set the `at` field in your config to target a pipeline output (see Pipeline Execution Model above):

- Only needs original data? → `at: dataset`
- Compares original vs. reduced? → `at: embedding`
- Needs algorithm internals (affinity, spectral properties)? → `at: module`
- Needs a specific matrix? → `at: affinity` / `at: kernel` / `at: adjacency` (the algorithm must produce it)
Config
```yaml
# configs/metrics/your_metric.yaml
your_metric:
  _target_: manylatents.metrics.your_metric.YourMetric
  _partial_: true
  k: 10
  at: embedding
```
Testing
Use `metrics=noop` to verify integration:

```shell
uv run python -m manylatents.main data=swissroll algorithms/latent=pca metrics=noop
```
Null Metrics Support
manyLatents supports running experiments without metrics computation — useful for fast debugging, exploratory analysis, or workflows where metrics are computed separately.
Usage
CLI (Default)
Metrics are null by default. Just don't specify them:
```shell
# No metrics (default)
uv run python -m manylatents.main data=swissroll algorithms/latent=pca

# With metrics (explicit opt-in)
uv run python -m manylatents.main data=swissroll algorithms/latent=pca metrics=noop
```
Experiment Configs
```yaml
# configs/experiment/my_experiment.yaml
# @package _global_
defaults:
  - override /algorithms/latent: pca
  - override /data: swissroll
  - override /callbacks/embedding: default
# No metrics override - stays null
```
Python API
```python
from manylatents.api import run

result = run(
    data="swissroll",
    algorithms={"latent": "pca"},
    metrics=None,  # Explicitly disable
)
```
Expected Behavior
When `metrics=null`:
- Generates embeddings
- Saves embeddings to files
- Creates plots (if callbacks configured)
- Logs to wandb (if configured)
- Does NOT compute evaluation metrics
- Shows warning: "No scores found"
Design: Opt-In by Default
The base config (`configs/config.yaml`) sets `metrics` to null. Experiment configs opt in:

```yaml
# configs/experiment/single_algorithm.yaml
defaults:
  - override /metrics: noop  # Opt in for this experiment
```
Hydra Limitation
Hydra CLI does not support `null` as an override value. You cannot do `metrics=null` on the command line — Hydra's parser converts `"null"` to Python `None`, which its override validator rejects.
Workarounds:

- Use experiment configs without metrics specified
- Use the Python API with `metrics=None` (our code handles this)
- Set `metrics: null` in config files (e.g., the base config already does this)
The API intercepts `None` values before Hydra sees them and sets them after config composition via `OmegaConf.update()`.
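The interception can be sketched in plain Python (no Hydra), with the helper name invented for illustration; the real code applies the nulled keys via `OmegaConf.update()` after composition, as noted above:

```python
def split_none_overrides(overrides: dict):
    """Separate overrides Hydra can compose from keys to null afterwards."""
    passed = {k: v for k, v in overrides.items() if v is not None}
    nulled = [k for k, v in overrides.items() if v is None]
    return passed, nulled

passed, nulled = split_none_overrides({"data": "swissroll", "metrics": None})
# `passed` goes to Hydra; each key in `nulled` is set to null post-composition
```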
Troubleshooting
"Could not find 'metrics/none'"
You're trying metrics=none as a CLI override. Hydra interprets this as looking for metrics/none.yaml.
Fix: Use an experiment config, or the API with metrics=None.
Metrics Still Being Computed
Check that:
- Your experiment config doesn't have `- override /metrics: ...` in its defaults
- You're not passing `metrics=...` on the command line
- The final config shows `metrics: null`