Metrics

The evaluation system for manyLatents provides metrics for measuring embedding quality, dataset properties, and algorithm internals. All metric configs live in a flat configs/metrics/ directory, and each config declares its evaluation target via the at field.

Pipeline Execution Model

Metrics and sampling operate on named pipeline outputs — a dict built as run_experiment() progresses. Understanding when each output becomes available is key to understanding what at and sampling can target.

run_experiment()
│
├─ datamodule.setup()
│   outputs["dataset"] = ds.data                          ← dataset available
│
├─ [sampling.dataset] ── subsample input before fit       ← POSITION 1
│
├─ algorithm.fit(train_tensor)
├─ algorithm.transform(test_tensor) → embeddings
│   outputs["embedding"] = embeddings                     ← embedding available
│   outputs["module"]    = algorithm                      ← module available
│   outputs["affinity"]  = algorithm.extra_outputs()      ← extras available
│   outputs["kernel"]    = ...                              (algorithm-dependent)
│   outputs["adjacency"] = ...
│
├─ evaluate()   [evaluate.py]
│   ├─ [sampling.embedding] ── subsample before metrics   ← POSITION 2
│   ├─ prewarm_cache() ── kNN/eigenvalues per "on" value
│   └─ for each metric:
│       ├─ read `at` field → resolve from outputs dict
│       └─ metric_fn(embeddings=..., dataset=..., module=..., cache=...)
│
└─ callbacks (receive full unsampled data + scores)

Output availability

| Output | Available after | Source | Always present |
| --- | --- | --- | --- |
| dataset | datamodule.setup() | ds.data | Yes |
| embedding | algorithm.transform() | Embedding array | Yes |
| module | algorithm.fit() | Fitted LatentModule | Yes (for LatentModules) |
| affinity | algorithm.fit() | module.extra_outputs() | No — algorithm-dependent |
| kernel | algorithm.fit() | module.extra_outputs() | No — algorithm-dependent |
| adjacency | algorithm.fit() | module.extra_outputs() | No — algorithm-dependent |

New outputs can be added by having a LatentModule return them from extra_outputs(). Metrics can immediately target them via at: "<key>" — no code changes needed in the evaluation pipeline.
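As a sketch of that routing (illustrative names only, not the actual manyLatents API):

```python
# Hedged sketch: how a new key from extra_outputs() becomes a metric target.
# MyModule and evaluate_one are illustrative stand-ins, not real API names.

class MyModule:
    def extra_outputs(self):
        # A LatentModule returning a custom output under a new key.
        return {"geodesics": [[0.0, 1.0], [1.0, 0.0]]}

outputs = {"dataset": [[1.0], [2.0]], "embedding": [[0.1], [0.2]]}
outputs.update(MyModule().extra_outputs())  # merged into the outputs dict

def evaluate_one(metric_fn, at, outputs):
    # `at` is resolved against the outputs dict at evaluation time, so the
    # new key is targetable with no changes to the evaluation pipeline.
    return metric_fn(outputs[at])

score = evaluate_one(lambda arr: float(len(arr)), at="geodesics", outputs=outputs)
```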

Sampling positions

Sampling has two categories with different infrastructure:

  • Pre-fit (sampling.dataset): Fixed integration point in run_experiment() BEFORE fit(). Reduces what the algorithm sees. This is inherently positional — it changes the algorithm's input, not just what metrics evaluate on.
  • Post-fit (any other key): Dynamic loop in evaluate() over the outputs dict. Any array-valued output can be sampled. If sampling.embedding is configured, the dataset is auto-sliced to matching indices for cross-space metrics.

Post-fit sampling uses the same dynamic resolution as metric routing — it iterates the sampling config, matches keys against the outputs dict, and applies the sampler to any matching array. New outputs from extra_outputs() are automatically sampleable.
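A minimal sketch of this loop, under the behavior described above (names are illustrative, not the actual evaluate() code):

```python
# Hedged sketch of post-fit sampling: iterate the sampling config, match keys
# against the outputs dict, apply the sampler, and auto-slice the dataset when
# the embedding is sampled so cross-space metrics stay index-aligned.

def apply_post_fit_sampling(sampling_cfg, outputs, get_indices):
    sampled = dict(outputs)
    for key, sampler in sampling_cfg.items():
        if key == "dataset":
            continue  # pre-fit sampling happens before fit(), not here
        if key in sampled:
            idx = get_indices(sampler, sampled[key])
            sampled[key] = [sampled[key][i] for i in idx]
            if key == "embedding" and "dataset" in sampled:
                # keep cross-space metrics consistent: slice dataset to match
                sampled["dataset"] = [sampled["dataset"][i] for i in idx]
    return sampled

def first_n(sampler, arr):
    # Trivial stand-in sampler: take the first n indices.
    return list(range(sampler["n"]))

outputs = {"embedding": [[0], [1], [2], [3]], "dataset": [[10], [11], [12], [13]]}
sub = apply_post_fit_sampling({"embedding": {"n": 2}}, outputs, first_n)
```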

The get_indices() method on samplers accepts **kwargs for future extensibility — complex samplers (e.g., diffusion condensation) may need access to the kNN cache, outputs dict, or fitted module to build their sampling operator.
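For example, a sampler conforming to this interface might look like the following sketch (RandomSampler is a hypothetical name, not a shipped class):

```python
# Hedged sketch of a sampler with the get_indices(**kwargs) hook.
import random

class RandomSampler:
    def __init__(self, n: int, seed: int = 0):
        self.n, self.seed = n, seed

    def get_indices(self, data, **kwargs):
        # **kwargs leaves room for future context (kNN cache, outputs dict,
        # fitted module) without changing the call signature.
        rng = random.Random(self.seed)
        return sorted(rng.sample(range(len(data)), min(self.n, len(data))))

idx = RandomSampler(n=3, seed=0).get_indices(list(range(10)), cache=None)
```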

Metric Selection

Select metrics on the CLI with metrics=<name>:

# Single metric
manylatents algorithms/latent=pca data=swissroll metrics=trustworthiness

# Bundle (composes multiple metrics)
manylatents algorithms/latent=pca data=swissroll metrics=standard

Embedding Metrics

Embedding metrics evaluate the quality of low-dimensional embeddings by comparing the high-dimensional input to the low-dimensional output. Config: at: embedding.

| metric | config | defaults | description |
| --- | --- | --- | --- |
| AlignmentScore | metrics=alignment_score | k=20, method=jaccard | Compute composite per-variant alignment score. |
| Anisotropy | metrics=anisotropy | -- | Anisotropy of embedding space |
| AUC | metrics=auc | -- | Area Under ROC Curve for binary classification |
| CKA | metrics=cka | kernel=linear | Centered Kernel Alignment with linear kernel |
| Continuity | metrics=continuity | return_per_sample=True | Continuity of embedding (preservation of original neighborhoods) |
| CrossModalJaccard | metrics=cross_modal_jaccard | k=20, metric=euclidean | Cross-modal k-NN neighborhood Jaccard overlap |
| DiffusionCondensation | metrics=diffusion_condensation | scale=1.025, granularity=0.1, knn=5, decay=40, n_pca=50, output_mode=stable | Diffusion condensation score |
| DiffusionCurvature | metrics=diffusion_curvature | t=3, percentile=5 | Diffusion curvature of embedding manifold |
| DiffusionSpectralEntropy | metrics=diffusion_spectral_entropy | t=1, gaussian_kernel_sigma=10 | Diffusion spectral entropy (eigenvalue count at diffusion time t) |
| DiffusionSpectralEntropy | metrics=dse_dense | output_mode=eigenvalue_count, t_high=[10, 50, 100, 200, 500], numerical_floor=1e-06, kernel=dense, gaussian_kernel_sigma=10 | Diffusion spectral entropy (eigenvalue count at diffusion time t) |
| DiffusionSpectralEntropy | metrics=dse_knn | output_mode=eigenvalue_count, t_high=[10, 50, 100, 200, 500], numerical_floor=1e-06, kernel=knn, k=${neighborhood_size}, alpha=1.0 | Diffusion spectral entropy (eigenvalue count at diffusion time t) |
| DiffusionSpectralEntropy | metrics=dse_t_sweep | output_mode=eigenvalue_count_sweep, t_high=[10, 50, 100, 200, 500], numerical_floor=1e-06, gaussian_kernel_sigma=10 | Diffusion spectral entropy (eigenvalue count at diffusion time t) |
| LocalSpectralAnalysis | metrics=eigenvalue_effective_rank | n_neighbors=20, output_mode=full | Local spectral analysis of kNN neighborhoods. |
| FractalDimension | metrics=fractal_dimension | n_box_sizes=10 | Correlation fractal dimension of embedding |
| KNNPreservation | metrics=knn_preservation | n_neighbors=10, metric=euclidean | k-nearest neighbor preservation between original and embedded spaces |
| LocalIntrinsicDimensionality | metrics=local_intrinsic_dimensionality | k=20 | Mean local intrinsic dimensionality of the embedding |
| LogLogConsistency | metrics=loglog_consistency | k=200, k_min=5, k_steps=20 | Per-point log-log power law consistency diagnostic. |
| MagnitudeDimension | metrics=magnitude_dimension | n_ts=50, log_scale=False, scale_finding=convergence, target_prop=0.95, metric=euclidean, p=2, n_neighbors=12, method=cholesky, one_point_property=True, perturb_singularities=True, positive_magnitude=False, exact=False | Magnitude-based effective dimensionality |
| MismatchRatio | metrics=mismatch_ratio | k=200, k_min=5, k_steps=20, r2_threshold=0.95 | Per-point mismatch ratio v = k_eff / k_star |
| NoOp | metrics=noop | -- | No-op metric for verifying pipeline integration |
| OutlierScore | metrics=outlier_score | k=20, return_scores=False | Outlier scores using Local Outlier Factor |
| LocalSpectralAnalysis | metrics=participation_ratio | n_neighbors=25, return_per_sample=True | Local spectral analysis of kNN neighborhoods. |
| PearsonCorrelation | metrics=pearson_correlation | return_per_sample=False, num_dists=100 | Pearson correlation between pairwise distances |
| PersistentHomology | metrics=persistent_homology | homology_dim=1, persistence_threshold=0.1, max_N=2000, random_seed=0 | Count of loops/cycles (H1 Betti number) |
| PersistentHomology | metrics=persistent_homology_beta0 | homology_dim=0, persistence_threshold=3.0 | Count of connected components (H0 Betti number) |
| RankAgreement | metrics=rank_agreement | k=20, metric_fn=lid | Rank-based agreement of LID/PR across modalities |
| ShepardResidual | metrics=shepard_residual | k=15 | Per-point Shepard residual on local-rank neighborhoods |
| SilhouetteScore | metrics=silhouette | metric=euclidean | Silhouette score for cluster separation in embedding |
| TangentSpaceApproximation | metrics=tangent_space | n_neighbors=25, variance_threshold=0.95, return_per_sample=True | Tangent space alignment between original and embedded spaces |
| Trustworthiness | metrics=trustworthiness | n_neighbors=5, metric=euclidean | Compute the trustworthiness of an embedding. |
| Trustworthiness | metrics=trustworthiness_k | n_neighbors=[15, 25, 50, 100, 250], metric=euclidean | Compute the trustworthiness of an embedding. |

Module Metrics

Module metrics evaluate algorithm-specific internal components. They require a fitted module exposing affinity() or kernel(). Config: at: module.

| metric | config | defaults | description |
| --- | --- | --- | --- |
| AffinitySpectrum | metrics=affinity_spectrum | -- | Top-k eigenvalues of the affinity matrix |
| ConnectedComponents | metrics=connected_components | -- | Number of connected components in the kNN graph |
| DatasetTopologyDescriptor | metrics=dataset_topology_descriptor | -- | Topological descriptor of the dataset structure |
| DiffusionMapCorrelation | metrics=diffusion_map_correlation | dm_components=2, alpha=1.0, correlation_type=pearson | Correlation between diffusion map and embedding distances |
| EffectiveNeighborhoodSize | metrics=effective_neighborhood_size | -- | Per-point effective neighborhood size from method's internal graph weights |
| KernelMatrixDensity | metrics=kernel_matrix_density | threshold=1e-10 | Density of the kernel/affinity matrix |
| KernelMatrixSparsity | metrics=kernel_matrix_sparsity | threshold=1e-10 | Sparsity of the kernel/affinity matrix |
| SpectralDecayRate | metrics=spectral_decay_rate | top_k=20 | Fit exponential decay to the eigenvalue spectrum. |
| SpectralGapRatio | metrics=spectral_gap_ratio | -- | Ratio of first to second eigenvalue of the diffusion operator |

Dataset Metrics

Evaluate properties of the original high-dimensional data, independent of the DR algorithm. Config: at: dataset.

| metric | config | defaults | description |
| --- | --- | --- | --- |
| GroundTruthPreservation | metrics=admixture_laplacian | -- | Admixture Laplacian preservation score |
| GeodesicDistanceCorrelation | metrics=geodesic_distance_correlation | correlation_type=spearman | Correlation between geodesic and embedded distances |
| kmeans_stratification | metrics=stratification | random_state=${seed} | K-means stratification score for population structure |

Metric Protocol

All metrics must match the Metric protocol (manylatents/metrics/metric.py):

def __call__(
    self,
    embeddings: np.ndarray,
    dataset=None,
    module=None,
    cache=None,
) -> float | tuple[float, np.ndarray] | dict[str, Any]: ...

Return Types

| Type | Use Case | Example |
| --- | --- | --- |
| float | Simple scalar | Trustworthiness: 0.95 |
| tuple[float, ndarray] | Scalar + per-sample | Continuity with return_per_sample=True |
| dict[str, Any] | Structured output | Persistent homology: {'beta_0': ..., 'beta_1': ...} |
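As a quick illustration of the three shapes, here is a toy sketch (these metrics are illustrative stand-ins, not part of manyLatents):

```python
# Hedged sketch of the three allowed return types for a Metric.

def scalar_metric(embeddings):
    return float(len(embeddings))  # float: a single score

def per_sample_metric(embeddings):
    # tuple: (scalar summary, per-sample array-like)
    per_sample = [sum(abs(v) for v in row) for row in embeddings]
    return sum(per_sample) / len(per_sample), per_sample

def structured_metric(embeddings):
    return {"beta_0": 1, "beta_1": 0}  # dict: structured output

emb = [[1.0, -2.0], [3.0, 0.0]]
mean_score, per_sample = per_sample_metric(emb)
```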

Configuration

Metrics use Hydra's _partial_: True for deferred parameter binding:

# configs/metrics/trustworthiness.yaml
trustworthiness:
  _target_: manylatents.metrics.trustworthiness.Trustworthiness
  _partial_: true
  n_neighbors: 5
  at: embedding
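For intuition: with _partial_: true, Hydra's instantiate() returns a functools.partial with the config-time parameters bound, and evaluate() supplies the runtime arrays later. A stand-in sketch (the trustworthiness body here is a dummy, not the real metric):

```python
# Hedged sketch of deferred parameter binding via functools.partial.
from functools import partial

def trustworthiness(embeddings, dataset=None, n_neighbors=5, cache=None):
    return float(n_neighbors)  # dummy body standing in for the real metric

# Roughly what Hydra's _partial_: true produces from such a config:
metric_fn = partial(trustworthiness, n_neighbors=5)

# evaluate() calls it later with runtime data:
score = metric_fn(embeddings=[[0.0], [1.0]], dataset=None)
```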

Multi-Scale Expansion

List-valued parameters expand via Cartesian product through flatten_and_unroll_metrics():

n_neighbors: [5, 10, 20]  # Produces 3 separate evaluations

Naming convention: trustworthiness__n_neighbors_5, trustworthiness__n_neighbors_10, etc.
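Under that naming convention, the expansion can be sketched roughly like this (flatten_and_unroll_metrics itself may differ in details):

```python
# Hedged sketch of Cartesian expansion of list-valued metric parameters.
from itertools import product

def unroll(name, params, list_keys):
    # One grid axis per list-valued key; each combination becomes one metric.
    grids = [[(k, v) for v in params[k]] for k in list_keys]
    for combo in product(*grids):
        cfg = {**params, **dict(combo)}  # bind one value per list-valued key
        suffix = "__".join(f"{k}_{v}" for k, v in combo)
        yield f"{name}__{suffix}", cfg

names = [n for n, _ in unroll(
    "trustworthiness",
    {"n_neighbors": [5, 10, 20], "metric": "euclidean"},
    ["n_neighbors"],
)]
```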

Shared kNN Cache

Metrics that need kNN graphs share a cache computed once with max(k) across all metrics, avoiding redundant computation.
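The idea can be sketched as follows; this is a pure-Python stand-in, not the actual prewarm_cache() implementation:

```python
# Hedged sketch of a shared kNN cache: compute neighbors once at max(k)
# across all metrics, then each metric slices the prefix it needs.

def prewarm_knn(points, ks):
    k_max = max(ks)
    cache = {}
    for i, p in enumerate(points):
        # Brute-force: indices sorted by squared Euclidean distance to p.
        order = sorted(range(len(points)),
                       key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])))
        cache[i] = order[1:k_max + 1]  # drop self, keep k_max neighbors
    return cache

def neighbors(cache, i, k):
    return cache[i][:k]  # a metric with smaller k slices the shared cache

pts = [[0.0], [1.0], [2.0], [10.0]]
cache = prewarm_knn(pts, ks=[1, 2, 3])
```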

Writing a New Metric

import numpy as np

def YourMetric(
    embeddings: np.ndarray,
    dataset=None,
    module=None,
    k: int = 10,
    cache=None,
) -> float:
    score = 0.0  # your computation here
    return score

Choosing the Right Context

Set the at field in your config to target a pipeline output (see Pipeline Execution Model above):

  • Only needs original data? → at: dataset
  • Compares original vs. reduced? → at: embedding
  • Needs algorithm internals (affinity, spectral properties)? → at: module
  • Needs a specific matrix? → at: affinity / at: kernel / at: adjacency (algorithm must produce it)

Config

# configs/metrics/your_metric.yaml
your_metric:
  _target_: manylatents.metrics.your_metric.YourMetric
  _partial_: true
  k: 10
  at: embedding

Testing

Use metrics=noop to verify integration:

uv run python -m manylatents.main data=swissroll algorithms/latent=pca metrics=noop

Null Metrics Support

manyLatents supports running experiments without metrics computation — useful for fast debugging, exploratory analysis, or workflows where metrics are computed separately.

Usage

CLI (Default)

Metrics are null by default. Just don't specify them:

# No metrics (default)
uv run python -m manylatents.main data=swissroll algorithms/latent=pca

# With metrics (explicit opt-in)
uv run python -m manylatents.main data=swissroll algorithms/latent=pca metrics=noop

Experiment Configs

# configs/experiment/my_experiment.yaml
# @package _global_
defaults:
  - override /algorithms/latent: pca
  - override /data: swissroll
  - override /callbacks/embedding: default
  # No metrics override - stays null

Python API

from manylatents.api import run

result = run(
    data="swissroll",
    algorithms={'latent': 'pca'},
    metrics=None  # Explicitly disable
)

Expected Behavior

When metrics=null:

  • Generates embeddings
  • Saves embeddings to files
  • Creates plots (if callbacks configured)
  • Logs to wandb (if configured)
  • Does NOT compute evaluation metrics
  • Shows warning: "No scores found"

Design: Opt-In by Default

The base config (configs/config.yaml) sets metrics to null. Experiment configs opt in:

# configs/experiment/single_algorithm.yaml
defaults:
  - override /metrics: noop  # Opt in for this experiment

Hydra Limitation

Hydra CLI does not support null as an override value. You cannot do metrics=null on the command line — Hydra's parser converts "null" to Python None, which its override validator rejects.

Workarounds:

  • Use experiment configs without metrics specified
  • Use the Python API with metrics=None (our code handles this)
  • Set metrics: null in a config file (the base config already does this)

The API intercepts None values before Hydra sees them and sets them after config composition via OmegaConf.update().

Troubleshooting

"Could not find 'metrics/none'"

You're trying metrics=none as a CLI override. Hydra interprets this as looking for metrics/none.yaml.

Fix: Use an experiment config, or the API with metrics=None.

Metrics Still Being Computed

Check that:

  1. Your experiment config doesn't have - override /metrics: ... in defaults
  2. You're not passing metrics=... on the command line
  3. The final config shows metrics: null