Metrics
The evaluation system for manyLatents: metrics for measuring embedding quality, dataset properties, and algorithm internals. All metric configs live in a flat `configs/metrics/` directory. Each config declares its evaluation target via the `at` field.
Pipeline Execution Model
Metrics and sampling operate on named pipeline outputs — a dict built up as `run_experiment()` progresses. Understanding when each output becomes available is key to understanding what `at` and sampling can target.
```
run_experiment()
│
├─ datamodule.setup()
│    outputs["dataset"] = ds.data                         ← dataset available
│
├─ [sampling.dataset] ── subsample input before fit       ← POSITION 1
│
├─ algorithm.fit(train_tensor)
├─ algorithm.transform(test_tensor) → embeddings
│    outputs["embedding"] = embeddings                    ← embedding available
│    outputs["module"]    = algorithm                     ← module available
│    outputs["affinity"]  = algorithm.extra_outputs()     ← extras available
│    outputs["kernel"]    = ...  (algorithm-dependent)
│    outputs["adjacency"] = ...
│
├─ evaluate()  [evaluate.py]
│    ├─ [sampling.embedding] ── subsample before metrics  ← POSITION 2
│    ├─ prewarm_cache() ── kNN/eigenvalues per "on" value
│    └─ for each metric:
│         ├─ read `at` field → resolve from outputs dict
│         └─ metric_fn(embeddings=..., dataset=..., module=..., cache=...)
│
└─ callbacks  (receive full unsampled data + scores)
```
Output availability
| Output | Available after | Source | Always present |
|---|---|---|---|
| `dataset` | `datamodule.setup()` | `ds.data` | Yes |
| `embedding` | `algorithm.transform()` | Embedding array | Yes |
| `module` | `algorithm.fit()` | Fitted `LatentModule` | Yes (for LatentModules) |
| `affinity` | `algorithm.fit()` | `module.extra_outputs()` | No — algorithm-dependent |
| `kernel` | `algorithm.fit()` | `module.extra_outputs()` | No — algorithm-dependent |
| `adjacency` | `algorithm.fit()` | `module.extra_outputs()` | No — algorithm-dependent |
New outputs can be added by having a `LatentModule` return them from `extra_outputs()`. Metrics can immediately target them via `at: "<key>"` — no code changes needed in the evaluation pipeline.
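The routing described above can be sketched in a few lines. This is an illustrative stand-in, not the library's actual code: `resolve_target` and the custom `"geodesic"` key are hypothetical names used only to show how an `at` value is looked up in the outputs dict.

```python
import numpy as np

# Simulated outputs dict as built by run_experiment(); "geodesic" stands in
# for a custom key an algorithm might return from extra_outputs().
outputs = {
    "embedding": np.zeros((100, 2)),
    "geodesic": np.zeros((100, 100)),
}

def resolve_target(metric_cfg: dict, outputs: dict) -> np.ndarray:
    """Look up the array a metric evaluates, based on its `at` field."""
    key = metric_cfg.get("at", "embedding")
    if key not in outputs:
        raise KeyError(f"metric targets '{key}', but no output with that name exists")
    return outputs[key]

arr = resolve_target({"at": "geodesic"}, outputs)  # the custom key, no code changes
```

Because resolution is just a dict lookup, any new key a module emits is immediately targetable.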
Sampling positions
Sampling has two categories with different infrastructure:
- Pre-fit (`sampling.dataset`): fixed integration point in `run_experiment()`, BEFORE `fit()`. Reduces what the algorithm sees. This is inherently positional — it changes the algorithm's input, not just what metrics evaluate on.
- Post-fit (any other key): dynamic loop in `evaluate()` over the `outputs` dict. Any array-valued output can be sampled. If `sampling.embedding` is configured, the dataset is auto-sliced to matching indices for cross-space metrics.
Post-fit sampling uses the same dynamic resolution as metric routing — it iterates the sampling config, matches keys against the outputs dict, and applies the sampler to any matching array. New outputs from `extra_outputs()` are automatically sampleable.

The `get_indices()` method on samplers accepts `**kwargs` for future extensibility — complex samplers (e.g., diffusion condensation) may need access to the kNN cache, the outputs dict, or the fitted module to build their sampling operator.
Metric Selection
Select metrics on the CLI with `metrics=<name>`:

```shell
# Single metric
manylatents algorithms/latent=pca data=swissroll metrics=trustworthiness

# Bundle (composes multiple metrics)
manylatents algorithms/latent=pca data=swissroll metrics=standard
```
Embedding Metrics
Evaluate the quality of low-dimensional embeddings by comparing the high-dimensional input to the low-dimensional output. Config: `at: embedding`.
| metric | config | defaults | description |
|---|---|---|---|
| AlignmentScore | `metrics=alignment_score` | k=20, method=jaccard | Compute composite per-variant alignment score. |
| Anisotropy | `metrics=anisotropy` | -- | Anisotropy of embedding space |
| AUC | `metrics=auc` | -- | Area Under ROC Curve for binary classification |
| CKA | `metrics=cka` | kernel=linear | Centered Kernel Alignment with linear kernel |
| Continuity | `metrics=continuity` | return_per_sample=True | Continuity of embedding (preservation of original neighborhoods) |
| CrossModalJaccard | `metrics=cross_modal_jaccard` | k=20, metric=euclidean | Cross-modal k-NN neighborhood Jaccard overlap |
| DiffusionCondensation | `metrics=diffusion_condensation` | scale=1.025, granularity=0.1, knn=5, decay=40, n_pca=50, output_mode=stable | Diffusion condensation score |
| DiffusionCurvature | `metrics=diffusion_curvature` | t=3, percentile=5 | Diffusion curvature of embedding manifold |
| DiffusionSpectralEntropy | `metrics=diffusion_spectral_entropy` | t=1, gaussian_kernel_sigma=10 | Diffusion spectral entropy (eigenvalue count at diffusion time t) |
| DiffusionSpectralEntropy | `metrics=dse_dense` | output_mode=eigenvalue_count, t_high=[10, 50, 100, 200, 500], numerical_floor=1e-06, kernel=dense, gaussian_kernel_sigma=10 | Diffusion spectral entropy (eigenvalue count at diffusion time t) |
| DiffusionSpectralEntropy | `metrics=dse_knn` | output_mode=eigenvalue_count, t_high=[10, 50, 100, 200, 500], numerical_floor=1e-06, kernel=knn, k=${neighborhood_size}, alpha=1.0 | Diffusion spectral entropy (eigenvalue count at diffusion time t) |
| DiffusionSpectralEntropy | `metrics=dse_t_sweep` | output_mode=eigenvalue_count_sweep, t_high=[10, 50, 100, 200, 500], numerical_floor=1e-06, gaussian_kernel_sigma=10 | Diffusion spectral entropy (eigenvalue count at diffusion time t) |
| LocalSpectralAnalysis | `metrics=eigenvalue_effective_rank` | n_neighbors=20, output_mode=full | Local spectral analysis of kNN neighborhoods. |
| FractalDimension | `metrics=fractal_dimension` | n_box_sizes=10 | Correlation fractal dimension of embedding |
| KNNPreservation | `metrics=knn_preservation` | n_neighbors=10, metric=euclidean | k-nearest neighbor preservation between original and embedded spaces |
| LocalIntrinsicDimensionality | `metrics=local_intrinsic_dimensionality` | k=20 | Mean local intrinsic dimensionality of the embedding |
| LogLogConsistency | `metrics=loglog_consistency` | k=200, k_min=5, k_steps=20 | Per-point log-log power law consistency diagnostic. |
| MagnitudeDimension | `metrics=magnitude_dimension` | n_ts=50, log_scale=False, scale_finding=convergence, target_prop=0.95, metric=euclidean, p=2, n_neighbors=12, method=cholesky, one_point_property=True, perturb_singularities=True, positive_magnitude=False, exact=False | Magnitude-based effective dimensionality |
| MismatchRatio | `metrics=mismatch_ratio` | k=200, k_min=5, k_steps=20, r2_threshold=0.95 | Per-point mismatch ratio v = k_eff / k_star |
| NoOp | `metrics=noop` | -- | No-op metric for verifying pipeline integration |
| OutlierScore | `metrics=outlier_score` | k=20, return_scores=False | Outlier scores using Local Outlier Factor |
| LocalSpectralAnalysis | `metrics=participation_ratio` | n_neighbors=25, return_per_sample=True | Local spectral analysis of kNN neighborhoods. |
| PearsonCorrelation | `metrics=pearson_correlation` | return_per_sample=False, num_dists=100 | Pearson correlation between pairwise distances |
| PersistentHomology | `metrics=persistent_homology` | homology_dim=1, persistence_threshold=0.1, max_N=2000, random_seed=0 | Count of loops/cycles (H1 Betti number) |
| PersistentHomology | `metrics=persistent_homology_beta0` | homology_dim=0, persistence_threshold=3.0 | Count of connected components (H0 Betti number) |
| RankAgreement | `metrics=rank_agreement` | k=20, metric_fn=lid | Rank-based agreement of LID/PR across modalities |
| ShepardResidual | `metrics=shepard_residual` | k=15 | Per-point Shepard residual on local-rank neighborhoods |
| SilhouetteScore | `metrics=silhouette` | metric=euclidean | Silhouette score for cluster separation in embedding |
| TangentSpaceApproximation | `metrics=tangent_space` | n_neighbors=25, variance_threshold=0.95, return_per_sample=True | Tangent space alignment between original and embedded spaces |
| Trustworthiness | `metrics=trustworthiness` | n_neighbors=5, metric=euclidean | Compute the trustworthiness of an embedding. |
| Trustworthiness | `metrics=trustworthiness_k` | n_neighbors=[15, 25, 50, 100, 250], metric=euclidean | Compute the trustworthiness of an embedding. |
Module Metrics
Evaluate algorithm-specific internal components. These metrics require a fitted module exposing `affinity()` or `kernel()`. Config: `at: module`.
| metric | config | defaults | description |
|---|---|---|---|
| AffinitySpectrum | `metrics=affinity_spectrum` | -- | Top-k eigenvalues of the affinity matrix |
| ConnectedComponents | `metrics=connected_components` | -- | Number of connected components in the kNN graph |
| DatasetTopologyDescriptor | `metrics=dataset_topology_descriptor` | -- | Topological descriptor of the dataset structure |
| DiffusionMapCorrelation | `metrics=diffusion_map_correlation` | dm_components=2, alpha=1.0, correlation_type=pearson | Correlation between diffusion map and embedding distances |
| EffectiveNeighborhoodSize | `metrics=effective_neighborhood_size` | -- | Per-point effective neighborhood size from method's internal graph weights |
| KernelMatrixDensity | `metrics=kernel_matrix_density` | threshold=1e-10 | Density of the kernel/affinity matrix |
| KernelMatrixSparsity | `metrics=kernel_matrix_sparsity` | threshold=1e-10 | Sparsity of the kernel/affinity matrix |
| SpectralDecayRate | `metrics=spectral_decay_rate` | top_k=20 | Fit exponential decay to the eigenvalue spectrum. |
| SpectralGapRatio | `metrics=spectral_gap_ratio` | -- | Ratio of first to second eigenvalue of the diffusion operator |
Dataset Metrics
Evaluate properties of the original high-dimensional data, independent of the DR algorithm. Config: `at: dataset`.

| metric | config | defaults | description |
|---|---|---|---|
| GroundTruthPreservation | `metrics=admixture_laplacian` | -- | Admixture Laplacian preservation score |
| GeodesicDistanceCorrelation | `metrics=geodesic_distance_correlation` | correlation_type=spearman | Correlation between geodesic and embedded distances |
| kmeans_stratification | `metrics=stratification` | random_state=${seed} | K-means stratification score for population structure |
Metric Protocol
All metrics must match the `Metric` protocol (`manylatents/metrics/metric.py`):

```python
def __call__(
    self,
    embeddings: np.ndarray,
    dataset=None,
    module=None,
    cache=None,
) -> float | tuple[float, np.ndarray] | dict[str, Any]: ...
```
Return Types
| Type | Use case | Example |
|---|---|---|
| `float` | Simple scalar | Trustworthiness: `0.95` |
| `tuple[float, ndarray]` | Scalar + per-sample | Continuity with `return_per_sample=True` |
| `dict[str, Any]` | Structured output | Persistent homology: `{'beta_0': ..., 'beta_1': ...}` |
Configuration
Metrics use Hydra's `_partial_: true` for deferred parameter binding:

```yaml
# configs/metrics/trustworthiness.yaml
trustworthiness:
  _target_: manylatents.metrics.trustworthiness.Trustworthiness
  _partial_: true
  n_neighbors: 5
  at: embedding
```
Multi-Scale Expansion
List-valued parameters expand via Cartesian product through `flatten_and_unroll_metrics()`:

```yaml
n_neighbors: [5, 10, 20]  # Produces 3 separate evaluations
```

Naming convention: `trustworthiness__n_neighbors_5`, `trustworthiness__n_neighbors_10`, etc.
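The expansion can be mimicked in a few lines. This is a sketch mirroring the naming convention only; the real implementation is `flatten_and_unroll_metrics()` in the library, and `unroll` here is an invented helper:

```python
from itertools import product

def unroll(name: str, params: dict) -> dict:
    """Expand list-valued params into one named variant per combination."""
    list_keys = [k for k, v in params.items() if isinstance(v, list)]
    if not list_keys:
        return {name: dict(params)}
    variants = {}
    for combo in product(*(params[k] for k in list_keys)):
        suffix = "__".join(f"{k}_{v}" for k, v in zip(list_keys, combo))
        variants[f"{name}__{suffix}"] = {**params, **dict(zip(list_keys, combo))}
    return variants

expanded = unroll("trustworthiness", {"n_neighbors": [5, 10, 20], "metric": "euclidean"})
```

Each variant keeps the non-list parameters unchanged, so only the swept parameter differs between evaluations.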
Shared kNN Cache
Metrics that need kNN graphs share a cache computed once with the maximum `k` across all metrics, avoiding redundant computation.
Writing a New Metric
```python
import numpy as np


def YourMetric(
    embeddings: np.ndarray,
    dataset=None,
    module=None,
    k: int = 10,
    cache=None,
) -> float:
    score = 0.0  # your computation here
    return score
```
Choosing the Right Context
Set the `at` field in your config to target a pipeline output (see Pipeline Execution Model above):

- Only needs original data? → `at: dataset`
- Compares original vs. reduced? → `at: embedding`
- Needs algorithm internals (affinity, spectral properties)? → `at: module`
- Needs a specific matrix? → `at: affinity` / `at: kernel` / `at: adjacency` (the algorithm must produce it)
Config
```yaml
# configs/metrics/your_metric.yaml
your_metric:
  _target_: manylatents.metrics.your_metric.YourMetric
  _partial_: true
  k: 10
  at: embedding
```
Testing
Use `metrics=noop` to verify integration:

```shell
uv run python -m manylatents.main data=swissroll algorithms/latent=pca metrics=noop
```
Null Metrics Support
manyLatents supports running experiments without metrics computation — useful for fast debugging, exploratory analysis, or workflows where metrics are computed separately.
Usage
CLI (Default)
Metrics are null by default. Just don't specify them:
```shell
# No metrics (default)
uv run python -m manylatents.main data=swissroll algorithms/latent=pca

# With metrics (explicit opt-in)
uv run python -m manylatents.main data=swissroll algorithms/latent=pca metrics=noop
```
Experiment Configs
```yaml
# configs/experiment/my_experiment.yaml
# @package _global_
defaults:
  - override /algorithms/latent: pca
  - override /data: swissroll
  - override /callbacks/embedding: default
# No metrics override - stays null
```
Python API
```python
from manylatents.api import run

result = run(
    data="swissroll",
    algorithms={"latent": "pca"},
    metrics=None,  # Explicitly disable
)
```
Expected Behavior
When `metrics=null`:
- Generates embeddings
- Saves embeddings to files
- Creates plots (if callbacks configured)
- Logs to wandb (if configured)
- Does NOT compute evaluation metrics
- Shows warning: "No scores found"
Design: Opt-In by Default
The base config (`configs/config.yaml`) sets `metrics` to null. Experiment configs opt in:

```yaml
# configs/experiment/single_algorithm.yaml
defaults:
  - override /metrics: noop  # Opt in for this experiment
```
Hydra Limitation
Hydra CLI does not support `null` as an override value. You cannot do `metrics=null` on the command line — Hydra's parser converts `"null"` to Python `None`, which its override validator rejects.
Workarounds:

- Use experiment configs without metrics specified
- Use the Python API with `metrics=None` (our code handles this)
- Set `metrics: null` in config files (e.g., the base config already does this)
The API intercepts `None` values before Hydra sees them and sets them after config composition via `OmegaConf.update()`.
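The interception can be sketched in plain Python (no Hydra), with the helper name invented for illustration; the real code applies the nulled keys via `OmegaConf.update()` after composition, as noted above:

```python
def split_none_overrides(overrides: dict):
    """Separate overrides Hydra can compose from keys to null afterwards."""
    passed = {k: v for k, v in overrides.items() if v is not None}
    nulled = [k for k, v in overrides.items() if v is None]
    return passed, nulled

passed, nulled = split_none_overrides({"data": "swissroll", "metrics": None})
# `passed` goes to Hydra; each key in `nulled` is set to null post-composition
```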
Troubleshooting
"Could not find 'metrics/none'"
You're trying metrics=none as a CLI override. Hydra interprets this as looking for metrics/none.yaml.
Fix: Use an experiment config, or the API with metrics=None.
Metrics Still Being Computed
Check that:
- Your experiment config doesn't have `- override /metrics: ...` in its defaults
- You're not passing `metrics=...` on the command line
- The final config shows `metrics: null`