topo.eval.topo_metrics

Functions

_ensure_csr(P)

_top_eigs_of_P(P[, r, which, tol, maxiter, v0, ...])

Compute top-r eigenpairs of P (row-stochastic Markov operator).

diffusion_coordinates(evals, evecs, t[, drop_first, ...])

Phi_t(P) = [lambda_1^t psi_1, ..., lambda_r^t psi_r], optionally skipping the trivial first eigenpair.

diffusion_distance_from_eigs(evals, evecs, t[, r_use, ...])

Compute diffusion distance matrix (via truncated eigendecomposition).

_upper_triangle_vec(M)

_topk_support_from_row(data, ind, k)

get_P(Y, **kwargs_for_kernel)

Convenience: build a diffusion operator P from data or a precomputed graph.

rank_diffusion_correlation(Px, Py[, times, r, ...])

Global geometry agreement via Spearman correlation of diffusion distances.

multiscale_diffusion_emd(Px, Py[, times, r, bins, ...])

Global geometry preservation via multiscale Earth Mover’s Distance (EMD) on diffusion distances.

spectral_procrustes(Px, Py[, times, r, ...])

Align diffusion coordinates via orthogonal Procrustes and report R^2.

diffusion_rank_biased_overlap(Px, Py[, times, r, p, ...])

Local geometry agreement via Rank-Biased Overlap (RBO) of diffusion neighbors.

rowwise_js_similarity(Px, Py[, eps, topk, return_per_row])

Row-wise Jensen–Shannon (JS) similarity between two diffusion operators.

sparse_neighborhood_f1(Px, Py[, k])

Operator-level set overlap: F1 of top-k transition neighborhoods per row.

spectral_similarity(Px, Py[, r, symmetric_hint, ...])

Operator-level spectral agreement via eigenvalues and subspaces.

commute_time_trace_gap(Px, Py[, r, symmetric_hint, ...])

Global connectivity gap via (approximate) trace of Laplacian pseudoinverse.

topo_preserve_score(Px, Py[, times, r, ...])

Composite TopoPreserve score using three operator-aware metrics

Module Contents

topo.eval.topo_metrics._ensure_csr(P)
topo.eval.topo_metrics._top_eigs_of_P(P, r=64, which='LM', tol=0.0001, maxiter=None, v0=None, symmetric_hint=False)

Compute top-r eigenpairs of P (row-stochastic Markov operator). If you used a symmetric diffusion operator earlier, set symmetric_hint=True for improved stability (we then eigendecompose the symmetrized operator). Returns evals (r,), evecs (n,r). Sorted by |lambda| desc.

topo.eval.topo_metrics.diffusion_coordinates(evals, evecs, t, drop_first=True, r_use=None, normalize_cols=True)

Phi_t(P) = [lambda_1^t psi_1, …, lambda_r^t psi_r], optionally skipping the trivial first eigenpair.

topo.eval.topo_metrics.diffusion_distance_from_eigs(evals, evecs, t, r_use=None, drop_first=True, squared=False)

Compute diffusion distance matrix (via truncated eigendecomposition).

D^2(i,j) = sum_l lambda_l^{2t} (psi_l(i) - psi_l(j))^2

Returns a dense (n,n) matrix for convenience; for large n use sampling.
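The formula above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the module's implementation: the helper names diffusion_coords and diffusion_distance and the 4-node toy operator are hypothetical, and the eigenpairs come from a dense numpy.linalg.eigh call rather than from _top_eigs_of_P.

```python
import numpy as np

def diffusion_coords(evals, evecs, t, drop_first=True):
    """Scale each eigenvector column by its eigenvalue raised to the power t."""
    if drop_first:
        # Skip the trivial leading eigenpair (eigenvalue 1, constant vector).
        evals, evecs = evals[1:], evecs[:, 1:]
    return evecs * (evals ** t)

def diffusion_distance(evals, evecs, t, drop_first=True, squared=False):
    """Dense (n, n) diffusion distances from a truncated eigendecomposition."""
    phi = diffusion_coords(evals, evecs, t, drop_first)
    sq_norms = np.sum(phi ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, evaluated for all pairs at once.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * phi @ phi.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off
    return d2 if squared else np.sqrt(d2)

# Hypothetical toy operator: symmetric random walk on a 4-node path.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.5, 0.5]])
evals, evecs = np.linalg.eigh(P)
order = np.argsort(-np.abs(evals))  # sort by |lambda| descending
evals, evecs = evals[order], evecs[:, order]
D = diffusion_distance(evals, evecs, t=2)
```

The pairwise form avoids an explicit double loop, but it still allocates an (n, n) array, which is why the text recommends sampling for large n.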

topo.eval.topo_metrics._upper_triangle_vec(M)
topo.eval.topo_metrics._topk_support_from_row(data, ind, k)
topo.eval.topo_metrics.get_P(Y, **kwargs_for_kernel)

Convenience: build a diffusion operator P from data or a precomputed graph.

Parameters:
  • Y (array-like or sparse matrix or topo.tpgraph.kernels.Kernel) –

    • If a Kernel instance: returns its .P (computing it if needed).

    • If a rectangular (n x d) array/matrix: treated as data; a kernel and diffusion operator are built using the provided kwargs.

    • If a square (n x n) array/sparse matrix AND metric='precomputed' is provided (or auto-detected), Y is treated as a precomputed affinity/kernel matrix (NOT distances), and P is computed from it.

  • **kwargs_for_kernel

    Passed to Kernel(…). Useful options include:

    • metric='cosine' | 'euclidean' | 'precomputed'

    • n_neighbors=30

    • adaptive_bw=True

    • backend='nmslib' | 'hnswlib'

    • n_jobs=-1

    • symmetrize=True

    • anisotropy=1.0

    • use_angular=True (for cosine)

Returns:

P (scipy.sparse.csr_matrix) – The (symmetrized) diffusion operator.

Notes

  • If Y is square and you did NOT set metric='precomputed', we auto-switch to 'precomputed' (assuming Y is an affinity/kernel). If Y is a distance matrix, convert it to an affinity first or pass the raw data.
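As a hedged illustration of the "convert it to an affinity first" advice, the sketch below applies a Gaussian kernel with a median-distance bandwidth. Both the kernel choice and the helper name distances_to_affinity are assumptions made for this example; the library does not prescribe a particular conversion.

```python
import numpy as np

def distances_to_affinity(D, sigma=None):
    """Map a symmetric distance matrix to a Gaussian affinity in (0, 1]."""
    D = np.asarray(D, dtype=float)
    if sigma is None:
        # Assumption: use the median off-diagonal distance as the bandwidth.
        sigma = np.median(D[np.triu_indices_from(D, k=1)])
    return np.exp(-(D / sigma) ** 2)

# Hypothetical data: pairwise Euclidean distances between random points.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
K = distances_to_affinity(D)
# K is square, so per the notes above it would be read as a precomputed affinity.
```

Small distances map to affinities near 1 and large distances to values near 0, which is the orientation a precomputed affinity is expected to have.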

topo.eval.topo_metrics.rank_diffusion_correlation(Px, Py, times=(1, 2, 4, 8), r=64, symmetric_hint=False)

Global geometry agreement via Spearman correlation of diffusion distances.

Parameters:
  • Px ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.

  • Py ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.

  • times (tuple of int, default=(1,2,4,8)) – Multiscale t values. The score is averaged over t.

  • r (int, default=64) – Eigenpairs used for diffusion distances.

  • symmetric_hint (bool, default=False) – See _top_eigs_of_P.

Returns:

rho_avg (float in [-1, 1]) – Average Spearman correlation between the upper triangles of D_t(Px) and D_t(Py) across t (higher is better).

Notes

  • Robust to monotone rescalings (rank-based).

  • Sensitive to global geometry preservation (small t probes fine structure; larger t probes coarser structure).
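The rank-based robustness noted above can be checked with a short sketch. It uses scipy.stats.spearmanr on the upper triangles of two distance matrices; plain Euclidean distances stand in for diffusion distances, and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def upper_triangle(M):
    """Flatten the strict upper triangle of a square matrix."""
    i, j = np.triu_indices(M.shape[0], k=1)
    return M[i, j]

def rank_correlation(Dx, Dy):
    """Spearman correlation between the pairwise distances of two matrices."""
    rho, _ = spearmanr(upper_triangle(Dx), upper_triangle(Dy))
    return float(rho)

# Stand-in distances (assumption: Euclidean instead of diffusion distances).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
Dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Dy = Dx ** 3  # strictly monotone rescaling of the same distances
rho = rank_correlation(Dx, Dy)
```

Because cubing preserves the ordering of non-negative distances, rho comes out at 1 even though the raw values differ substantially.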

topo.eval.topo_metrics.multiscale_diffusion_emd(Px, Py, times=(1, 2, 4, 8), r=64, bins=64, symmetric_hint=False)

Global geometry preservation via multiscale Earth Mover’s Distance (EMD) on diffusion distances.

Parameters:
  • Px ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.

  • Py ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.

  • times (tuple of int, default=(1,2,4,8)) – Diffusion timescales at which pairwise diffusion distances are computed. Scores are averaged across timescales.

  • r (int, default=64) – Number of leading eigenpairs to use for diffusion distances.

  • bins (int, default=64) – Number of histogram bins used to approximate the distribution of pairwise diffusion distances.

  • symmetric_hint (bool, default=False) – Passed to _top_eigs_of_P; set to True if Px and Py are symmetric operators.

Returns:

emd (float ≥ 0) – Mean Earth Mover’s Distance (1-Wasserstein distance) between the distributions of pairwise diffusion distances in Px and Py, averaged across timescales. Lower is better (0 indicates identical distributions).

Notes

  • This is a distributional comparison: rather than matching each pairwise distance, it checks whether the global distribution of diffusion distances is preserved between Px and Py.

  • The measure is scale-sensitive: if distances in Py are systematically compressed or stretched relative to Px, the EMD will increase.

  • Bins are shared between Px and Py at each timescale to ensure fair histogram comparison.

  • Useful as a complement to correlation-based metrics (e.g. RDC), since it detects distributional distortions even if ranks are preserved.
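A minimal sketch of the shared-bin comparison described in the notes, using scipy.stats.wasserstein_distance on histogram bin centers. The function name binned_emd and the synthetic inputs are assumptions for illustration; the module's own binning details may differ.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def binned_emd(dx, dy, bins=64):
    """1-Wasserstein distance between two samples, histogrammed on shared bins."""
    lo = min(dx.min(), dy.min())
    hi = max(dx.max(), dy.max())
    edges = np.linspace(lo, hi, bins + 1)  # same edges for both samples
    centers = 0.5 * (edges[:-1] + edges[1:])
    hx, _ = np.histogram(dx, bins=edges)
    hy, _ = np.histogram(dy, bins=edges)
    return wasserstein_distance(centers, centers, u_weights=hx, v_weights=hy)

# Synthetic "pairwise distances": dy is a stretched copy of dx.
rng = np.random.default_rng(0)
dx = rng.uniform(0.0, 1.0, size=2000)
dy = 2.0 * dx
emd_same = binned_emd(dx, dx)
emd_stretched = binned_emd(dx, dy)
```

Here the ranks of dx and dy agree exactly, so a Spearman-based score would be perfect, while the EMD is clearly non-zero because the distribution has been stretched.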

topo.eval.topo_metrics.spectral_procrustes(Px, Py, times=(1, 2, 4, 8), r=64, symmetric_hint=False, center=True)

Align diffusion coordinates via orthogonal Procrustes and report R^2.

Parameters:
  • Px ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.

  • Py ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.

  • times (tuple of int, default=(1,2,4,8)) – Diffusion times; we build Φ_t for each and average the R^2.

  • r (int, default=64) – Number of eigenpairs for coordinates.

  • symmetric_hint (bool, default=False) – See _top_eigs_of_P.

  • center (bool, default=True) – Mean-center coordinates before Procrustes.

Returns:

R2_avg (float) – Average coefficient of determination across t (clipped to [0,1]). Higher is better (1.0 means perfect alignment up to a rotation).
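For a single diffusion time, the alignment step can be sketched with scipy.linalg.orthogonal_procrustes. The R^2 definition used here (one minus residual over total sum of squares of the target) is an assumption, as are the helper name and the random stand-in coordinates.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def procrustes_r2(A, B, center=True):
    """Rotate A onto B with an orthogonal map and report a clipped R^2."""
    if center:
        A = A - A.mean(axis=0)
        B = B - B.mean(axis=0)
    R, _ = orthogonal_procrustes(A, B)  # R minimizes ||A @ R - B||_F
    resid = np.sum((A @ R - B) ** 2)
    total = np.sum(B ** 2)
    return float(np.clip(1.0 - resid / total, 0.0, 1.0))

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))                   # stand-in coordinates
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
B_rotated = A @ Q + 5.0                        # rotated and shifted copy of A
B_random = rng.normal(size=(50, 3))            # unrelated coordinates

r2_rotated = procrustes_r2(A, B_rotated)
r2_random = procrustes_r2(A, B_random)
```

Centering removes the shift and the orthogonal map removes the rotation, so the rotated copy scores essentially 1, while unrelated coordinates score far lower.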

topo.eval.topo_metrics.diffusion_rank_biased_overlap(Px, Py, times=(1, 2, 4, 8), r=64, p=0.9, k_max=100, symmetric_hint=False)

Local geometry agreement via Rank-Biased Overlap (RBO) of diffusion neighbors.

Parameters:
  • Px ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.

  • Py ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.

  • times (tuple of int, default=(1,2,4,8)) – Diffusion timescales; the score is averaged across them.

  • r (int, default=64) – Number of leading eigenpairs to use for diffusion distances.

  • p (float in (0,1), default=0.9) – Persistence parameter of RBO. Values closer to 1 give more weight to deeper ranks; smaller values emphasize the very top neighbors.

  • k_max (int, default=100) – Maximum depth of neighbor lists to compare. Truncates infinite RBO.

  • symmetric_hint (bool, default=False) – See _top_eigs_of_P.

Returns:

score (float in [0,1]) – Average RBO similarity across timescales. 1.0 means identical ranked neighbor lists under diffusion distances.

Notes

  • RBO compares ordered neighbor lists, unlike kNN overlap which ignores order.

  • Especially useful to assess whether diffusion rankings (top-100 neighbors) are preserved, not just set membership.

  • Parameter p controls top-heaviness: p=0.9 gives ≈86% of weight to top-10.
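The sketch below shows one way to truncate RBO at a fixed depth, plus the closed-form weight carried by the top d ranks of untruncated RBO (the formula from the original RBO paper by Webber, Moffat and Zobel). Dividing by 1 - p^k so that identical lists score exactly 1 is an assumption of this example, and the function names are hypothetical.

```python
import math

def rbo_truncated(a, b, p=0.9, k_max=100):
    """Truncated RBO of two duplicate-free ranked lists, rescaled to [0, 1]."""
    k = min(k_max, len(a), len(b))
    if k == 0:
        return 0.0
    seen_a, seen_b = set(), set()
    overlap, score = 0, 0.0
    for d in range(1, k + 1):
        x, y = a[d - 1], b[d - 1]
        if x == y:
            overlap += 1
        else:
            # Count items that now appear in both prefixes.
            overlap += (x in seen_b) + (y in seen_a)
        seen_a.add(x)
        seen_b.add(y)
        score += p ** (d - 1) * overlap / d  # agreement at depth d
    # Assumption: rescale so that identical lists give exactly 1.
    return (1.0 - p) * score / (1.0 - p ** k)

def rbo_top_weight(p, d):
    """Share of untruncated RBO weight carried by ranks 1..d."""
    tail = math.log(1.0 / (1.0 - p)) - sum(p ** i / i for i in range(1, d))
    return 1.0 - p ** (d - 1) + ((1.0 - p) / p) * d * tail

same = rbo_truncated([1, 2, 3], [1, 2, 3])
swapped = rbo_truncated([1, 2, 3], [2, 1, 3])
disjoint = rbo_truncated([1, 2, 3], [4, 5, 6])
weight_top10 = rbo_top_weight(0.9, 10)  # about 0.86, matching the note above
```

Swapping the first two neighbors lowers the score even though the neighbor sets are identical, which is the order sensitivity that separates RBO from plain kNN overlap.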

topo.eval.topo_metrics.rowwise_js_similarity(Px, Py, eps: float = 1e-12, topk: int = None, return_per_row: bool = False)

Row-wise Jensen–Shannon (JS) similarity between two diffusion operators.

Given two (row-stochastic) operators Px and Py (csr_matrices or ndarrays), we compare each row i as a discrete probability distribution over columns (neighbors) and compute the Jensen–Shannon divergence JS(p_i, q_i). We then report a bounded similarity in [0, 1] via:

JS-similarity = 1 - mean_i JS(p_i, q_i)

where JS(·,·) = 0 for identical distributions and is at most log(2) in nats; we use the standard normalized, base-e version.

Parameters:
  • Px ((n, n) csr_matrix or ndarray) – Row-stochastic diffusion/transition operators to compare. (They need not be strictly stochastic; rows are renormalized internally.)

  • Py ((n, n) csr_matrix or ndarray) – Row-stochastic diffusion/transition operators to compare. (They need not be strictly stochastic; rows are renormalized internally.)

  • eps (float, default=1e-12) – Small positive value added before per-row renormalization for numerical stability.

  • topk (int or None, default=None) – If provided, restrict each row to its top-k entries by probability mass before comparing (helps robustness on very sparse / noisy rows). If None, we compare using the full sparse support (union of supports).

  • return_per_row (bool, default=False) – If True, also return the per-row JS similarities (1 - JS_i) as a 1D array.

Returns:

  • sim (float in [0, 1]) – 1 - mean(JS divergence) across rows (higher is better).

  • per_row (ndarray, optional) – Returned only if return_per_row=True. Per-row (1 - JS_i) scores.

Notes

  • Operates sparsely: for each row we build vectors over the union of the supports of Px[i, :] and Py[i, :] (unless topk is set, in which case supports are first truncated to top-k).

  • Sensitive to weights (transition probabilities), unlike set-overlap metrics.

  • If a row is empty in both operators, it is skipped.
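A dense sketch of the row-wise computation follows. It assumes that "normalized" means dividing the base-e divergence by log(2), and it adds eps to every entry of a dense array instead of working on the union of sparse supports, so it only illustrates the idea and does not reproduce the module's sparse code path.

```python
import numpy as np

def js_similarity_rows(Px, Py, eps=1e-12):
    """Return (mean similarity, per-row similarities) with similarity = 1 - JS."""
    Px = np.asarray(Px, dtype=float) + eps
    Py = np.asarray(Py, dtype=float) + eps
    Px = Px / Px.sum(axis=1, keepdims=True)  # renormalize rows
    Py = Py / Py.sum(axis=1, keepdims=True)
    M = 0.5 * (Px + Py)                      # mixture distribution per row
    kl_pm = np.sum(Px * np.log(Px / M), axis=1)
    kl_qm = np.sum(Py * np.log(Py / M), axis=1)
    # Assumption: divide by log(2) so each divergence lies in [0, 1].
    js = 0.5 * (kl_pm + kl_qm) / np.log(2.0)
    per_row = 1.0 - np.clip(js, 0.0, 1.0)
    return float(per_row.mean()), per_row

# Row 0 is identical in both operators; row 1 has disjoint supports.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 1.0, 0.0]])
Q = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])
sim, per_row = js_similarity_rows(P, Q)
```

The identical row scores 1 and the disjoint row scores essentially 0, so the mean similarity here is close to 0.5.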

topo.eval.topo_metrics.sparse_neighborhood_f1(Px, Py, k=None)

Operator-level set overlap: F1 of top-k transition neighborhoods per row.

Parameters:
  • Px ((n, n) csr_matrix or ndarray) – Diffusion/transition operators.

  • Py ((n, n) csr_matrix or ndarray) – Diffusion/transition operators.

  • k (int or None, default=None) – Number of neighbors (by transition probability) to keep per row. If None, use the natural sparsity (nnz per row) or min(nnz_x, nnz_y).

Returns:

f1_avg (float in [0, 1]) – Mean F1 score across rows. 1.0 → identical sparse neighborhoods.

Notes

  • Build per-row sets of top-k columns by probability mass, then compute F1 = 2 |∩| / (|Sx| + |Sy|).

  • Complements JS: insensitive to weights but sensitive to support overlap.
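The per-row F1 in the notes can be sketched on CSR inputs as follows. The helper names and the toy matrices are hypothetical, and the handling of k=None (use each row's stored non-zeros) is one reading of "natural sparsity", not a confirmed detail of the implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix

def topk_columns(P, i, k=None):
    """Column indices of the k largest stored entries in row i of a CSR matrix."""
    start, end = P.indptr[i], P.indptr[i + 1]
    cols, vals = P.indices[start:end], P.data[start:end]
    if k is not None and k < len(cols):
        cols = cols[np.argsort(-vals)[:k]]
    return set(cols.tolist())

def neighborhood_f1(Px, Py, k=None):
    """Mean over rows of F1 = 2 |Sx & Sy| / (|Sx| + |Sy|)."""
    Px, Py = csr_matrix(Px), csr_matrix(Py)
    scores = []
    for i in range(Px.shape[0]):
        sx, sy = topk_columns(Px, i, k), topk_columns(Py, i, k)
        if not sx and not sy:
            continue  # nothing to compare for this row
        scores.append(2.0 * len(sx & sy) / (len(sx) + len(sy)))
    return float(np.mean(scores)) if scores else float("nan")

Px = np.array([[0.0, 0.6, 0.4, 0.0],
               [0.5, 0.0, 0.5, 0.0],
               [0.0, 0.5, 0.0, 0.5],
               [0.0, 0.0, 1.0, 0.0]])
Py = np.array([[0.0, 0.6, 0.0, 0.4],
               [0.5, 0.0, 0.5, 0.0],
               [0.0, 0.5, 0.0, 0.5],
               [0.0, 1.0, 0.0, 0.0]])
f1 = neighborhood_f1(Px, Py)  # per-row scores 0.5, 1, 1, 0
```

Only set membership matters here: changing the 0.6/0.4 split in row 0 to any other positive split leaves the score unchanged.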

topo.eval.topo_metrics.spectral_similarity(Px, Py, r=64, symmetric_hint=False, return_details=False)

Operator-level spectral agreement via eigenvalues and subspaces.

Returns:

  • If return_details=False – float in [0,1]: cosine of the largest principal angle between the top-r eigenspaces (higher is better).

  • If return_details=True – dict with {'eigenvalue_w1', 'subspace_cos'}.
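The subspace part of this score can be illustrated with scipy.linalg.subspace_angles, which returns the principal angles between two column spaces. The helper name subspace_cosine and the hand-built bases are assumptions for the example; in practice the bases would be the top-r eigenvectors of Px and Py.

```python
import numpy as np
from scipy.linalg import subspace_angles

def subspace_cosine(Ux, Uy):
    """Cosine of the largest principal angle between two column spaces."""
    angles = subspace_angles(Ux, Uy)
    return float(np.cos(angles.max()))

# Hypothetical bases in R^4.
Ux = np.eye(4)[:, :2]                        # span{e1, e2}
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
U_same = Ux @ rot                            # same subspace, different basis
U_other = np.eye(4)[:, [0, 2]]               # span{e1, e3}

cos_same = subspace_cosine(Ux, U_same)
cos_other = subspace_cosine(Ux, U_other)
```

The score is basis-independent, so a rotated basis of the same subspace gives a cosine of 1, while sharing only one direction still yields 0 because the largest angle dominates.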

topo.eval.topo_metrics.commute_time_trace_gap(Px, Py, r=64, symmetric_hint=False, hutchinson_probes=None, random_state=None)

Global connectivity gap via (approximate) trace of Laplacian pseudoinverse.

Parameters:
  • Px ((n, n) csr_matrix or ndarray) – Diffusion operators (will be symmetrized to build Laplacians).

  • Py ((n, n) csr_matrix or ndarray) – Diffusion operators (will be symmetrized to build Laplacians).

  • r (int, default=64) – If using low-rank approximation, number of leading modes for the trace.

  • symmetric_hint (bool, default=False) – See _top_eigs_of_P.

  • hutchinson_probes (int or None, default=None) – If provided, estimate trace via Hutchinson’s method with this many random probe vectors (for very large graphs).

  • random_state (int or np.random.RandomState or None) – RNG for Hutchinson probes.

Returns:

gap (float >= 0) – Absolute difference between (approximate) trace( L^+_x ) and trace( L^+_y ). Smaller is better (more similar commute-time geometry).

Notes

  • Build A = (P + P^T)/2, then normalized Laplacian L. Commute-time distances relate to entries of L^+; its trace summarizes overall connectivity.

  • For speed, you can approximate using low-rank spectral sums or Hutchinson.
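The construction in the notes can be sketched densely on a tiny graph. Everything here is illustrative: the helper names are hypothetical, the pseudoinverse is formed explicitly (a large-scale implementation would rely on iterative solves or low-rank sums instead), and Rademacher probes are one common choice for Hutchinson's estimator.

```python
import numpy as np

def normalized_laplacian(P):
    """L = I - D^{-1/2} A D^{-1/2} with A = (P + P^T) / 2."""
    A = 0.5 * (P + P.T)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def pinv_trace_exact(L, tol=1e-10):
    """trace(L^+) as the sum of reciprocals of the non-zero eigenvalues."""
    lam = np.linalg.eigvalsh(L)
    return float(np.sum(1.0 / lam[lam > tol]))

def pinv_trace_hutchinson(L, n_probes=500, seed=0):
    """Hutchinson estimate: mean of z^T L^+ z over random sign vectors z."""
    rng = np.random.default_rng(seed)
    L_pinv = np.linalg.pinv(L, rcond=1e-10)  # dense, for illustration only
    Z = rng.choice([-1.0, 1.0], size=(L.shape[0], n_probes))
    return float(np.mean(np.sum(Z * (L_pinv @ Z), axis=0)))

# Toy operator: symmetric random walk on a 4-node path.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.5, 0.5]])
L = normalized_laplacian(P)
trace_exact = pinv_trace_exact(L)          # equals 5 for this graph
trace_estimate = pinv_trace_hutchinson(L)  # stochastic approximation
gap_to_self = abs(trace_exact - pinv_trace_exact(normalized_laplacian(P)))
```

For this path graph the non-zero Laplacian eigenvalues are 1 - sqrt(2)/2, 1 and 1 + sqrt(2)/2, whose reciprocals sum to 5; the Hutchinson estimate fluctuates around that value as the probes change.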

topo.eval.topo_metrics.topo_preserve_score(Px, Py, times=(1, 2, 4, 8), r: int = 64, symmetric_hint: bool = False, k_for_pf1: int = None, weights: dict = dict(PF1=0.3, PJS=0.3, SP=0.3))

Composite TopoPreserve score using three operator-aware metrics (higher is better; returns ≈[0,1] after internal normalizations).

Components

  • PF1 – F1@k on top-k transition neighborhoods per row (set overlap).

    Range [0,1], higher is better. (Weight-insensitive.)

  • PJS – Row-wise Jensen–Shannon similarity of transitions (1 − JS, normalized).

    Range [0,1], higher is better. (Weight-sensitive.)

  • SP – Spectral Procrustes R^2 alignment of diffusion coordinates (average over times).

    Range [0,1], higher is better. (Global/meso geometry.)

Parameters:
  • Px ((n, n) csr_matrix or ndarray) – Diffusion (transition) operators to compare.

  • Py ((n, n) csr_matrix or ndarray) – Diffusion (transition) operators to compare.

  • times (tuple of int, default=(1, 2, 4, 8)) – Diffusion times for Spectral Procrustes. Ignored by the other components.

  • r (int, default=64) – Leading eigenpairs used for spectral metrics (SP internals).

  • symmetric_hint (bool, default=False) – If True, treat operators as symmetric for eigensolvers (stability hint).

  • k_for_pf1 (int or None, default=None) – Top-k used in PF1. If None, each row uses its native sparsity.

  • weights (dict, default=dict(PF1=0.3, PJS=0.3, SP=0.3)) – Mixture weights for the three components. Any NaN component is skipped and remaining weights are renormalized.

Returns:

  • score (float) – Weighted average of the component scores in ≈[0,1] (higher is better).

  • parts (dict) – {'PF1': float in [0,1], 'PJS': float in [0,1], 'SP': float in [0,1]}

Notes

  • PF1 (set) and PJS (weight) together capture local neighborhood fidelity.

  • SP captures global/meso geometry via diffusion eigencoordinates.

  • All components are operator-native (diffusion/graph-based), aligning with TopoMAP/DM objectives more directly than raw Euclidean metrics.
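The weighting rule (skip NaN components, renormalize the remaining weights) can be sketched as below. The function name weighted_composite and the example component values are hypothetical; only the rule itself is taken from the parameter description.

```python
import math

def weighted_composite(parts, weights):
    """Weighted average of component scores, ignoring NaN components."""
    num, den = 0.0, 0.0
    for name, w in weights.items():
        value = parts.get(name, float("nan"))
        if math.isnan(value):
            continue  # skipped; its weight is dropped from the total
        num += w * value
        den += w
    return num / den if den > 0 else float("nan")

weights = dict(PF1=0.3, PJS=0.3, SP=0.3)
parts = {"PF1": 0.8, "PJS": 0.6, "SP": float("nan")}  # made-up scores
score = weighted_composite(parts, weights)            # (0.24 + 0.18) / 0.6
```

Dividing by the sum of the weights actually used keeps the result in [0, 1] even though the documented default weights add up to 0.9 rather than 1.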