topo.eval.topo_metrics
Functions
- _ensure_csr
- _top_eigs_of_P: Compute top-r eigenpairs of P (row-stochastic Markov operator).
- diffusion_coordinates: Phi_t(P) = [lambda_1^t psi_1, ..., lambda_r^t psi_r], optionally skipping the trivial first eigenpair.
- diffusion_distance_from_eigs: Compute diffusion distance matrix (via truncated eigendecomposition).
- _upper_triangle_vec
- _topk_support_from_row
- get_P: Convenience: build a diffusion operator P from data or a precomputed graph.
- rank_diffusion_correlation: Global geometry agreement via Spearman correlation of diffusion distances.
- multiscale_diffusion_emd: Global geometry preservation via multiscale Earth Mover’s Distance (EMD) on diffusion distances.
- spectral_procrustes: Align diffusion coordinates via orthogonal Procrustes and report R^2.
- diffusion_rank_biased_overlap: Local geometry agreement via Rank-Biased Overlap (RBO) of diffusion neighbors.
- rowwise_js_similarity: Row-wise Jensen–Shannon (JS) similarity between two diffusion operators.
- sparse_neighborhood_f1: Operator-level set overlap: F1 of top-k transition neighborhoods per row.
- spectral_similarity: Operator-level spectral agreement via eigenvalues and subspaces.
- commute_time_trace_gap: Global connectivity gap via (approximate) trace of Laplacian pseudoinverse.
- topo_preserve_score: Composite TopoPreserve score using three operator-aware metrics (PF1, PJS, SP).
Module Contents
- topo.eval.topo_metrics._ensure_csr(P)
- topo.eval.topo_metrics._top_eigs_of_P(P, r=64, which='LM', tol=0.0001, maxiter=None, v0=None, symmetric_hint=False)
Compute top-r eigenpairs of P (row-stochastic Markov operator). If you used a symmetric diffusion operator earlier, set symmetric_hint=True for improved stability (the symmetrized operator is then eigendecomposed). Returns evals of shape (r,) and evecs of shape (n, r), sorted by |lambda| in descending order.
- topo.eval.topo_metrics.diffusion_coordinates(evals, evecs, t, drop_first=True, r_use=None, normalize_cols=True)
Phi_t(P) = [lambda_1^t psi_1, …, lambda_r^t psi_r], optionally skipping the trivial first eigenpair.
- topo.eval.topo_metrics.diffusion_distance_from_eigs(evals, evecs, t, r_use=None, drop_first=True, squared=False)
Compute diffusion distance matrix (via truncated eigendecomposition).
D^2(i,j) = sum_l lambda_l^{2t} (psi_l(i) - psi_l(j))^2
Returns a dense (n, n) matrix for convenience; for large n, use sampling.
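The formula above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the function name diffusion_distance_sketch and the toy eigenpairs are hypothetical, and it assumes the eigenpairs are already sorted with the trivial one first.

```python
import numpy as np

def diffusion_distance_sketch(evals, evecs, t, drop_first=True, squared=False):
    """D^2(i,j) = sum_l lambda_l^(2t) * (psi_l(i) - psi_l(j))^2 (dense result)."""
    evals = np.asarray(evals, dtype=float)
    evecs = np.asarray(evecs, dtype=float)
    if drop_first:
        # Skip the trivial first eigenpair.
        evals, evecs = evals[1:], evecs[:, 1:]
    # Scale each eigenvector column by lambda^t: these are diffusion coordinates.
    coords = evecs * (evals ** t)
    # Pairwise squared Euclidean distances between coordinate rows.
    sq_norms = np.sum(coords ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * coords @ coords.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return d2 if squared else np.sqrt(d2)

# Toy eigenpairs (hypothetical values, for illustration only).
evals = np.array([1.0, 0.5])
evecs = np.array([[1.0, 1.0], [1.0, -1.0], [1.0, 0.0]])
D = diffusion_distance_sketch(evals, evecs, t=2)
```

With t=2 the single retained eigenvalue is scaled to 0.25, so points 0 and 1 (coordinates 0.25 and -0.25) end up at distance 0.5.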
- topo.eval.topo_metrics._upper_triangle_vec(M)
- topo.eval.topo_metrics._topk_support_from_row(data, ind, k)
- topo.eval.topo_metrics.get_P(Y, **kwargs_for_kernel)
Convenience: build a diffusion operator P from data or a precomputed graph.
- Parameters:
Y (array-like or sparse matrix or topo.tpgraph.kernels.Kernel) –
If a Kernel instance: returns its .P (computing it if needed).
If a rectangular (n x d) array/matrix: treated as data; a kernel and diffusion operator are built using the provided kwargs.
If a square (n x n) array/sparse matrix AND metric='precomputed' is provided (or auto-detected), Y is treated as a precomputed affinity/kernel matrix (NOT distances), and P is computed from it.
**kwargs_for_kernel –
- Passed to Kernel(…). Useful options include:
metric='cosine' | 'euclidean' | 'precomputed'
n_neighbors=30
adaptive_bw=True
backend='nmslib' | 'hnswlib'
n_jobs=-1
symmetrize=True
anisotropy=1.0
use_angular=True (for cosine)
- Returns:
P (scipy.sparse.csr_matrix) – The (symmetrized) diffusion operator.
Notes
If Y is square and you did NOT set metric='precomputed', we auto-switch to 'precomputed' (assuming Y is an affinity/kernel). If Y is a distance matrix, convert it to an affinity first or pass the raw data.
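As a rough illustration of the note above, a distance matrix could be turned into an affinity before being handed over as a precomputed kernel. The Gaussian kernel, the median bandwidth heuristic, and the helper names below are assumptions made for this sketch; they are not what Kernel does internally.

```python
import numpy as np

def affinity_from_distances(D, sigma=None):
    """Gaussian affinity exp(-d^2 / (2 sigma^2)) from a distance matrix."""
    D = np.asarray(D, dtype=float)
    if sigma is None:
        # Hypothetical bandwidth heuristic: median of the positive distances.
        sigma = np.median(D[D > 0])
    return np.exp(-(D ** 2) / (2.0 * sigma ** 2))

def row_normalize(A):
    """Divide each row by its sum, giving a row-stochastic matrix."""
    A = np.asarray(A, dtype=float)
    return A / A.sum(axis=1, keepdims=True)

D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
A = affinity_from_distances(D)   # large distance -> small affinity
P = row_normalize(A)             # illustration only; rows sum to 1
# Per the documentation above, A (not D) is what a call such as
# get_P(A, metric='precomputed') expects.
```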
- topo.eval.topo_metrics.rank_diffusion_correlation(Px, Py, times=(1, 2, 4, 8), r=64, symmetric_hint=False)
Global geometry agreement via Spearman correlation of diffusion distances.
- Parameters:
Px ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.
Py ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.
times (tuple of int, default=(1,2,4,8)) – Multiscale t values. The score is averaged over t.
r (int, default=64) – Eigenpairs used for diffusion distances.
symmetric_hint (bool, default=False) – See _top_eigs_of_P.
- Returns:
rho_avg (float in [-1, 1]) – Average Spearman correlation between the upper triangles of D_t(Px) and D_t(Py) across t (higher is better).
Notes
Robust to monotone rescalings (rank-based).
Sensitive to global geometry preservation (fine-to-coarse as t grows).
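The rank-based comparison can be sketched as follows, for a single diffusion time. The helper names are hypothetical, and the Spearman coefficient is computed by hand (Pearson correlation of average ranks) so the example depends only on NumPy.

```python
import numpy as np

def upper_triangle(M):
    """Strict upper triangle of a square matrix as a 1D vector."""
    M = np.asarray(M, dtype=float)
    i, j = np.triu_indices(M.shape[0], k=1)
    return M[i, j]

def average_ranks(x):
    """Ranks starting at 1; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    start = 0
    while start < len(x):
        stop = start
        while stop + 1 < len(x) and sorted_x[stop + 1] == sorted_x[start]:
            stop += 1
        ranks[order[start:stop + 1]] = 0.5 * (start + stop) + 1.0
        start = stop + 1
    return ranks

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

Dx = np.array([[0.0, 1.0, 3.0],
               [1.0, 0.0, 2.0],
               [3.0, 2.0, 0.0]])
Dy = Dx ** 2   # a monotone rescaling of the same distances
rho = spearman(upper_triangle(Dx), upper_triangle(Dy))
rho_reversed = spearman(upper_triangle(Dx), upper_triangle(-Dx))
```

Because squaring preserves the order of nonnegative distances, rho comes out at 1 up to round-off, which is the robustness to monotone rescalings mentioned in the notes.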
- topo.eval.topo_metrics.multiscale_diffusion_emd(Px, Py, times=(1, 2, 4, 8), r=64, bins=64, symmetric_hint=False)
Global geometry preservation via multiscale Earth Mover’s Distance (EMD) on diffusion distances.
- Parameters:
Px ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.
Py ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.
times (tuple of int, default=(1,2,4,8)) – Diffusion timescales at which pairwise diffusion distances are computed. Scores are averaged across timescales.
r (int, default=64) – Number of leading eigenpairs to use for diffusion distances.
bins (int, default=64) – Number of histogram bins used to approximate the distribution of pairwise diffusion distances.
symmetric_hint (bool, default=False) – Passed to _top_eigs_of_P; set to True if Px and Py are symmetric operators.
- Returns:
emd (float ≥ 0) – Mean Earth Mover’s Distance (1-Wasserstein distance) between the distributions of pairwise diffusion distances in Px and Py, averaged across timescales. Lower is better (0 indicates identical distributions).
Notes
This is a distributional comparison: rather than matching each pairwise distance, it checks whether the global distribution of diffusion distances is preserved between Px and Py.
The measure is scale-sensitive: if distances in Py are systematically compressed or stretched relative to Px, the EMD will increase.
Bins are shared between Px and Py at each timescale to ensure fair histogram comparison.
Useful as a complement to correlation-based metrics (e.g. rank_diffusion_correlation, RDC), since it detects distributional distortions even if ranks are preserved.
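One way to approximate a 1-Wasserstein distance with shared histogram bins is sketched below. This is a sketch under assumptions, not the library's code: the function name histogram_emd is hypothetical, and the distance is taken as the area between the two binned cumulative distributions.

```python
import numpy as np

def histogram_emd(x, y, bins=64):
    """Approximate 1-Wasserstein distance between two 1D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lo = min(x.min(), y.min())
    hi = max(x.max(), y.max())
    if hi == lo:
        return 0.0  # all values identical: nothing to move
    # Shared bin edges so both histograms are directly comparable.
    edges = np.linspace(lo, hi, bins + 1)
    hx, _ = np.histogram(x, bins=edges)
    hy, _ = np.histogram(y, bins=edges)
    cx = np.cumsum(hx) / hx.sum()
    cy = np.cumsum(hy) / hy.sum()
    width = edges[1] - edges[0]
    # Area between the two empirical CDFs.
    return float(np.sum(np.abs(cx - cy)) * width)

dists = np.array([0.0, 1.0, 2.0, 3.0])
emd_same = histogram_emd(dists, dists)
emd_stretched = histogram_emd(dists, 2.0 * dists)  # ranks preserved, scale not
```

Doubling every distance leaves all ranks unchanged, yet the distance between the distributions becomes positive, which is the scale sensitivity described above.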
- topo.eval.topo_metrics.spectral_procrustes(Px, Py, times=(1, 2, 4, 8), r=64, symmetric_hint=False, center=True)
Align diffusion coordinates via orthogonal Procrustes and report R^2.
- Parameters:
Px ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.
Py ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.
times (tuple of int, default=(1,2,4,8)) – Diffusion times; we build Φ_t for each and average the R^2.
r (int, default=64) – Number of eigenpairs for coordinates.
symmetric_hint (bool, default=False) – See _top_eigs_of_P.
center (bool, default=True) – Mean-center coordinates before Procrustes.
- Returns:
R2_avg (float) – Average coefficient of determination across t (clipped to [0,1]). Higher is better (1.0 means perfect alignment up to a rotation).
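The alignment step can be sketched with a plain SVD, for one set of coordinates. The function name procrustes_r2 and the toy data are hypothetical, and the R^2 definition used here (one minus residual over total sum of squares) is an assumption about how the score is formed.

```python
import numpy as np

def procrustes_r2(X, Y, center=True):
    """Rotate Y onto X with an orthogonal matrix and report R^2 in [0, 1]."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if center:
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
    # Orthogonal Procrustes: R = U V^T minimizes ||Y R - X||_F,
    # where U S V^T is the SVD of Y^T X. R may include a reflection.
    U, _, Vt = np.linalg.svd(Y.T @ X)
    R = U @ Vt
    residual = np.sum((X - Y @ R) ** 2)
    total = np.sum(X ** 2)
    return float(np.clip(1.0 - residual / total, 0.0, 1.0))

X = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 0.5], [0.5, -1.0]])
theta = np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
r2_rotated = procrustes_r2(X, X @ Q)    # Y is an exact rotation of X
r2_shuffled = procrustes_r2(X, X[::-1]) # rows no longer correspond
```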
- topo.eval.topo_metrics.diffusion_rank_biased_overlap(Px, Py, times=(1, 2, 4, 8), r=64, p=0.9, k_max=100, symmetric_hint=False)
Local geometry agreement via Rank-Biased Overlap (RBO) of diffusion neighbors.
- Parameters:
Px ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.
Py ((n, n) csr_matrix or ndarray) – Diffusion operators to compare.
times (tuple of int, default=(1,2,4,8)) – Diffusion timescales; the score is averaged across them.
r (int, default=64) – Number of leading eigenpairs to use for diffusion distances.
p (float in (0,1), default=0.9) – Persistence parameter of RBO. Values closer to 1 give more weight to deeper ranks; smaller values emphasize the very top neighbors.
k_max (int, default=100) – Maximum depth of neighbor lists to compare. Truncates infinite RBO.
symmetric_hint (bool, default=False) – See _top_eigs_of_P.
- Returns:
score (float in [0,1]) – Average RBO similarity across timescales. 1.0 means identical ranked neighbor lists under diffusion distances.
Notes
RBO compares ordered neighbor lists, unlike kNN overlap which ignores order.
Especially useful to assess whether diffusion rankings (top-100 neighbors) are preserved, not just set membership.
Parameter p controls top-heaviness: p=0.9 gives ≈86% of weight to top-10.
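A truncated RBO between two ranked neighbor lists might look like the sketch below. The function name is hypothetical, and dividing by 1 - p^k is an assumption made here so that identical lists score exactly 1.0; the library may truncate or extrapolate differently.

```python
def rbo_truncated(a, b, p=0.9, k_max=100):
    """Rank-Biased Overlap of two ranked lists, truncated at depth k."""
    k = min(k_max, len(a), len(b))
    if k == 0:
        return 0.0
    total = 0.0
    for d in range(1, k + 1):
        # Agreement at depth d: shared items among the top-d of both lists.
        overlap = len(set(a[:d]) & set(b[:d]))
        total += p ** (d - 1) * overlap / d
    # Assumed normalization: the geometric weights sum to (1 - p^k) / (1 - p).
    return (1.0 - p) * total / (1.0 - p ** k)

same = rbo_truncated([1, 2, 3, 4], [1, 2, 3, 4])
disjoint = rbo_truncated([1, 2, 3, 4], [5, 6, 7, 8])
top_swap = rbo_truncated([1, 2, 3, 4], [2, 1, 3, 4])
bottom_swap = rbo_truncated([1, 2, 3, 4], [1, 2, 4, 3])
```

Both swapped lists contain the same neighbors as the reference, so a set-overlap measure would not distinguish them; here the swap at the top of the list is penalized more than the swap at the bottom.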
- topo.eval.topo_metrics.rowwise_js_similarity(Px, Py, eps: float = 1e-12, topk: int = None, return_per_row: bool = False)
Row-wise Jensen–Shannon (JS) similarity between two diffusion operators.
Given two (row-stochastic) operators Px and Py (csr_matrices or ndarrays), we compare each row i as a discrete probability distribution over columns (neighbors) and compute the Jensen–Shannon divergence JS(p_i, q_i). We then report a bounded similarity in [0, 1] via:
JS-similarity = 1 - mean_i JS(p_i, q_i)
where JS(·,·) = 0 for identical distributions and is at most log(2) in nats; this function uses the standard normalized/base-e version.
- Parameters:
Px ((n, n) csr_matrix or ndarray) – Row-stochastic diffusion/transition operators to compare. (They need not be strictly stochastic; rows are renormalized internally.)
Py ((n, n) csr_matrix or ndarray) – Row-stochastic diffusion/transition operators to compare. (They need not be strictly stochastic; rows are renormalized internally.)
eps (float, default=1e-12) – Small positive value added before per-row renormalization for numerical stability.
topk (int or None, default=None) – If provided, restrict each row to its top-k entries by probability mass before comparing (helps robustness on very sparse / noisy rows). If None, we compare using the full sparse support (union of supports).
return_per_row (bool, default=False) – If True, also return the per-row JS similarities (1 - JS_i) as a 1D array.
- Returns:
sim (float in [0, 1]) – 1 - mean(JS divergence) across rows (higher is better).
per_row (ndarray, optional) – Returned only if return_per_row=True. Per-row (1 - JS_i) scores.
Notes
Operates sparsely: for each row we build vectors over the union of the supports of Px[i, :] and Py[i, :] (unless topk is set, in which case supports are first truncated to top-k).
Sensitive to weights (transition probabilities), unlike set-overlap metrics.
If a row is empty in both operators, it is skipped.
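For a single pair of rows, the computation over the union of supports can be sketched in plain Python. Rows are represented as hypothetical {column: mass} dictionaries, and dividing the divergence by log(2) to bound the similarity in [0, 1] is an assumption about the normalization.

```python
import math

def js_similarity_row(p, q):
    """1 - JS(p, q) / log(2) for two sparse rows given as {column: mass}."""
    sp, sq = sum(p.values()), sum(q.values())
    js = 0.0
    for col in set(p) | set(q):          # union of the two supports
        pi = p.get(col, 0.0) / sp        # renormalize each row
        qi = q.get(col, 0.0) / sq
        mi = 0.5 * (pi + qi)             # mixture distribution
        if pi > 0:
            js += 0.5 * pi * math.log(pi / mi)
        if qi > 0:
            js += 0.5 * qi * math.log(qi / mi)
    return 1.0 - js / math.log(2.0)

row_a = {0: 0.5, 1: 0.5}
sim_identical = js_similarity_row(row_a, {0: 2.0, 1: 2.0})  # same after renormalizing
sim_disjoint = js_similarity_row({0: 1.0}, {1: 1.0})
sim_reweighted = js_similarity_row(row_a, {0: 0.9, 1: 0.1})
```

The last case shares the same support as row_a but with different weights, so it scores strictly between 0 and 1, which is the weight sensitivity mentioned in the notes.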
- topo.eval.topo_metrics.sparse_neighborhood_f1(Px, Py, k=None)
Operator-level set overlap: F1 of top-k transition neighborhoods per row.
- Parameters:
Px ((n, n) csr_matrix or ndarray) – Diffusion/transition operators.
Py ((n, n) csr_matrix or ndarray) – Diffusion/transition operators.
k (int or None, default=None) – Number of neighbors (by transition probability) to keep per row. If None, use the natural sparsity (nnz per row) or min(nnz_x, nnz_y).
- Returns:
f1_avg (float in [0, 1]) – Mean F1 score across rows. 1.0 → identical sparse neighborhoods.
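The per-row F1 can be sketched as below for rows given as hypothetical {column: probability} dictionaries. Using min(nnz_x, nnz_y) when k is None and scoring an empty neighborhood as 0.0 are assumptions of this sketch.

```python
def topk_support(row, k):
    """Columns of the k largest transition probabilities in a sparse row."""
    ranked = sorted(row, key=lambda col: (-row[col], col))
    return set(ranked[:k])

def neighborhood_f1(row_x, row_y, k=None):
    """F1 overlap between the top-k neighbor sets of two rows."""
    if k is None:
        k = min(len(row_x), len(row_y))  # assumed fallback to native sparsity
    sx, sy = topk_support(row_x, k), topk_support(row_y, k)
    if not sx or not sy:
        return 0.0
    # With set sizes as denominators, F1 = 2 |intersection| / (|sx| + |sy|).
    return 2.0 * len(sx & sy) / (len(sx) + len(sy))

row_x = {1: 0.5, 2: 0.3, 3: 0.2}
row_y = {1: 0.6, 2: 0.1, 4: 0.3}
f1_top2 = neighborhood_f1(row_x, row_y, k=2)   # {1, 2} vs {1, 4}
f1_same = neighborhood_f1(row_x, row_x)
```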
- topo.eval.topo_metrics.spectral_similarity(Px, Py, r=64, symmetric_hint=False, return_details=False)
Operator-level spectral agreement via eigenvalues and subspaces.
- Returns:
If return_details=False – float in [0,1]: cosine of the largest principal angle between the top-r eigenspaces (higher is better).
If return_details=True – dict with {'eigenvalue_w1', 'subspace_cos'}.
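The subspace part of this score can be illustrated with principal angles computed from an SVD. This is a sketch rather than the library's code: the function name subspace_cosine is hypothetical, and the bases here are toy matrices instead of eigenvectors of Px and Py.

```python
import numpy as np

def subspace_cosine(U, V):
    """Cosine of the largest principal angle between span(U) and span(V)."""
    Qu, _ = np.linalg.qr(np.asarray(U, dtype=float))  # orthonormal bases
    Qv, _ = np.linalg.qr(np.asarray(V, dtype=float))
    # Singular values of Qu^T Qv are the cosines of the principal angles;
    # the smallest one belongs to the largest angle.
    s = np.linalg.svd(Qu.T @ Qv, compute_uv=False)
    return float(np.clip(s.min(), 0.0, 1.0))

U = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
V_same = U @ np.array([[2.0, 1.0], [0.0, 3.0]])  # same span, different basis
V_orth = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
cos_same = subspace_cosine(U, V_same)
cos_orth = subspace_cosine(U, V_orth)
```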
- topo.eval.topo_metrics.commute_time_trace_gap(Px, Py, r=64, symmetric_hint=False, hutchinson_probes=None, random_state=None)
Global connectivity gap via (approximate) trace of Laplacian pseudoinverse.
- Parameters:
Px ((n, n) csr_matrix or ndarray) – Diffusion operators (will be symmetrized to build Laplacians).
Py ((n, n) csr_matrix or ndarray) – Diffusion operators (will be symmetrized to build Laplacians).
r (int, default=64) – If using low-rank approximation, number of leading modes for the trace.
symmetric_hint (bool, default=False) – See _top_eigs_of_P.
hutchinson_probes (int or None, default=None) – If provided, estimate trace via Hutchinson’s method with this many random probe vectors (for very large graphs).
random_state (int or np.random.RandomState or None) – RNG for Hutchinson probes.
- Returns:
gap (float >= 0) – Absolute difference between (approximate) trace( L^+_x ) and trace( L^+_y ). Smaller is better (more similar commute-time geometry).
Notes
Build A = (P + P^T)/2, then normalized Laplacian L. Commute-time distances relate to entries of L^+; its trace summarizes overall connectivity.
For speed, you can approximate using low-rank spectral sums or Hutchinson.
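For small graphs the construction in the notes can be written out densely, as in the sketch below. It uses a full eigendecomposition instead of the low-rank or Hutchinson approximations, and the function name and toy operators are hypothetical.

```python
import numpy as np

def laplacian_pinv_trace(P, tol=1e-10):
    """trace(L^+) for the normalized Laplacian of A = (P + P^T) / 2."""
    P = np.asarray(P, dtype=float)
    A = 0.5 * (P + P.T)
    deg = A.sum(axis=1)                  # assumed strictly positive
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    mu = np.linalg.eigvalsh(L)
    # The pseudoinverse inverts only the nonzero eigenvalues.
    return float(np.sum(1.0 / mu[mu > tol]))

# Random-walk operators of a 3-node path and a 3-node triangle.
P_path = np.array([[0.0, 1.0, 0.0],
                   [0.5, 0.0, 0.5],
                   [0.0, 1.0, 0.0]])
P_triangle = np.array([[0.0, 0.5, 0.5],
                       [0.5, 0.0, 0.5],
                       [0.5, 0.5, 0.0]])
trace_path = laplacian_pinv_trace(P_path)
trace_triangle = laplacian_pinv_trace(P_triangle)
gap = abs(trace_path - trace_triangle)
```

The path has normalized Laplacian eigenvalues 0, 1, 2 and the triangle has 0, 1.5, 1.5, so the better connected triangle yields the smaller trace.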
- topo.eval.topo_metrics.topo_preserve_score(Px, Py, times=(1, 2, 4, 8), r: int = 64, symmetric_hint: bool = False, k_for_pf1: int = None, weights: dict = dict(PF1=0.3, PJS=0.3, SP=0.3))
Composite TopoPreserve score using three operator-aware metrics (higher is better; returns ≈[0,1] after internal normalizations).
Components
- PF1: F1@k on top-k transition neighborhoods per row (set overlap).
Range [0,1], higher is better. (Weight-insensitive.)
- PJS: Row-wise Jensen–Shannon similarity of transitions (1 − JS, normalized).
Range [0,1], higher is better. (Weight-sensitive.)
- SP: Spectral Procrustes R^2 alignment of diffusion coordinates (average over times).
Range [0,1], higher is better. (Global/meso geometry.)
- Parameters:
Px ((n, n) csr_matrix or ndarray) – Diffusion (transition) operators to compare.
Py ((n, n) csr_matrix or ndarray) – Diffusion (transition) operators to compare.
times (tuple of int, default=(1, 2, 4, 8)) – Diffusion times for Spectral Procrustes. Ignored by the other components.
r (int, default=64) – Leading eigenpairs used for spectral metrics (SP internals).
symmetric_hint (bool, default=False) – If True, treat operators as symmetric for eigensolvers (stability hint).
k_for_pf1 (int or None, default=None) – Top-k used in PF1. If None, each row uses its native sparsity.
weights (dict, default=dict(PF1=0.3, PJS=0.3, SP=0.3)) – Mixture weights for the components. Any NaN component is skipped and the remaining weights are renormalized.
- Returns:
score (float) – Weighted average of the component scores in ≈[0,1] (higher is better).
parts (dict) – {'PF1': float in [0,1], 'PJS': float in [0,1], 'SP': float in [0,1]}
Notes
PF1 (set) and PJS (weight) together capture local neighborhood fidelity.
SP captures global/meso geometry via diffusion eigencoordinates.
All components are operator-native (diffusion/graph-based), aligning with TopoMAP/DM objectives more directly than raw Euclidean metrics.
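The weighting rule described for topo_preserve_score (skip NaN components, renormalize the remaining weights) can be sketched in a few lines. The function name weighted_score and the component values are hypothetical, and always dividing by the sum of the weights actually used is an assumption of this sketch.

```python
import math

def weighted_score(parts, weights):
    """Weighted mean of component scores, ignoring NaN components."""
    numerator = 0.0
    used_weight = 0.0
    for name, w in weights.items():
        value = parts.get(name, float("nan"))
        if math.isnan(value):
            continue                     # skip missing/failed components
        numerator += w * value
        used_weight += w
    # Renormalize by the weights that were actually used.
    return numerator / used_weight if used_weight > 0 else float("nan")

weights = dict(PF1=0.3, PJS=0.3, SP=0.3)
parts = {"PF1": 0.8, "PJS": 0.6, "SP": float("nan")}  # made-up values
score = weighted_score(parts, weights)
```

Here SP is NaN, so the score is the mean of PF1 and PJS with equal weights, roughly 0.7.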