TopOGraph
- class topo.TopOGraph(base_knn=30, graph_knn=30, min_eigs=128, n_jobs=-1, projection_methods=['MAP', 'PaCMAP'], base_kernel=None, base_kernel_version='bw_adaptive', graph_kernel_version='bw_adaptive', base_metric='cosine', graph_metric='euclidean', diff_t=0, delta=1.0, sigma=0.1, low_memory=False, eigen_tol=1e-08, eigensolver='arpack', backend='hnswlib', cache=True, verbosity=0, random_state=42, id_method='fsa', id_ks=50, id_metric='euclidean', id_quantile=0.99, id_min_components=128, id_max_components=1024, id_headroom=0.5, uom=False, eigenmap_method=None, laplacian_type='normalized')
Geometry-aware estimator that learns spectral scaffolds, refined operators, and 2-D layouts.
TopOGraph builds the multiscale and single-time spectral scaffolds, reconstructs refined similarity graphs in scaffold space, and exposes ready-to-plot TopoMAP/TopoPaCMAP embeddings together with intrinsic dimensionality estimates. Legacy dictionaries remain available for benchmarking and combinatorial model searches.
- Parameters:
base_knn (int, default 30) – k-nearest neighbors for the base graph on input space.
graph_knn (int, default 30) – k-nearest neighbors for the refined graph built in spectral scaffold space.
min_eigs (int, default 128) – Minimum number of eigenpairs to compute for the scaffold.
base_kernel (topo.tpgraph.Kernel or None, default None) – Pre-fitted kernel to reuse; if provided, fit skips base graph construction.
laplacian_type ({'unnormalized', 'normalized', 'random_walk', 'geometric'}, default 'normalized') – Laplacian normalization used for spectral computations.
base_kernel_version (str, default 'bw_adaptive') – Kernel choice for the base graph (e.g., ‘bw_adaptive’, ‘fuzzy’, ‘cknn’).
graph_kernel_version (str, default 'bw_adaptive') – Kernel choice for scaffold graphs (applies to DM and msDM).
backend ({'hnswlib', 'nmslib', 'annoy', 'faiss', 'sklearn'}, default 'hnswlib') – Approximate nearest-neighbor backend.
base_metric (str, default 'cosine') – Distance for the base kNN graph (usually cosine/correlation on standardized inputs).
graph_metric (str, default 'euclidean') – Distance for kNN in scaffold space.
diff_t (int, default 0) – Diffusion time used for the single-time scaffold; ignored for multiscale.
sigma (float, default 0.1) – Bandwidth for Gaussian kernels (when selected).
delta (float, default 1.0) – Radius parameter for cKNN kernels.
n_jobs (int, default -1) – Threads for kNN searches; -1 uses all cores.
low_memory (bool, default False) – Avoid caching large kernel objects when True.
eigen_tol (float, default 1e-8) – Tolerance passed to the eigen solver.
eigensolver ({'arpack', 'lobpcg', 'amg', 'dense'}, default 'arpack') – Solver used for eigendecomposition.
projection_methods (list[str], default ['MAP', 'PaCMAP']) – Layouts to compute when calling project.
cache (bool, default True) – Cache kernel and eigen objects in dictionaries for reuse.
verbosity (int, default 0) – 0: silent; 1: major steps; 2+: include layout messages; 3: debug neighborhoods.
random_state (int or numpy.random.RandomState, default 42) – Random seed/control for reproducibility.
id_method ({'mle', 'fsa'}, default 'fsa') – Intrinsic dimensionality estimator that selects scaffold size (both are stored).
id_ks (int or iterable, default 50) – Neighborhood sizes for I.D. estimation.
id_metric (str, default 'euclidean') – Metric used for I.D. estimation.
id_quantile (float, default 0.99) – Quantile for FSA-based I.D. estimation.
id_min_components (int, default 128) – Lower bound on scaffold components.
id_max_components (int, default 1024) – Upper bound on scaffold components.
id_headroom (float, default 0.5) – Extra fraction of components beyond the estimated I.D. to keep.
uom (bool, default False) – Enable unions-of-manifolds (block-diagonal scaffolds) if supported.
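The three `id_*_components`/`id_headroom` parameters above jointly bound the scaffold size. The exact rule is not spelled out here, so the following is only a minimal sketch under an assumed rule: pad the intrinsic-dimensionality estimate by the headroom fraction, then clamp to the documented bounds. The function name `scaffold_size` is hypothetical and is not part of the library.

```python
import math


def scaffold_size(id_estimate, id_min_components=128,
                  id_max_components=1024, id_headroom=0.5):
    """Assumed rule (not confirmed by the library): pad, then clamp."""
    # Keep an extra `id_headroom` fraction of components beyond the estimate.
    padded = math.ceil(id_estimate * (1.0 + id_headroom))
    # Enforce the documented lower and upper bounds.
    return max(id_min_components, min(id_max_components, padded))


for estimate in (12, 200, 900):
    print(estimate, "->", scaffold_size(estimate))
```

Under this assumed rule, small estimates fall back to `id_min_components` and very large ones are capped at `id_max_components`.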
- Variables:
knn_X (scipy.sparse.csr_matrix) – Base kNN graph on the input space.
P_of_X (scipy.sparse.csr_matrix) – Diffusion operator on the input space.
knn_msZ, knn_Z – kNN graphs on the multiscale and single-time spectral scaffolds.
P_of_msZ, P_of_Z – Refined diffusion operators on the multiscale and single-time scaffolds.
eigenvalues (numpy.ndarray) – Eigenvalues of the active eigenbasis (multiscale by default).
MAP, msMAP, PaCMAP, msPaCMAP – Ready-to-plot 2-D layouts computed on refined graphs.
BaseKernelDict, EigenbasisDict, GraphKernelDict, ProjectionDict – Legacy storage used for benchmarking and model selection.
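As a rough usage sketch, the constructor arguments and attributes listed above might be combined as follows. The helper names `fit_topograph` and `summarize` and the constant `TOPOGRAPH_KWARGS` are illustrative and not part of the library; the keyword values are copied from the documented signature, and `topo` is imported lazily so the snippet loads even where the package is not installed.

```python
# Keyword arguments copied from the documented signature defaults.
TOPOGRAPH_KWARGS = dict(
    base_knn=30,
    graph_knn=30,
    base_metric="cosine",
    graph_metric="euclidean",
    backend="hnswlib",
    n_jobs=-1,
    random_state=42,
)


def fit_topograph(X, **overrides):
    """Fit a TopOGraph on X (samples x features) and return it."""
    import topo  # lazy import: only needed when the helper is called

    tg = topo.TopOGraph(**{**TOPOGRAPH_KWARGS, **overrides})
    tg.fit(X)
    return tg


def summarize(tg):
    """Collect a few documented attributes of a fitted TopOGraph."""
    return {
        "knn_X_shape": tg.knn_X.shape,      # base kNN graph
        "P_of_X_shape": tg.P_of_X.shape,    # diffusion operator on inputs
        "n_eigenvalues": len(tg.eigenvalues),
    }
```

A typical call would be `summarize(fit_topograph(X, base_knn=15))` for a data matrix `X`.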
- eigenspectrum(eigenbasis_key=None, **kwargs)
Scree plot helper (calls topo.plot.decay_plot).
Behavior
UoM enabled: plots one scree per disconnected component using the active mode (self._uom_active_mode, default ‘msDM’). Titles include component index and size.
Non-UoM: behaves as before, plotting the selected/global eigenbasis.
- eval_models_layouts(X, landmarks=None, kernels=['cknn', 'bw_adaptive'], eigenmap_methods=['msDM', 'DM', 'LE'], projections=['MAP'], additional_eigenbases=None, additional_projections=None, landmark_method='random', n_neighbors=5, n_jobs=-1, cor_method='spearman', **kwargs)
Evaluate orthogonal bases, topological graphs and layouts against geodesic correlations and a PCA baseline. Kept for backward compatibility.
- fit(X=None, **kwargs)
Build the base kNN graph and the base kernel P(X), then compute both the msDM and DM eigenbases (dual scaffold). If uom=True, also detect disconnected components, build per-component scaffolds and refined graphs, and aggregate them into block-diagonal operators and concatenated coordinates with no cross-component edges.
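The block-diagonal bookkeeping behind the UoM option can be illustrated with a small, dependency-free sketch. It only shows component detection and reordering on a toy adjacency matrix; it does not build per-component scaffolds, and none of these helper names come from the library.

```python
from collections import deque


def connected_components(adj):
    """Label nodes of a symmetric adjacency matrix (list of lists) via BFS."""
    n = len(adj)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if adj[i][j] > 0 and labels[j] == -1:
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels


def block_diagonal(adj, labels):
    """Reorder nodes so each component is contiguous; return (order, matrix)."""
    order = sorted(range(len(labels)), key=lambda i: (labels[i], i))
    return order, [[adj[i][j] for j in order] for i in order]


def cross_component_weight(adj, labels):
    """Total edge weight between nodes carrying different component labels."""
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(n)
               if labels[i] != labels[j])


# Toy graph with edges 0-2, 2-4 and 1-3, i.e. two disconnected components.
A = [[0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0],
     [1, 0, 0, 0, 1],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0]]
labels = connected_components(A)
order, blocks = block_diagonal(A, labels)
```

After reordering, all nonzero entries sit inside the two diagonal blocks, which is the "no cross-component edges" property described above.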
- list_eigenbases()
List keys in EigenbasisDict (legacy/benchmarking).
- plot_eigenspectrum(eigenbasis_key=None, **kwargs)
Alias for eigenspectrum.
- project(n_components=2, init=None, projection_method=None, landmarks=None, landmark_method='kmeans', n_neighbors=None, num_iters=300, multiscale=False, save_every=None, save_limit=None, save_callback=None, include_init_snapshot=True, **kwargs)
Compute a low-dimensional projection (2-D by default) and store it in ProjectionDict. In UoM mode, graph-based methods use the UoM block-diagonal affinities and coordinate-based methods use the UoM concatenated scaffolds, which guarantees zero cross-component edges.
- Parameters:
n_components (int (default 2)) – Number of output dimensions.
init (np.ndarray or str (optional)) – Initial coordinates for layout optimization. If a string, must be a key in ProjectionDict. If None, spectral layout is used.
projection_method (str (optional; documented default 'Isomap', though the signature default is None)) –
Which projection method to use. Only ‘Isomap’, ‘t-SNE’ and ‘MAP’ are implemented out of the box. ‘t-SNE’ uses Multicore-TSNE, and ‘MAP’ relies on code adapted from UMAP. Current options are:
‘Isomap’ - one of the first manifold learning methods
[‘t-SNE’](https://github.com/DmitryUlyanov/Multicore-TSNE) - a classic manifold learning method
‘MAP’ - a lighter [UMAP](https://umap-learn.readthedocs.io/en/latest/index.html) with looser assumptions
[‘UMAP’](https://umap-learn.readthedocs.io/en/latest/index.html)
[‘PaCMAP’](http://jmlr.org/papers/v22/20-1061.html) (Pairwise-controlled Manifold Approximation and Projection) - for balanced visualizations
[‘TriMAP’](https://github.com/eamid/trimap) - dimensionality reduction using triplets
‘IsomorphicMDE’ - [MDE](https://github.com/cvxgrp/pymde) with preservation of nearest neighbors
‘IsometricMDE’ - [MDE](https://github.com/cvxgrp/pymde) with preservation of pairwise distances
‘NCVis’ - [Noise Contrastive Visualization](https://github.com/stat-ml/ncvis) - a UMAP-like method with very fast performance
New methods are straightforward to add, so feel free to make a feature request if your favorite method is not listed here.
landmarks (int or np.ndarray (optional)) – Number of landmarks or indices of landmark samples. If None, no landmarks are used.
landmark_method (str (default 'kmeans')) – Landmark selection method (if landmarks is an int). One of {‘random’, ‘kmeans’}.
n_neighbors (int (optional)) – Number of neighbors for graph-based methods. If None, uses self.graph_knn.
num_iters (int (default 300)) – Number of optimization epochs for layout optimization.
multiscale (bool (internal, default False)) – If True, use msDM refined graph; else use DM refined graph.
save_every – Passed through to MAP checkpointing. Ignored by other methods.
save_limit – Passed through to MAP checkpointing. Ignored by other methods.
save_callback – Passed through to MAP checkpointing. Ignored by other methods.
include_init_snapshot – Passed through to MAP checkpointing. Ignored by other methods.
Notes
For graph-based DR methods we pass precomputed affinities from the chosen refined graph: {MAP, UMAP, Isomap, IsomorphicMDE, IsometricMDE, PaCMAP, NCVis, TriMAP, t-SNE}.
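A hedged usage sketch for `project`: the helper below calls it once per method on an already fitted estimator and then lists what was stored in `ProjectionDict`. The helper name `compute_layouts` is illustrative; only the keyword arguments `projection_method`, `multiscale` and `num_iters` come from the signature above, and the format of the dictionary keys is not assumed.

```python
def compute_layouts(tg, methods=("MAP", "PaCMAP"), multiscale=True,
                    num_iters=300):
    """Run project() for each method on a fitted TopOGraph-like object.

    Returns the keys present in tg.ProjectionDict afterwards, without
    assuming how the library names them.
    """
    for method in methods:
        tg.project(
            projection_method=method,
            multiscale=multiscale,   # True: msDM refined graph; False: DM
            num_iters=num_iters,
        )
    return list(tg.ProjectionDict)
```

Passing `multiscale=False` would instead request layouts from the DM refined graph, as described for the `multiscale` parameter.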
- run_models(X, kernels=['fuzzy', 'cknn', 'bw_adaptive'], eigenmap_methods=['DM', 'LE', 'top'], projections=['Isomap', 'MAP'])
Legacy power function that runs multiple models for benchmarking. Preserved for backward compatibility.
- spectral_layout(graph=None, n_components=2)
Multicomponent spectral layout of a (precomputed) graph kernel. Stores result in SpecLayout and returns it (used for layout init). In UoM mode, defaults to the UoM msZ operator if no graph is provided.
- transform(X=None, **kwargs)
DEPRECATED: all computations now happen during .fit(). Returns the msDM graph kernel matrix for backward compatibility.
- write_pkl(filename='topograph.pkl', remove_base_class=True)
Save the TopOGraph object to a pickle file (legacy helper).
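No matching loader is documented here. Assuming `write_pkl` produces an ordinary pickle file (an assumption, not confirmed above), the saved object could be read back with the standard library. The function name `load_topograph` is hypothetical.

```python
import pickle


def load_topograph(filename="topograph.pkl"):
    """Load an object saved with write_pkl, assuming a standard pickle file."""
    with open(filename, "rb") as handle:
        return pickle.load(handle)
```

Unpickling executes code from the file, so this should only be used on files from a trusted source, and the `topo` package must be importable for the object to be reconstructed.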