TopOGraph

class topo.TopOGraph(base_knn=30, graph_knn=30, min_eigs=128, n_jobs=-1, projection_methods=['MAP', 'PaCMAP'], base_kernel=None, base_kernel_version='bw_adaptive', graph_kernel_version='bw_adaptive', base_metric='cosine', graph_metric='euclidean', diff_t=0, delta=1.0, sigma=0.1, low_memory=False, eigen_tol=1e-08, eigensolver='arpack', backend='hnswlib', cache=True, verbosity=0, random_state=42, id_method='fsa', id_ks=50, id_metric='euclidean', id_quantile=0.99, id_min_components=128, id_max_components=1024, id_headroom=0.5, uom=False, eigenmap_method=None, laplacian_type='normalized')

Geometry-aware estimator that learns spectral scaffolds, refined operators, and 2-D layouts.

TopOGraph builds the multiscale and single-time spectral scaffolds, reconstructs refined similarity graphs in scaffold space, and exposes ready-to-plot TopoMAP/TopoPaCMAP embeddings together with intrinsic dimensionality estimates. Legacy dictionaries remain available for benchmarking and combinatorial model searches.

Parameters:
  • base_knn (int, default 30) – k-nearest neighbors for the base graph on input space.

  • graph_knn (int, default 30) – k-nearest neighbors for the refined graph built in spectral scaffold space.

  • min_eigs (int, default 128) – Minimum number of eigenpairs to compute for the scaffold.

  • base_kernel (topo.tpgraph.Kernel or None, default None) – Pre-fitted kernel to reuse; if provided, fit skips base graph construction.

  • laplacian_type ({'unnormalized', 'normalized', 'random_walk', 'geometric'}, default 'normalized') – Laplacian normalization used for spectral computations.

  • base_kernel_version (str, default 'bw_adaptive') – Kernel choice for the base graph (e.g., ‘bw_adaptive’, ‘fuzzy’, ‘cknn’).

  • graph_kernel_version (str, default 'bw_adaptive') – Kernel choice for scaffold graphs (applies to DM and msDM).

  • backend ({'hnswlib', 'nmslib', 'annoy', 'faiss', 'sklearn'}, default 'hnswlib') – Approximate nearest-neighbor backend.

  • base_metric (str, default 'cosine') – Distance for the base kNN graph (usually cosine/correlation on standardized inputs).

  • graph_metric (str, default 'euclidean') – Distance for kNN in scaffold space.

  • diff_t (int, default 0) – Diffusion time used for the single-time scaffold; ignored for multiscale.

  • sigma (float, default 0.1) – Bandwidth for Gaussian kernels (when selected).

  • delta (float, default 1.0) – Radius parameter for cKNN kernels.

  • n_jobs (int, default -1) – Threads for kNN searches; -1 uses all cores.

  • low_memory (bool, default False) – Avoid caching large kernel objects when True.

  • eigen_tol (float, default 1e-8) – Tolerance passed to the eigen solver.

  • eigensolver ({'arpack', 'lobpcg', 'amg', 'dense'}, default 'arpack') – Solver used for eigendecomposition.

  • projection_methods (list[str], default ['MAP', 'PaCMAP']) – Layouts to compute when calling project.

  • cache (bool, default True) – Cache kernel and eigen objects in dictionaries for reuse.

  • verbosity (int, default 0) – 0: silent; 1: major steps; 2+: include layout messages; 3: debug neighborhoods.

  • random_state (int or numpy.random.RandomState, default 42) – Random seed/control for reproducibility.

  • id_method ({'mle', 'fsa'}, default 'fsa') – Intrinsic dimensionality estimator that selects scaffold size (both are stored).

  • id_ks (int or iterable, default 50) – Neighborhood sizes for I.D. estimation.

  • id_metric (str, default 'euclidean') – Metric used for I.D. estimation.

  • id_quantile (float, default 0.99) – Quantile for FSA-based I.D. estimation.

  • id_min_components (int, default 128) – Lower bound on scaffold components.

  • id_max_components (int, default 1024) – Upper bound on scaffold components.

  • id_headroom (float, default 0.5) – Extra fraction of components beyond the estimated I.D. to keep.

  • uom (bool, default False) – Enable unions-of-manifolds (block-diagonal scaffolds) if supported.
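The three `id_*` size parameters above interact, and a small sketch may make that easier to picture. The helper below is hypothetical (`scaffold_size` is not part of the library), and the exact rule TopOGraph applies is an assumption: it pads the intrinsic dimensionality estimate by `id_headroom` and clamps the result between `id_min_components` and `id_max_components`.

```python
import math


def scaffold_size(estimated_id, headroom=0.5, min_components=128, max_components=1024):
    """Hypothetical helper: pick a component count from an I.D. estimate.

    Assumed rule, not confirmed by the library documentation.
    """
    # Keep the estimate plus a fractional margin (the id_headroom idea).
    padded = math.ceil(estimated_id * (1.0 + headroom))
    # Clamp to the documented lower and upper bounds.
    return max(min_components, min(padded, max_components))


small = scaffold_size(20)     # padded value falls below the lower bound
medium = scaffold_size(200)   # padded value lies between the bounds
large = scaffold_size(5000)   # padded value exceeds the upper bound
print(small, medium, large)
```

Under this assumed rule, the defaults mean the lower bound dominates for most low-dimensional data, so raising `id_headroom` alone would only matter once the padded estimate exceeds `id_min_components`.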

Variables:
  • knn_X (scipy.sparse.csr_matrix) – Base kNN graph on the input space.

  • P_of_X (scipy.sparse.csr_matrix) – Diffusion operator on the input space.

  • knn_msZ, knn_Z – kNN graphs on the multiscale and single-time spectral scaffolds, respectively.

  • P_of_msZ, P_of_Z – Refined diffusion operators on the multiscale and single-time scaffolds, respectively.

  • eigenvalues (numpy.ndarray) – Eigenvalues of the active eigenbasis (multiscale by default).

  • MAP, msMAP, PaCMAP, msPaCMAP – Ready-to-plot 2-D layouts computed on refined graphs.

  • BaseKernelDict, EigenbasisDict, GraphKernelDict, ProjectionDict – Legacy storage used for benchmarking and model selection.
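A minimal construction-and-fit sketch, using only names that appear in the signature and attribute list above. It assumes the package is importable as `topo` and that `X` is a samples-by-features array; `fit_topograph` and `params` are illustrative names, not library API. The import is done lazily inside the function so the snippet loads even where the package is not installed.

```python
# Constructor arguments copied from the documented signature defaults.
params = dict(
    base_knn=30,
    graph_knn=30,
    base_metric="cosine",
    graph_metric="euclidean",
    id_method="fsa",
    n_jobs=-1,
    random_state=42,
)


def fit_topograph(X, **overrides):
    """Fit a TopOGraph on X and return the fitted estimator (illustrative helper)."""
    import topo as tp  # lazy import: assumes the package is installed as `topo`

    # Keyword overrides take precedence over the defaults above.
    tg = tp.TopOGraph(**{**params, **overrides})
    tg.fit(X)
    return tg
```

After `tg = fit_topograph(X)`, the attributes listed above (for example `tg.knn_X`, `tg.P_of_X`, and `tg.eigenvalues`) are expected to be populated.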

eigenspectrum(eigenbasis_key=None, **kwargs)

Scree plot helper (calls topo.plot.decay_plot).

Behavior

  • UoM enabled: plots one scree per disconnected component using the active mode (self._uom_active_mode, default ‘msDM’). Titles include component index and size.

  • Non-UoM: behaves as before, plotting the selected/global eigenbasis.

eval_models_layouts(X, landmarks=None, kernels=['cknn', 'bw_adaptive'], eigenmap_methods=['msDM', 'DM', 'LE'], projections=['MAP'], additional_eigenbases=None, additional_projections=None, landmark_method='random', n_neighbors=5, n_jobs=-1, cor_method='spearman', **kwargs)

Evaluate orthogonal bases, topological graphs and layouts against geodesic correlations and a PCA baseline. Kept for backward compatibility.

fit(X=None, **kwargs)

Build the base kNN graph and the base kernel P(X), then compute both the msDM and DM eigenbases (dual scaffold). Optionally (uom=True), detect disconnected components, build per-component scaffolds and refined graphs, and aggregate them into block-diagonal operators and concatenated coordinates with no cross-component edges.
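To illustrate the unions-of-manifolds idea described above, here is a toy sketch in plain Python. It is not the library's implementation (how TopOGraph detects components is not documented here); it only shows that, once a kNN graph splits into disconnected components, grouping nodes by component yields blocks with no edges between them. The names `symmetrize` and `connected_components` are illustrative.

```python
from collections import deque


def symmetrize(knn):
    """Turn a directed kNN list {node: [neighbors]} into an undirected adjacency map."""
    adj = {i: set() for i in knn}
    for i, nbrs in knn.items():
        for j in nbrs:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Label every node with a component index using breadth-first search."""
    labels = {}
    current = 0
    for start in adj:
        if start in labels:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in labels:
                    labels[nb] = current
                    queue.append(nb)
        current += 1
    return labels


# Toy kNN graph: nodes 0-2 and nodes 3-4 never reference each other.
knn = {0: [1], 1: [2], 2: [0], 3: [4], 4: [3]}
adj = symmetrize(knn)
labels = connected_components(adj)

# Group node indices by component; each group would get its own scaffold.
blocks = {}
for node, comp in labels.items():
    blocks.setdefault(comp, []).append(node)

# Edges whose endpoints lie in different components; empty by construction.
cross_edges = [(i, j) for i in adj for j in adj[i] if labels[i] != labels[j]]
print(blocks, cross_edges)
```

Because `cross_edges` is empty, an operator assembled from the per-component blocks is block-diagonal, which is the property the `uom=True` path is described as guaranteeing.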

list_eigenbases()

List keys in EigenbasisDict (legacy/benchmarking).

plot_eigenspectrum(eigenbasis_key=None, **kwargs)

Alias for eigenspectrum.

project(n_components=2, init=None, projection_method=None, landmarks=None, landmark_method='kmeans', n_neighbors=None, num_iters=300, multiscale=False, save_every=None, save_limit=None, save_callback=None, include_init_snapshot=True, **kwargs)

Compute a 2D projection and store it in ProjectionDict. In UoM mode, graph-based methods use UoM block-diagonal affinities; coordinate-based methods use UoM concatenated scaffolds. This guarantees zero cross-component edges.

Parameters:
  • n_components (int (default 2)) – Number of output dimensions.

  • init (np.ndarray or str (optional)) – Initial coordinates for layout optimization. If a string, must be a key in ProjectionDict. If None, spectral layout is used.

  • projection_method (str (optional, default 'Isomap')) –

    Which projection method to use. Only ‘Isomap’, ‘t-SNE’ and ‘MAP’ are implemented out of the box; ‘MAP’ relies on code adapted from UMAP.

    Other methods are straightforward to add, so feel free to make a feature request if your favorite method is not listed here.

  • landmarks (int or np.ndarray (optional)) – Number of landmarks or indices of landmark samples. If None, no landmarks are used.

  • landmark_method (str (default 'kmeans')) – Landmark selection method (if landmarks is an int). One of {‘random’, ‘kmeans’}.

  • n_neighbors (int (optional)) – Number of neighbors for graph-based methods. If None, uses self.graph_knn.

  • num_iters (int (default 300)) – Number of optimization epochs for layout optimization.

  • multiscale (bool (internal, default False)) – If True, use msDM refined graph; else use DM refined graph.

  • save_every – Passed through to MAP checkpointing. Ignored by other methods.

  • save_limit – Passed through to MAP checkpointing. Ignored by other methods.

  • save_callback – Passed through to MAP checkpointing. Ignored by other methods.

  • include_init_snapshot – Passed through to MAP checkpointing. Ignored by other methods.

Notes

For graph-based DR methods we pass precomputed affinities from the chosen refined graph: {MAP, UMAP, Isomap, (Iso/Isomorphic)MDE, PaCMAP, NCVis, TriMAP, t-SNE}.
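A short sketch of calling project on an already fitted estimator, once per layout method and per scaffold. It uses only the keyword arguments documented above; `compute_layouts` is an illustrative helper, not library API, and the return value of project is not assumed.

```python
def compute_layouts(tg, methods=("MAP", "PaCMAP"), num_iters=300):
    """Call project() for each method on both scaffolds of a fitted TopOGraph.

    Returns the (method, multiscale) pairs that were requested.
    """
    calls = []
    for method in methods:
        # multiscale=True selects the msDM refined graph, False the DM one.
        for multiscale in (True, False):
            tg.project(
                n_components=2,
                projection_method=method,
                multiscale=multiscale,
                num_iters=num_iters,
            )
            calls.append((method, multiscale))
    return calls
```

Per the description of project, each result is stored in `ProjectionDict`. The key names are not documented in this section, so inspecting that container after the calls is the safest way to find them.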

run_models(X, kernels=['fuzzy', 'cknn', 'bw_adaptive'], eigenmap_methods=['DM', 'LE', 'top'], projections=['Isomap', 'MAP'])

Legacy convenience function that runs multiple models for benchmarking. Preserved for backward compatibility.

spectral_layout(graph=None, n_components=2)

Multicomponent spectral layout of a (precomputed) graph kernel. Stores result in SpecLayout and returns it (used for layout init). In UoM mode, defaults to the UoM msZ operator if no graph is provided.

transform(X=None, **kwargs)

DEPRECATED: all computations now happen during .fit(). Returns the msDM graph kernel matrix for backward compatibility.

write_pkl(filename='topograph.pkl', remove_base_class=True)

Save the TopOGraph object to a pickle file (legacy helper).
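Reading such a file back can be sketched with the standard library, assuming write_pkl produces an ordinary pickle (the section does not say otherwise, nor does it document a matching loader). `load_pickle` is an illustrative helper, and the round trip below uses a plain dict as a stand-in for a fitted object.

```python
import os
import pickle
import tempfile


def load_pickle(filename="topograph.pkl"):
    """Read back an object stored in a pickle file."""
    with open(filename, "rb") as handle:
        return pickle.load(handle)


# Round-trip demo with a stand-in object instead of a real TopOGraph.
stand_in = {"base_knn": 30, "graph_knn": 30}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "topograph.pkl")
    with open(path, "wb") as handle:
        pickle.dump(stand_in, handle)
    restored = load_pickle(path)

print(restored == stand_in)
```

Unpickling executes code embedded in the file, so only load pickles from sources you trust.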