topo.pipes

Module Contents

Functions

global_score(data, emb[, Y_pca])

eval_models_layouts(TopOGraph, X[, methods, kernels, ...])

Evaluates all orthogonal bases, topological graphs and layouts in the TopOGraph object.

explained_variance(X[, title, n_pcs, figsize, ...])

Plots the explained variance by PCA with varying number of highly variable genes.

topo.pipes.global_score(data, emb, Y_pca=False)
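
This entry has no description here. The text for eval_models_layouts describes the global score as TriMAP's MRE normalized by PCA's MRE. The sketch below is one way to compute such a score with NumPy and scikit-learn. It assumes the exponential normalization exp(-(MRE_emb - MRE_pca) / MRE_pca) and is not the implementation of topo.pipes.global_score. The helper names min_reconstruction_error and global_score_sketch are hypothetical, and the Y_pca argument from the signature is not modeled.

```python
import numpy as np
from sklearn.decomposition import PCA


def min_reconstruction_error(X, Y):
    """Mean squared error of the best linear map from embedding Y back to data X."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Least-squares solution A of Yc @ A ~= Xc
    A, *_ = np.linalg.lstsq(Yc, Xc, rcond=None)
    return float(np.mean((Xc - Yc @ A) ** 2))


def global_score_sketch(X, emb):
    """Assumed normalization: exp(-(MRE_emb - MRE_pca) / MRE_pca)."""
    # PCA with the same number of dimensions as the embedding is the baseline
    Y_pca = PCA(n_components=emb.shape[1]).fit_transform(X)
    mre_pca = min_reconstruction_error(X, Y_pca)
    mre_emb = min_reconstruction_error(X, emb)
    return float(np.exp(-(mre_emb - mre_pca) / mre_pca))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # synthetic data, for illustration only
pca_emb = PCA(n_components=2).fit_transform(X)
random_emb = rng.normal(size=(200, 2))   # embedding unrelated to X

score_pca = global_score_sketch(X, pca_emb)
score_random = global_score_sketch(X, random_emb)
```

Under this definition the PCA embedding scores 1 by construction, and an embedding unrelated to the data scores lower.
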
topo.pipes.eval_models_layouts(TopOGraph, X, methods=['tw', 'gc', 'gs'], kernels=['cknn', 'bw_adaptive'], eigenmap_methods=['DM', 'LE'], projections=['MAP'], additional_eigenbases=None, additional_projections=None, n_neighbors=5, n_jobs=-1, landmark_method='kmeans', metric='euclidean', n_pcs=30, landmarks=None, run_uncomputed_models=True, **kwargs)

Evaluates all orthogonal bases, topological graphs and layouts in the TopOGraph object.

Currently uses three quality metrics: trustworthiness (https://scikit-learn.org/stable/modules/generated/sklearn.manifold.trustworthiness.html), geodesic correlation (defined in the TopOMetry manuscript as the Spearman correlation between high- and low-dimensional geodesic distances), and global score (defined in the TriMAP paper as the minimum reconstruction error, MRE, normalized by PCA’s MRE).
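As a rough illustration of the first two metrics, the sketch below computes trustworthiness with scikit-learn and a geodesic correlation with SciPy. It assumes geodesic distances are approximated by shortest paths on a symmetrized kNN graph. The helpers geodesic_distances and geodesic_correlation are hypothetical names and not part of topo, whose own graph construction may differ.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.stats import spearmanr
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness
from sklearn.neighbors import kneighbors_graph


def geodesic_distances(data, n_neighbors=5):
    """Approximate geodesics as shortest paths on a symmetrized kNN graph."""
    graph = kneighbors_graph(data, n_neighbors=n_neighbors, mode="distance")
    graph = graph.maximum(graph.T)  # make the graph undirected
    return shortest_path(graph, method="D", directed=False)


def geodesic_correlation(X, emb, n_neighbors=5):
    """Spearman correlation between high- and low-dimensional geodesic distances."""
    d_high = geodesic_distances(X, n_neighbors)
    d_low = geodesic_distances(emb, n_neighbors)
    upper = np.triu_indices_from(d_high, k=1)  # each pair counted once
    a, b = d_high[upper], d_low[upper]
    finite = np.isfinite(a) & np.isfinite(b)   # skip disconnected pairs
    rho, _ = spearmanr(a[finite], b[finite])
    return float(rho)


rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))                  # synthetic data, for illustration only
emb = PCA(n_components=2).fit_transform(X)

tw = trustworthiness(X, emb, n_neighbors=5)
gc = geodesic_correlation(X, emb, n_neighbors=5)
gc_identity = geodesic_correlation(X, X, n_neighbors=5)
```

The all-pairs shortest-path step is what makes this metric expensive on large datasets, since the distance matrix grows quadratically with the number of samples.
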

Parameters:
  • TopOGraph (target TopOGraph object (can be empty).) –

  • X (data matrix. Expects either numpy.ndarray or scipy.sparse.csr_matrix.) –

  • methods (list of str (optional, default ['tw', 'gc', 'gs']).) – Methods to use in the evaluation. Options are ‘tw’ (trustworthiness), ‘gc’ (geodesic correlation), and ‘gs’ (global score). ‘gc’ is computationally expensive, so it is recommended to either use a small number of landmarks (see landmarks) or skip it altogether. Keep in mind that ‘gc’ is intrinsically related to ISOMAP and other distance-preservation methods.

  • kernels (list of str (optional, default ['cknn', 'bw_adaptive']).) – List of kernel versions to run and evaluate. These will be used to learn an eigenbasis and to learn a new graph kernel from it. Options are ‘fuzzy’, ‘cknn’, ‘bw_adaptive’, ‘bw_adaptive_alpha_decaying’, ‘bw_adaptive_nbr_expansion’, ‘bw_adaptive_alpha_decaying_nbr_expansion’, and ‘gaussian’. Not all kernels are run by default, to avoid long run times when the function is called without arguments.

  • eigenmap_methods (list of str (optional, default ['DM', 'LE']).) – List of eigenmap methods to run and evaluate. Options are ‘DM’, ‘LE’, ‘top’, and ‘bottom’.

  • projections (list of str (optional, default ['MAP']).) – List of projection methods to run and evaluate. Options are the same as those of the topo.layouts.Projector() object: ‘(L)Isomap’, ‘t-SNE’, ‘MAP’, ‘UMAP’, ‘PaCMAP’, ‘TriMAP’, ‘IsomorphicMDE’ (MDE with preservation of nearest neighbors), ‘IsometricMDE’ (MDE with preservation of pairwise distances), and ‘NCVis’.

  • additional_eigenbases (dict (optional, default None).) – Dictionary containing named additional eigenbases (e.g. factor analysis, AE’s latent layer, ICA, etc) to be evaluated.

  • additional_projections (dict (optional, default None).) – Dictionary containing named additional projections (e.g. t-SNE, UMAP, etc) to be evaluated.

  • n_neighbors (int (optional, default 5).) – Number of nearest neighbors to use for the kNN graph.

  • n_jobs (int (optional, default -1).) – Number of jobs to use for parallelization. If -1, uses all available cores.

  • landmarks (int (optional, default None).) – If specified, subsamples the TopOGraph object and/or data matrix X to this number of landmark samples before computing results and scores. Useful when dealing with large datasets (>30,000 samples).

  • landmark_method (str (optional, default 'kmeans').) – Method to use for landmark selection. Options are ‘random’ and ‘kmeans’.

  • kwargs (dict (optional, default {}).) – Additional keyword arguments to pass to the topo.base.ann.kNN() function.

Returns:

Populates the TopOGraph object and returns a dictionary of dictionaries with the results.
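
The landmarks and landmark_method parameters above describe subsampling the data before scoring. The sketch below shows one plausible way to pick landmark rows with the ‘random’ and ‘kmeans’ strategies. It is an assumption about the general technique, not topo's own code. In particular, select_landmarks is a hypothetical helper, and taking the sample nearest to each k-means centroid is a choice made here for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans


def select_landmarks(X, n_landmarks, method="kmeans", seed=0):
    """Return row indices of landmark samples (hypothetical helper, not part of topo)."""
    n_samples = X.shape[0]
    if n_landmarks >= n_samples:
        return np.arange(n_samples)  # nothing to subsample
    if method == "random":
        rng = np.random.default_rng(seed)
        return np.sort(rng.choice(n_samples, size=n_landmarks, replace=False))
    if method == "kmeans":
        km = KMeans(n_clusters=n_landmarks, n_init=10, random_state=seed).fit(X)
        dists = km.transform(X)  # distances to centroids, shape (n_samples, n_clusters)
        # Assumption: use the sample closest to each centroid as its landmark.
        # np.unique drops duplicates, so slightly fewer indices may be returned.
        return np.unique(dists.argmin(axis=0))
    raise ValueError(f"unknown landmark method: {method!r}")


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))  # synthetic data, for illustration only

random_idx = select_landmarks(X, 50, method="random")
kmeans_idx = select_landmarks(X, 50, method="kmeans")
X_landmarks = X[kmeans_idx]     # reduced matrix on which metrics would be scored
```

Random selection is cheaper, while k-means selection tends to spread landmarks more evenly over the data.
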

topo.pipes.explained_variance(X, title='some data', n_pcs=200, figsize=(12, 6), sup_title_fontsize=20, title_fontsize=16, return_dicts=False)

Plots the explained variance by PCA with varying number of highly variable genes.

Parameters:
  • X (np.ndarray (2D) of observations per sample.) –

  • title (str (optional, default 'some data').) –

  • n_pcs (int (optional, default 200).) – Number of principal components to use.

  • figsize (tuple of int (optional, default (12,6)).) –

  • sup_title_fontsize (int (optional, default 20).) –

  • title_fontsize (int (optional, default 16).) –

  • return_dicts (bool (optional, default False).) – Whether to return explained covariance ratio and singular values dictionaries.

Returns:

  • A plot. If return_dicts=True, also returns a tuple of dictionaries (explained_cov_ratio, singular_values) whose keys are strings with the number of genes and whose values are the explained covariance ratio and the singular values for PCA, respectively.
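
The shape of the returned dictionaries can be sketched with scikit-learn alone. The code below assumes that "highly variable genes" means the columns with the largest variance, and it uses hypothetical gene counts (50, 100, 200). The helper explained_variance_by_n_features is not part of topo, the real function may select genes differently, and plotting is omitted.

```python
import numpy as np
from sklearn.decomposition import PCA


def explained_variance_by_n_features(X, feature_counts=(50, 100, 200), n_pcs=20):
    """Fit PCA on the top-variance columns of X for each requested column count."""
    order = np.argsort(X.var(axis=0))[::-1]  # columns sorted by decreasing variance
    explained_ratio, singular_values = {}, {}
    for n in feature_counts:
        subset = X[:, order[:n]]
        # n_components cannot exceed the number of samples or features
        n_components = min(n_pcs, subset.shape[0], subset.shape[1])
        pca = PCA(n_components=n_components).fit(subset)
        # Keys are strings with the number of genes, as in the Returns description
        explained_ratio[str(n)] = pca.explained_variance_ratio_
        singular_values[str(n)] = pca.singular_values_
    return explained_ratio, singular_values


rng = np.random.default_rng(0)
# Synthetic matrix with columns of unequal variance, for illustration only
X = rng.normal(size=(300, 200)) * rng.uniform(0.5, 3.0, size=200)

ratios, svals = explained_variance_by_n_features(X)
```

Each dictionary value is an array with one entry per principal component, so plotting the cumulative sum of a ratio array gives the usual explained-variance curve.
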