topo.spectral

Submodules

Package Contents

Functions

graph_laplacian(W[, laplacian_type, return_D])

Compute the graph Laplacian, given an adjacency or affinity graph W.

diffusion_operator(W[, alpha, symmetric, semi_aniso, ...])

Computes the [diffusion operator](https://doi.org/10.1016/j.acha.2006.04.006).

DM(W[, n_eigs, alpha, return_evals, symmetric, ...])

Performs [Diffusion Maps](https://doi.org/10.1016/j.acha.2006.04.006), given an adjacency or affinity graph W.

LE(W[, n_eigs, laplacian_type, drop_first, ...])

Performs [Laplacian Eigenmaps](https://www2.imm.dtu.dk/projects/manifold/Papers/Laplacian.pdf), given an adjacency or affinity graph W.

degree(W)

optimize_layout_euclidean(head_embedding, ...[, ...])

Improve an embedding using stochastic gradient descent to minimize the fuzzy set cross entropy between the 1-skeletons of the high and low dimensional fuzzy simplicial sets.

optimize_layout_generic(head_embedding, ...[, gamma, ...])

Improve an embedding using stochastic gradient descent to minimize the fuzzy set cross entropy between the 1-skeletons of the high and low dimensional fuzzy simplicial sets.

optimize_layout_inverse(head_embedding, ...[, gamma, ...])

Improve an embedding using stochastic gradient descent to minimize the fuzzy set cross entropy between the 1-skeletons of the high and low dimensional fuzzy simplicial sets.

optimize_layout_aligned_euclidean(head_embeddings, ...)

topo.spectral.graph_laplacian(W, laplacian_type='normalized', return_D=False)

Compute the graph Laplacian, given an adjacency or affinity graph W. For a friendly reference, see this material from James Melville: https://jlmelville.github.io/smallvis/spectral.html

Parameters:
  • W (scipy.sparse.csr_matrix or np.ndarray) – The graph adjacency or affinity matrix. Assumed to be symmetric and with zero diagonal. No further symmetrization is performed, so make sure to symmetrize W if necessary (usually done additively with W = (W + W.T)/2 ).

  • laplacian_type (str (optional, default 'normalized').) – The type of Laplacian to use. Can be ‘unnormalized’, ‘normalized’ or ‘random_walk’.

  • return_D (bool (optional, default False).) – Whether to also return the degree matrix together with the Laplacian, as a tuple.

Returns:

L (scipy.sparse.csr_matrix) – The graph Laplacian.
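
A minimal usage sketch. The toy data and the scikit-learn k-nearest-neighbors affinity graph are illustrative assumptions; only the graph_laplacian call follows the signature above, and the symmetrization is the additive one recommended in the parameter description.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import kneighbors_graph
import topo.spectral as spt

# Toy data and a symmetric kNN affinity graph with zero diagonal.
X = np.random.rand(200, 10)
W = kneighbors_graph(X, n_neighbors=10, mode='connectivity')
W = csr_matrix((W + W.T) / 2)   # symmetrize additively, as recommended above

L = spt.graph_laplacian(W, laplacian_type='normalized')
# With return_D=True the degree matrix is also returned (tuple order assumed here).
L, D = spt.graph_laplacian(W, laplacian_type='normalized', return_D=True)
```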

topo.spectral.diffusion_operator(W, alpha=1.0, symmetric=False, semi_aniso=False, return_D_inv_sqrt=False)

Computes the [diffusion operator](https://doi.org/10.1016/j.acha.2006.04.006).

Parameters:
  • W (scipy.sparse.csr_matrix or np.ndarray) – The graph adjacency or affinity matrix. Assumed to be symmetric and with zero diagonal. No further symmetrization is performed, so make sure to symmetrize W if necessary (usually done additively with W = (W + W.T)/2 ).

  • alpha (float (optional, default 1.0).) – Anisotropy to apply. ‘Alpha’ in the diffusion maps literature.

  • symmetric (bool (optional, default False).) – Whether to use a symmetric version of the diffusion operator. This is particularly useful to yield a symmetric operator when using anisotropy (alpha > 0), as the diffusion operator P would otherwise be asymmetric, which can be problematic during matrix decomposition. The eigenvalues are the same as those of the asymmetric version, and the eigenvectors of the original asymmetric operator can be obtained by left-multiplying by D_inv_sqrt (returned if return_D_inv_sqrt is set to True).

  • semi_aniso (bool (optional, default False).) – Whether to use semi-anisotropic diffusion. This reweights the original kernel (not the renormalized kernel) by the renormalized degree.

  • return_D_inv_sqrt (bool (optional, default False).) – Whether to return a tuple of diffusion operator P and inverse square root of the degree matrix.

Returns:

P (scipy.sparse.csr_matrix) – The graph diffusion operator.
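
A short sketch of the anisotropic and symmetrized variants, reusing the symmetric affinity matrix W built in the graph_laplacian example above.

```python
import topo.spectral as spt

# Anisotropic (alpha=1) diffusion operator; asymmetric by default.
P = spt.diffusion_operator(W, alpha=1.0)

# Symmetrized form, plus the inverse square root of the degree matrix, which
# recovers the eigenvectors of the asymmetric operator by left-multiplication
# (as noted in the `symmetric` parameter description above).
P_sym, D_inv_sqrt = spt.diffusion_operator(W, alpha=1.0, symmetric=True,
                                           return_D_inv_sqrt=True)
```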

topo.spectral.DM(W, n_eigs=10, alpha=0, return_evals=False, symmetric=False, eigen_tol=0.001, t=None)

Performs [Diffusion Maps](https://doi.org/10.1016/j.acha.2006.04.006), given an adjacency or affinity graph W. The graph W can be a sparse or dense matrix. It is assumed to be symmetric (no further symmetrization is performed, so make sure it is) and to have a zero diagonal. The eigenvectors associated with the largest eigenvalues form a new orthonormal basis which represents the graph in the feature space and are useful for denoising and clustering.

Parameters:
  • W (scipy.sparse.csr_matrix or np.ndarray) – The graph adjacency or affinity matrix. Assumed to be symmetric and with zero diagonal.

  • n_eigs (int (optional, default 10).) – The number of eigenvectors to return.

  • alpha (float (optional, default 0).) – Anisotropy to be applied to the diffusion map. Referred to as alpha in the diffusion maps literature.

  • return_evals (bool (optional, default False).) – Whether to return the eigenvalues. If True, returns a tuple of (eigenvectors, eigenvalues).

  • symmetric (bool (optional, default False).) – Whether to use a symmetric version of the diffusion operator. This is particularly useful to yield a symmetric operator; a near-identical result can also be obtained by simply averaging the operator with its transpose.

  • eigen_tol (float (optional, default 0.001).) – The tolerance for the eigendecomposition in scipy.sparse.linalg.eigsh().

  • t (int (optional, default None).) – The number of steps to take in the diffusion map.

Returns:

  • If return_evals is True – A tuple of scaled eigenvectors (the diffusion maps) and eigenvalues.

  • If return_evals is False – An array of scaled eigenvectors (the diffusion maps).
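
A sketch of computing a diffusion map on the same W as in the examples above; per the Returns section, return_evals=True yields a tuple of (diffusion maps, eigenvalues).

```python
import topo.spectral as spt

# Anisotropic diffusion maps with eigenvalues returned.
dmaps, evals = spt.DM(W, n_eigs=10, alpha=1.0, return_evals=True)

# Diffusion maps only, with default settings.
dmaps = spt.DM(W, n_eigs=10)
```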

topo.spectral.LE(W, n_eigs=10, laplacian_type='random_walk', drop_first=True, return_evals=False, eigen_tol=0, random_state=None)

Performs [Laplacian Eigenmaps](https://www2.imm.dtu.dk/projects/manifold/Papers/Laplacian.pdf), given an adjacency or affinity graph W. The graph W can be a sparse or dense matrix. It is assumed to be symmetric (no further symmetrization is performed, so make sure it is) and to have a zero diagonal. The eigenvectors associated with the smallest eigenvalues form a new orthonormal basis which represents the graph in the feature space and are useful for denoising and clustering.

Parameters:
  • W (scipy.sparse.csr_matrix or np.ndarray) – The graph adjacency or affinity matrix. Assumed to be symmetric and with zero diagonal.

  • n_eigs (int (optional, default 10).) – The number of eigenvectors to compute.

  • laplacian_type (str (optional, default 'random_walk').) – The type of Laplacian to use. Can be ‘unnormalized’, ‘symmetric’, or ‘random_walk’.

  • drop_first (bool (optional, default True).) – Whether to drop the first eigenvector.

  • return_evals (bool (optional, default False).) – Whether to return the eigenvalues. If True, returns a tuple of (eigenvectors, eigenvalues).

  • eigen_tol (float (optional, default 0).) – The tolerance for the eigendecomposition.

  • random_state (int (optional, default None).) – The random state for the eigendecomposition in scipy.sparse.linalg.lobpcg() if the data has more than a million samples.

Returns:

  • evecs (np.ndarray of shape (W.shape[0], n_eigs)) – The eigenvectors of the graph Laplacian, sorted by ascending eigenvalues.

  • If return_evals is True – (evecs, evals), a tuple of ndarrays: the eigenvectors and associated eigenvalues, sorted by ascending eigenvalues.
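
A sketch of a Laplacian Eigenmaps embedding of the same W as above.

```python
import topo.spectral as spt

# Random-walk Laplacian, dropping the trivial first eigenvector.
evecs = spt.LE(W, n_eigs=10, laplacian_type='random_walk', drop_first=True)

# Also return the associated eigenvalues.
evecs, evals = spt.LE(W, n_eigs=10, return_evals=True)
```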

topo.spectral.degree(W)
topo.spectral.optimize_layout_euclidean(head_embedding, tail_embedding, head, tail, n_epochs, n_vertices, epochs_per_sample, a, b, rng_state, gamma=1.0, initial_alpha=1.0, negative_sample_rate=5.0, parallel=False, verbose=False, densmap=False, densmap_kwds={})

Improve an embedding using stochastic gradient descent to minimize the fuzzy set cross entropy between the 1-skeletons of the high dimensional and low dimensional fuzzy simplicial sets. In practice this is done by sampling edges based on their membership strength (with the (1-p) terms coming from negative sampling similar to word2vec).

Parameters:
  • head_embedding (array of shape (n_samples, n_components)) – The initial embedding to be improved by SGD.

  • tail_embedding – The reference embedding of embedded points. If not embedding new previously unseen points with respect to an existing embedding this is simply the head_embedding (again); otherwise it provides the existing embedding to embed with respect to.
  • head (array of shape (n_1_simplices)) – The indices of the heads of 1-simplices with non-zero membership.

  • tail (array of shape (n_1_simplices)) – The indices of the tails of 1-simplices with non-zero membership.

  • n_epochs (int) – The number of training epochs to use in optimization.

  • n_vertices (int) – The number of vertices (0-simplices) in the dataset.

  • epochs_per_sample (array of shape (n_1_simplices)) – A float value of the number of epochs per 1-simplex. 1-simplices with weaker membership strength will have more epochs between being sampled.

  • a (float) – Parameter of differentiable approximation of right adjoint functor

  • b (float) – Parameter of differentiable approximation of right adjoint functor

  • rng_state (array of int64, shape (3,)) – The internal state of the rng

  • gamma (float (optional, default 1.0)) – Weight to apply to negative samples.

  • initial_alpha (float (optional, default 1.0)) – Initial learning rate for the SGD.

  • negative_sample_rate (int (optional, default 5)) – Number of negative samples to use per positive sample.

  • parallel (bool (optional, default False)) – Whether to run the computation using numba parallel. Running in parallel is non-deterministic, and is not used if a random seed has been set, to ensure reproducibility.

  • verbose (bool (optional, default False)) – Whether to report information on the current progress of the algorithm.

  • densmap (bool (optional, default False)) – Whether to use the density-augmented densMAP objective

  • densmap_kwds (dict (optional, default {})) – Auxiliary data for densMAP

Returns:

embedding (array of shape (n_samples, n_components)) – The optimized embedding.
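
A sketch of preparing the inputs and calling the optimizer, reusing the sparse affinity matrix W from the examples above. The edge-list and epochs_per_sample construction mirrors the usual UMAP-style preprocessing and is an illustrative assumption rather than part of this function's API; the a and b values are typical curve parameters (compare the defaults of optimize_layout_aligned_euclidean below).

```python
import numpy as np
import topo.spectral as spt

graph = W.tocoo()          # W: the symmetric sparse affinity matrix used above
weights = graph.data
n_epochs = 200

# Edge sampling schedule: stronger edges are sampled more often
# (UMAP-style construction; illustrative, not part of this API).
n_samples_per_edge = n_epochs * (weights / weights.max())
epochs_per_sample = np.full(weights.shape[0], -1.0)
epochs_per_sample[n_samples_per_edge > 0] = (
    n_epochs / n_samples_per_edge[n_samples_per_edge > 0]
)

# Random 2D initialization and a 3-element int64 RNG state.
init = np.random.uniform(-10, 10, size=(W.shape[0], 2)).astype(np.float32)
rng_state = np.random.randint(-2**31 + 1, 2**31 - 1, size=3).astype(np.int64)

embedding = spt.optimize_layout_euclidean(
    init, init, graph.row, graph.col, n_epochs, W.shape[0],
    epochs_per_sample, 1.577, 0.895, rng_state,
    gamma=1.0, initial_alpha=1.0, negative_sample_rate=5.0,
)
```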

topo.spectral.optimize_layout_generic(head_embedding, tail_embedding, head, tail, n_epochs, n_vertices, epochs_per_sample, a, b, rng_state, gamma=1.0, initial_alpha=1.0, negative_sample_rate=5.0, output_metric=dist.euclidean, output_metric_kwds=(), verbose=False)

Improve an embedding using stochastic gradient descent to minimize the fuzzy set cross entropy between the 1-skeletons of the high dimensional and low dimensional fuzzy simplicial sets. In practice this is done by sampling edges based on their membership strength (with the (1-p) terms coming from negative sampling similar to word2vec).

Parameters:
  • head_embedding (array of shape (n_samples, n_components)) – The initial embedding to be improved by SGD.

  • tail_embedding – The reference embedding of embedded points. If not embedding new previously unseen points with respect to an existing embedding this is simply the head_embedding (again); otherwise it provides the existing embedding to embed with respect to.
  • head (array of shape (n_1_simplices)) – The indices of the heads of 1-simplices with non-zero membership.

  • tail (array of shape (n_1_simplices)) – The indices of the tails of 1-simplices with non-zero membership.

  • weight (array of shape (n_1_simplices)) – The membership weights of the 1-simplices.

  • n_epochs (int) – The number of training epochs to use in optimization.

  • n_vertices (int) – The number of vertices (0-simplices) in the dataset.

  • epochs_per_sample (array of shape (n_1_simplices)) – A float value of the number of epochs per 1-simplex. 1-simplices with weaker membership strength will have more epochs between being sampled.

  • a (float) – Parameter of differentiable approximation of right adjoint functor

  • b (float) – Parameter of differentiable approximation of right adjoint functor

  • rng_state (array of int64, shape (3,)) – The internal state of the rng

  • gamma (float (optional, default 1.0)) – Weight to apply to negative samples.

  • initial_alpha (float (optional, default 1.0)) – Initial learning rate for the SGD.

  • negative_sample_rate (int (optional, default 5)) – Number of negative samples to use per positive sample.

  • verbose (bool (optional, default False)) – Whether to report information on the current progress of the algorithm.

Returns:

embedding (array of shape (n_samples, n_components)) – The optimized embedding.
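
The generic variant takes the same inputs but allows an arbitrary differentiable output metric (default dist.euclidean, per the signature above; its import path is not shown on this page, so it is left at the default here). A sketch reusing the arrays built in the euclidean example above:

```python
# Same positional arguments as in the euclidean sketch; output_metric and
# output_metric_kwds are left at their documented defaults.
embedding = spt.optimize_layout_generic(
    init, init, graph.row, graph.col, n_epochs, W.shape[0],
    epochs_per_sample, 1.577, 0.895, rng_state,
)
```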

topo.spectral.optimize_layout_inverse(head_embedding, tail_embedding, head, tail, weight, sigmas, rhos, n_epochs, n_vertices, epochs_per_sample, a, b, rng_state, gamma=1.0, initial_alpha=1.0, negative_sample_rate=5.0, output_metric=dist.euclidean, output_metric_kwds=(), verbose=False)

Improve an embedding using stochastic gradient descent to minimize the fuzzy set cross entropy between the 1-skeletons of the high dimensional and low dimensional fuzzy simplicial sets. In practice this is done by sampling edges based on their membership strength (with the (1-p) terms coming from negative sampling similar to word2vec).

Parameters:
  • head_embedding (array of shape (n_samples, n_components)) – The initial embedding to be improved by SGD.

  • tail_embedding – The reference embedding of embedded points. If not embedding new previously unseen points with respect to an existing embedding this is simply the head_embedding (again); otherwise it provides the existing embedding to embed with respect to.
  • head (array of shape (n_1_simplices)) – The indices of the heads of 1-simplices with non-zero membership.

  • tail (array of shape (n_1_simplices)) – The indices of the tails of 1-simplices with non-zero membership.

  • weight (array of shape (n_1_simplices)) – The membership weights of the 1-simplices.

  • n_epochs (int) – The number of training epochs to use in optimization.

  • n_vertices (int) – The number of vertices (0-simplices) in the dataset.

  • epochs_per_sample (array of shape (n_1_simplices)) – A float value of the number of epochs per 1-simplex. 1-simplices with weaker membership strength will have more epochs between being sampled.

  • a (float) – Parameter of differentiable approximation of right adjoint functor

  • b (float) – Parameter of differentiable approximation of right adjoint functor

  • rng_state (array of int64, shape (3,)) – The internal state of the rng

  • gamma (float (optional, default 1.0)) – Weight to apply to negative samples.

  • initial_alpha (float (optional, default 1.0)) – Initial learning rate for the SGD.

  • negative_sample_rate (int (optional, default 5)) – Number of negative samples to use per positive sample.

  • verbose (bool (optional, default False)) – Whether to report information on the current progress of the algorithm.

Returns:

embedding (array of shape (n_samples, n_components)) – The optimized embedding.
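
The inverse variant additionally takes per-edge membership weights and per-point sigmas and rhos from the fuzzy simplicial set construction; those values are not documented on this page, so the arrays below are illustrative placeholders only, reusing the setup from the euclidean sketch above.

```python
# weights: per-edge membership strengths; sigmas/rhos: per-point values from
# the fuzzy simplicial set construction (placeholders here, for illustration).
weights = graph.data.astype(np.float64)
sigmas = np.ones(W.shape[0])    # placeholder
rhos = np.zeros(W.shape[0])     # placeholder

embedding = spt.optimize_layout_inverse(
    init, init, graph.row, graph.col, weights, sigmas, rhos,
    n_epochs, W.shape[0], epochs_per_sample, 1.577, 0.895, rng_state,
)
```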

topo.spectral.optimize_layout_aligned_euclidean(head_embeddings, tail_embeddings, heads, tails, n_epochs, epochs_per_sample, regularisation_weights, relations, rng_state, a=1.576943460405378, b=0.8950608781227859, gamma=1.0, lambda_=0.005, initial_alpha=1.0, negative_sample_rate=5.0, parallel=True, verbose=False)