topo.layouts.map

Module Contents

Functions

fuzzy_embedding(graph[, n_components, initial_alpha, ...])

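Perform a fuzzy simplicial set embedding, using a specified initialisation method and then minimizing the fuzzy set cross entropy between the 1-skeletons of the high and low dimensional fuzzy simplicial sets.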

topo.layouts.map.fuzzy_embedding(graph, n_components=2, initial_alpha=1, min_dist=0.3, spread=1, n_epochs=600, metric='cosine', metric_kwds={}, output_metric='euclidean', output_metric_kwds={}, gamma=1.0, negative_sample_rate=5, init='spectral', random_state=None, euclidean_output=True, parallel=True, verbose=False, a=None, b=None, densmap=False, densmap_kwds={}, output_dens=False)

Perform a fuzzy simplicial set embedding, using a specified initialisation method and then minimizing the fuzzy set cross entropy between the 1-skeletons of the high and low dimensional fuzzy simplicial sets. The fuzzy simplicial set embedding was proposed and implemented by Leland McInnes in UMAP (see umap-learn <https://github.com/lmcinnes/umap>). Here we’re using it only for the projection (layout optimization).
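The objective being minimized can be made concrete with a small NumPy sketch. It assumes the graph's edges are given as hypothetical `rows`, `cols`, `weights` arrays (the high-dimensional membership strengths). It scores an embedding `Y` with the UMAP-style low-dimensional membership strength 1 / (1 + a·d^(2b)), where `a=1.0, b=1.0` are placeholder values rather than the ones this function derives. Only the graph's edges are summed here. In the full objective the non-edge (repulsive) terms are approximated by negative sampling during optimization. This illustrates the loss; it is not the function's internal code.

```python
import numpy as np


def fuzzy_set_cross_entropy(w_high, w_low, eps=1e-12):
    """Fuzzy set cross entropy between two arrays of membership strengths."""
    w_high = np.asarray(w_high, dtype=float)
    # Keep low-dimensional strengths strictly inside (0, 1) so the logs stay finite.
    w_low = np.clip(np.asarray(w_low, dtype=float), eps, 1.0 - eps)
    # Attractive term: penalizes pairs that are strong in high-d but weak in low-d.
    attract = np.where(
        w_high > 0, w_high * np.log(np.clip(w_high, eps, None) / w_low), 0.0
    )
    # Repulsive term: penalizes pairs that are weak in high-d but strong in low-d.
    repel = np.where(
        w_high < 1,
        (1.0 - w_high) * np.log(np.clip(1.0 - w_high, eps, None) / (1.0 - w_low)),
        0.0,
    )
    return float(np.sum(attract + repel))


def embedding_cross_entropy(rows, cols, weights, Y, a=1.0, b=1.0):
    """Hypothetical helper: score embedding Y against graph edges (rows[k], cols[k])."""
    d = np.linalg.norm(Y[rows] - Y[cols], axis=1)
    w_low = 1.0 / (1.0 + a * d ** (2.0 * b))  # low-dimensional membership strength
    return fuzzy_set_cross_entropy(weights, w_low)


# A 3-node path graph 0 - 1 - 2 with full-strength edges.
rows, cols, weights = np.array([0, 1]), np.array([1, 2]), np.array([1.0, 1.0])
Y_close = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]])
Y_far = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]])
print(embedding_cross_entropy(rows, cols, weights, Y_close))
print(embedding_cross_entropy(rows, cols, weights, Y_far))
```

Placing connected vertices close together drives the attractive term toward zero, so the first embedding scores much lower than the second.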

Parameters:
  • graph (sparse matrix) – The 1-skeleton of the high dimensional fuzzy simplicial set, represented as a graph whose (weighted) adjacency matrix must be given as a sparse matrix.

  • n_components (int) – The dimensionality of the euclidean space into which to embed the data.

  • initial_alpha (float) – Initial learning rate for the SGD.

  • min_dist (float (optional, default 0.3)) – The effective minimum distance between embedded points. Smaller values result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.

  • spread (float (optional, default 1.0)) – The effective scale of embedded points. In combination with min_dist this determines how clustered/clumped the embedded points are.

  • gamma (float) – Weight to apply to negative samples.

  • negative_sample_rate (int (optional, default 5)) – The number of negative samples to select per positive sample in the optimization process. Increasing this value will result in greater repulsive force being applied, greater optimization cost, but slightly more accuracy.

  • n_epochs (int (optional, default 600)) – The number of training epochs used to optimize the low dimensional embedding. Larger values result in more accurate embeddings. If 0 is specified, a value is selected based on the size of the input dataset (200 for large datasets, 500 for small).

  • init (string) –

    How to initialize the low dimensional embedding. Options are:
    • 'spectral': use a spectral embedding of the fuzzy 1-skeleton.

    • 'random': assign initial embedding positions at random.

    • A numpy array of initial embedding positions.

  • random_state (numpy RandomState or equivalent) – A state capable of being used as a numpy random state.

  • metric (string or callable) – The metric used to measure distance in high dimensional space; used if multiple connected components need to be laid out.

  • metric_kwds (dict) – Keyword arguments to be passed to the metric function; used if multiple connected components need to be laid out.

  • densmap (bool) – Whether to use the density-augmented objective function to optimize the embedding according to the densMAP algorithm.

  • densmap_kwds (dict) – Keyword arguments to be used by the densMAP optimization.

  • output_dens (bool) – Whether to output local radii in the original data and the embedding.

  • output_metric (function) – Function returning the distance between two points in embedding space and the gradient of the distance with respect to the first argument.

  • output_metric_kwds (dict) – Keyword arguments to be passed to the output_metric function.

  • euclidean_output (bool) – Whether to use the faster code specialised for Euclidean output metrics.

  • parallel (bool (optional, default True)) – Whether to run the computation using numba parallel. Running in parallel is non-deterministic and is not used if a random seed has been set, to ensure reproducibility.

  • verbose (bool (optional, default False)) – Whether to report information on the current progress of the algorithm.

  • a (float) – Parameter of the differentiable approximation of the right adjoint functor.

  • b (float) – Parameter of the differentiable approximation of the right adjoint functor.
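In UMAP, min_dist and spread act on the layout only through a and b. The low-dimensional membership strength is modelled by the curve 1 / (1 + a·d^(2b)), and umap-learn fits (a, b) so that this curve approximates a target that equals 1 for d < min_dist and decays as exp(-(d - min_dist) / spread) beyond it. Whether this function derives a and b the same way when they are left as None is an assumption. The sketch below replaces umap-learn's use of `scipy.optimize.curve_fit` with a plain NumPy grid search so it has no SciPy dependency. `fit_ab` and its grid bounds are hypothetical.

```python
import numpy as np


def low_dim_similarity(d, a, b):
    """UMAP-style low-dimensional membership strength 1 / (1 + a * d^(2b))."""
    return 1.0 / (1.0 + a * d ** (2.0 * b))


def target_curve(d, min_dist, spread):
    """Target shape: 1 inside min_dist, exponential decay with scale `spread` beyond it."""
    return np.where(d < min_dist, 1.0, np.exp(-(d - min_dist) / spread))


def fit_ab(min_dist=0.3, spread=1.0, n_grid=100):
    """Hypothetical helper: least-squares grid search for (a, b)."""
    d = np.linspace(0.0, 3.0 * spread, 300)
    y = target_curve(d, min_dist, spread)
    a_grid = np.linspace(0.05, 5.0, n_grid)  # assumed search range for a
    b_grid = np.linspace(0.3, 2.0, n_grid)   # assumed search range for b
    A, B = np.meshgrid(a_grid, b_grid, indexing="ij")
    # Broadcast to shape (n_grid, n_grid, len(d)) and score every (a, b) pair.
    pred = low_dim_similarity(d[None, None, :], A[..., None], B[..., None])
    sse = ((pred - y) ** 2).sum(axis=-1)
    i, j = np.unravel_index(np.argmin(sse), sse.shape)
    return float(a_grid[i]), float(b_grid[j])


a, b = fit_ab(min_dist=0.3, spread=1.0)
print(a, b)
```

A grid search is coarser than a proper curve fit, but it keeps the relationship visible: changing min_dist or spread changes the target curve, and therefore the fitted a and b.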

Returns:

  • embedding (array of shape (n_samples, n_components)) – The optimized embedding of graph into an n_components dimensional euclidean space.

  • aux_data (dict) – Auxiliary output returned with the embedding. When the densMAP extension is turned on, this dictionary includes the local radii in the original data (rad_orig) and in the embedding (rad_emb).

    Y_init (array of shape (n_samples, n_components)) – The spectral initialization of graph into an n_components dimensional euclidean space.
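For init='spectral' and the returned Y_init, a spectral initialization can be sketched with dense NumPy for a small, connected graph. The sketch takes the eigenvectors of the symmetric normalized graph Laplacian with the smallest non-trivial eigenvalues, then rescales each coordinate to [0, 10], similar to the rescaling umap-learn applies before optimizing. `spectral_init` is hypothetical. It assumes a dense adjacency matrix with no isolated vertices. The parameter list above notes that metric is used when multiple connected components need to be laid out, a case this sketch does not handle.

```python
import numpy as np


def spectral_init(adjacency, n_components=2):
    """Hypothetical helper: spectral layout of a small connected weighted graph."""
    A = np.asarray(adjacency, dtype=float)
    A = 0.5 * (A + A.T)  # symmetrize the weighted adjacency matrix
    deg = A.sum(axis=1)
    if np.any(deg <= 0):
        raise ValueError("every vertex needs at least one edge")
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    # Drop the trivial eigenvector (eigenvalue ~0) and keep the next n_components.
    Y = eigvecs[:, 1 : n_components + 1]
    # Rescale every coordinate to [0, 10].
    Y = 10.0 * (Y - Y.min(axis=0)) / (Y.max(axis=0) - Y.min(axis=0))
    return Y


# Two triangles {0, 1, 2} and {3, 4, 5} joined by a single weak edge 2 - 3.
A = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    A[i, j] = A[j, i] = w
Y_init = spectral_init(A, n_components=2)
print(Y_init.shape)  # (6, 2)
```

On this toy graph the first coordinate comes from the Fiedler vector, so it should place the two triangles on opposite ends of the [0, 10] range.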