topo.layouts.graph_utils

Module Contents

Functions

make_epochs_per_sample(weights, n_epochs)

Given a set of weights and number of epochs generate the number of epochs per sample for each weight.

simplicial_set_embedding(graph, n_components, ...[, ...])

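Perform a fuzzy simplicial set embedding, using a specified initialisation method and then minimizing the fuzzy set cross entropy between the 1-skeletons of the high and low dimensional fuzzy simplicial sets.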

find_ab_params(spread, min_dist)

Fit a, b params for the differentiable curve used in lower dimensional fuzzy simplicial complex construction.

Attributes

INT32_MIN

INT32_MAX

topo.layouts.graph_utils.INT32_MIN
topo.layouts.graph_utils.INT32_MAX
topo.layouts.graph_utils.make_epochs_per_sample(weights, n_epochs)

Given a set of weights and a number of epochs, generate the number of epochs per sample for each weight.

Parameters:
  • weights (array of shape (n_1_simplices)) – The weights of how much we wish to sample each 1-simplex.

  • n_epochs (int) – The total number of epochs we want to train for.

Returns:

An array of number of epochs per sample, one for each 1-simplex.
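As an illustration of the computation described above, here is a minimal sketch. It assumes the function follows the convention of the UMAP reference implementation: the heaviest 1-simplex is sampled every epoch, lighter ones proportionally less often, and any 1-simplex that would never be sampled is marked with -1. The name `epochs_per_sample_sketch` is hypothetical and is not part of `topo`.

```python
import numpy as np


def epochs_per_sample_sketch(weights, n_epochs):
    """Hypothetical re-implementation for illustration only."""
    weights = np.asarray(weights, dtype=np.float64)
    # Start with -1 everywhere: "never sample this 1-simplex".
    result = np.full(weights.shape[0], -1.0)
    # Expected number of times each 1-simplex is sampled over the whole run,
    # scaled so the largest weight is sampled once per epoch.
    n_samples = n_epochs * (weights / weights.max())
    mask = n_samples > 0
    # Epochs to wait between two consecutive samples of the same 1-simplex.
    result[mask] = float(n_epochs) / n_samples[mask]
    return result


example = epochs_per_sample_sketch([1.0, 0.5, 0.0], n_epochs=100)
print(example)  # [ 1.  2. -1.]
```

Under this convention, a weight of half the maximum is sampled every second epoch, and a zero weight is never sampled.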

topo.layouts.graph_utils.simplicial_set_embedding(graph, n_components, initial_alpha, a, b, gamma, negative_sample_rate, n_epochs, init, random_state, metric, metric_kwds, densmap, densmap_kwds, output_dens, output_metric=dist.named_distances_with_gradients['euclidean'], output_metric_kwds={}, euclidean_output=True, parallel=True, verbose=False)

Perform a fuzzy simplicial set embedding, using a specified initialisation method and then minimizing the fuzzy set cross entropy between the 1-skeletons of the high and low dimensional fuzzy simplicial sets.

Parameters:
  • graph (sparse matrix) – The 1-skeleton of the high dimensional fuzzy simplicial set, represented as a graph whose (weighted) adjacency matrix is given as a sparse matrix.

  • n_components (int) – The dimensionality of the euclidean space into which to embed the data.

  • initial_alpha (float) – Initial learning rate for the SGD.

  • a (float) – Parameter of the differentiable approximation of the right adjoint functor.

  • b (float) – Parameter of the differentiable approximation of the right adjoint functor.

  • gamma (float) – Weight to apply to negative samples.

  • negative_sample_rate (int (optional, default 5)) – The number of negative samples to select per positive sample in the optimization process. Increasing this value will result in greater repulsive force being applied, greater optimization cost, but slightly more accuracy.

  • n_epochs (int (optional, default 0)) – The number of training epochs to be used in optimizing the low dimensional embedding. Larger values result in more accurate embeddings. If 0 is specified a value will be selected based on the size of the input dataset (200 for large datasets, 500 for small).

  • init (string) –

    How to initialize the low dimensional embedding. Options are:
    • 'spectral': use a spectral embedding of the fuzzy 1-skeleton.

    • 'random': assign initial embedding positions at random.

    • A numpy array of initial embedding positions.

  • random_state (numpy RandomState or equivalent) – A state capable of being used as a numpy random state.

  • metric (string or callable) – The metric used to measure distance in high dimensional space; used if multiple connected components need to be laid out.

  • metric_kwds (dict) – Keyword arguments to be passed to the metric function; used if multiple connected components need to be laid out.

  • densmap (bool) – Whether to use the density-augmented objective function to optimize the embedding according to the densMAP algorithm.

  • densmap_kwds (dict) – Key word arguments to be used by the densMAP optimization.

  • output_dens (bool) – Whether to output local radii in the original data and the embedding.

  • output_metric (function) – Function returning the distance between two points in embedding space and the gradient of the distance wrt the first argument.

  • output_metric_kwds (dict) – Key word arguments to be passed to the output_metric function.

  • euclidean_output (bool) – Whether to use the faster code specialised for euclidean output metrics.

  • parallel (bool (optional, default True)) – Whether to run the computation using numba parallel. Running in parallel is non-deterministic, and is not used if a random seed has been set, to ensure reproducibility.

  • verbose (bool (optional, default False)) – Whether to report information on the current progress of the algorithm.

Returns:

  • embedding (array of shape (n_samples, n_components)) – The optimized embedding of graph into an n_components dimensional euclidean space.

  • aux_data (dict) – Auxiliary output returned with the embedding. When densMAP extension is turned on, this dictionary includes local radii in the original data (rad_orig) and in the embedding (rad_emb).
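The objective named in the description, the fuzzy set cross entropy between the two 1-skeletons, can be sketched as follows. This is an illustrative evaluation of the loss only, not the stochastic gradient descent routine the function runs. It assumes the low dimensional membership strength takes the form 1 / (1 + a·d^(2b)), which is the family the `a` and `b` parameters refer to in the UMAP formulation. The helper names `low_dim_membership` and `fuzzy_cross_entropy` are hypothetical.

```python
import numpy as np


def low_dim_membership(dist, a, b):
    """Assumed low dimensional membership strength for a distance."""
    dist = np.asarray(dist, dtype=np.float64)
    return 1.0 / (1.0 + a * dist ** (2.0 * b))


def fuzzy_cross_entropy(v, w, eps=1e-12):
    """Cross entropy between high (v) and low (w) dimensional edge weights."""
    # Clip to avoid log(0) when a membership strength is exactly 0 or 1.
    v = np.clip(np.asarray(v, dtype=np.float64), eps, 1.0 - eps)
    w = np.clip(np.asarray(w, dtype=np.float64), eps, 1.0 - eps)
    attract = v * np.log(v / w)                        # pulls connected points together
    repel = (1.0 - v) * np.log((1.0 - v) / (1.0 - w))  # pushes unconnected points apart
    return float(np.sum(attract + repel))


# Hypothetical edge weights and embedding distances for three edges.
v = np.array([0.9, 0.5, 0.1])
d = np.array([0.2, 1.0, 3.0])
loss = fuzzy_cross_entropy(v, low_dim_membership(d, a=1.5, b=0.9))
```

The loss is zero when the low dimensional weights reproduce the high dimensional ones exactly and positive otherwise, which is what the optimization drives toward.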

topo.layouts.graph_utils.find_ab_params(spread, min_dist)

Fit a, b params for the differentiable curve used in lower dimensional fuzzy simplicial complex construction. We want the smooth curve (from a pre-defined family with simple gradient) that best matches an offset exponential decay.
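The fit described above can be sketched with a least-squares curve fit. This is a minimal sketch under stated assumptions, not necessarily the library's implementation: the smooth family is taken to be 1 / (1 + a·x^(2b)), the target is 1 up to `min_dist` and then decays as exp(-(x - min_dist) / spread), and the fit is evaluated on 300 points over [0, 3·spread], following the UMAP reference approach. The name `fit_ab_sketch` is hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def fit_ab_sketch(spread, min_dist):
    """Hypothetical illustration of fitting a and b."""

    def curve(x, a, b):
        # Smooth family with a simple gradient (assumed form).
        return 1.0 / (1.0 + a * x ** (2.0 * b))

    xv = np.linspace(0.0, spread * 3.0, 300)
    # Target: flat at 1 below min_dist, exponential decay beyond it.
    yv = np.where(xv < min_dist, 1.0, np.exp(-(xv - min_dist) / spread))
    params, _ = curve_fit(curve, xv, yv)
    return params[0], params[1]


a, b = fit_ab_sketch(spread=1.0, min_dist=0.1)
print(a, b)
```

The resulting `a` and `b` are the values that would then be passed to `simplicial_set_embedding`.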