Installation and dependencies

TopOMetry is implemented in Python, and its models are classes that inherit from scikit-learn's BaseEstimator and TransformerMixin. This makes them compatible with scikit-learn Pipelines, so they are easy to apply on their own or to combine with other workflows in virtually any domain.
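To illustrate what this compatibility means in practice, here is a minimal sketch of a transformer that inherits from BaseEstimator and TransformerMixin and is chained inside a scikit-learn Pipeline. The class `CenteredProjection` is a hypothetical stand-in written for this example; it is not part of TopOMetry and does not reproduce any of its models.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class CenteredProjection(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a TopOMetry model (NOT part of the library)."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # Principal directions from the SVD of the centered data.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return (X - self.mean_) @ self.components_.T


# Toy data: 100 samples with 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Any estimator following the scikit-learn API can be a Pipeline step.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("embed", CenteredProjection(n_components=2)),
])
embedding = pipe.fit_transform(X)
print(embedding.shape)
```

Because the class follows the scikit-learn estimator conventions, its parameters can also be read or set through the pipeline, for example with `pipe.set_params(embed__n_components=3)`.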

The hard dependencies are common building blocks of the Python machine-learning ecosystem:

  • numpy

  • scipy

  • pandas

  • numba

  • scikit-learn

  • matplotlib
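If you want to see which of the packages listed above are already present in your environment, a small check along these lines may help. It is only a sketch that uses the standard-library `importlib.metadata`; the helper name `installed_versions` is made up for this example.

```python
from importlib import metadata

# Distribution names as listed above.
HARD_DEPENDENCIES = ["numpy", "scipy", "pandas", "numba", "scikit-learn", "matplotlib"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(HARD_DEPENDENCIES)
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```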

Before installing TopOMetry, make sure cmake, scikit-build and setuptools are available on your system. On Debian-based Linux distributions (those that use apt-get):

sudo apt-get install cmake
pip install scikit-build setuptools

Then you can install TopOMetry from PyPI:

pip install topometry

NOTE: if your Python version is newer than 3.9, you may want to avoid a known issue with setuptools by passing the --use-pep517 flag:

pip install --use-pep517 topometry
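When installation is scripted, the choice between the two commands above can be made from the running interpreter's version. The sketch below assumes that "newer than 3.9" means Python 3.10 and later; the function name `topometry_install_command` is hypothetical, and the code only builds the command without running it.

```python
import sys


def topometry_install_command(version_info=sys.version_info):
    """Build the pip command for TopOMetry as a list of arguments."""
    cmd = [sys.executable or "python", "-m", "pip", "install"]
    # Assumption: the setuptools issue affects Python 3.10 and later.
    if tuple(version_info[:2]) > (3, 9):
        cmd.append("--use-pep517")
    cmd.append("topometry")
    return cmd


print(" ".join(topometry_install_command()))
```

The resulting list can be passed to `subprocess.run(..., check=True)` if you want the script to perform the installation itself.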

Optional dependencies

Some optional packages can enhance the use of TopOMetry, but are not listed as hard-dependencies. These are libraries for approximate-nearest-neighbors search and libraries for graph-layout optimization.

Approximate Nearest Neighbors

TopOMetry includes topo.ann.kNN(), a utility wrapper that learns k-nearest-neighbors graphs from data using various approximate nearest-neighbors search methods. I made it this flexible so that it can be used efficiently in multiple computational settings and environments.

The optional libraries for approximate-nearest-neighbors are:

If your CPU supports advanced instructions, I recommend installing nmslib separately and building it from source for the best performance:

pip install --no-binary :all: nmslib

If you don’t have any of these installed, TopOMetry will fall back to scikit-learn's neighborhood search, which can be quite slow when analysing large datasets. NMSLib and HNSWlib are my primary recommendations for large-scale data, but other methods also work well.
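To check in advance whether one of the recommended libraries is available or whether the scikit-learn fallback would apply, you can probe for the modules without importing them. This is a sketch of the idea only: `pick_knn_backend` is a hypothetical helper and does not reflect how topo.ann.kNN() selects its backend internally.

```python
from importlib.util import find_spec

# Import names of the two recommended libraries, in order of preference.
PREFERRED_BACKENDS = ("nmslib", "hnswlib")


def pick_knn_backend(preferred=PREFERRED_BACKENDS, fallback="sklearn"):
    """Return the first importable backend name, or the fallback if none is found."""
    for module_name in preferred:
        if find_spec(module_name) is not None:
            return module_name
    return fallback


backend = pick_knn_backend()
print(f"Available kNN backend: {backend}")
```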

Additional layout methods

From version 2.0.0 onwards, TopOMetry does not include any graph layout library as a dependency. Instead, it ships fast versions of Isomap and of UMAP's cross-entropy minimization (MAP) for graph layout and visualization. Other layout algorithms can be used, but they are not hard dependencies; installing and using them is left to the user:

  • ‘t-SNE’ - one of the first manifold learning methods (optionally with multicore-tsne, otherwise uses scikit-learn)

  • ‘UMAP’ - arguably the state-of-the-art for graph layout optimization (requires installing umap-learn)

  • ‘PaCMAP’ (Pairwise-controlled Manifold Approximation and Projection) - for balanced visualizations (requires installing pacmap)

  • ‘TriMAP’ - dimensionality reduction using triplets (requires installing trimap)

  • ‘IsomorphicMDE’ - MDE with preservation of nearest neighbors (requires installing pymde)

  • ‘IsometricMDE’ - MDE with preservation of pairwise distances (requires installing pymde)

  • ‘NCVis’ (Noise Contrastive Visualization) - a UMAP-like method with blazing fast performance (requires installing ncvis)

If you want to use them all, install them with:

pip install multicore-tsne umap-learn pacmap trimap pymde ncvis

These projection methods are handled by the topo.layout.Projector() class. New methods are straightforward to add to the framework, so please open an issue if your favorite method is not listed.


Please open an issue in the issue tracker if you have any trouble with installation!