Installation and dependencies
TopOMetry is implemented in Python, and its models are classes that inherit from the scikit-learn BaseEstimator and TransformerMixin. This makes them compatible with scikit-learn Pipelines, so they are easy to apply on their own or to combine with other workflows in virtually any domain.
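As an illustration of what this compatibility means in practice, here is a minimal sketch of a class that follows the same BaseEstimator/TransformerMixin pattern and is chained inside a scikit-learn Pipeline. TopVarianceSelector is a hypothetical stand-in written for this example; it is not part of TopOMetry, and the actual TopOMetry classes and their parameters may look different.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class TopVarianceSelector(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a model class (NOT part of TopOMetry).

    Keeps the n_features columns with the highest variance.
    """

    def __init__(self, n_features=3):
        # scikit-learn convention: store constructor arguments unchanged.
        self.n_features = n_features

    def fit(self, X, y=None):
        # Rank columns by variance and remember the top ones.
        variances = np.asarray(X).var(axis=0)
        self.indices_ = np.argsort(variances)[::-1][: self.n_features]
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.indices_]


# Any class following this pattern can be used as a Pipeline step.
pipeline = Pipeline([
    ("select", TopVarianceSelector(n_features=3)),
    ("scale", StandardScaler()),
])

X = np.random.default_rng(0).normal(size=(100, 5))
X_out = pipeline.fit_transform(X)
print(X_out.shape)
```

Because the stand-in inherits from BaseEstimator, its parameters are also visible to tools such as get_params() and grid search under names like select__n_features.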
The hard dependencies are common building blocks of the Python machine-learning ecosystem:
numpy
scipy
pandas
numba
scikit-learn
matplotlib
Before installing TopOMetry, make sure that cmake, scikit-build and setuptools are available on your system. If using Linux (the apt-get command below applies to Debian-based distributions):
sudo apt-get install cmake
pip install scikit-build setuptools
Then you can install TopOMetry from PyPI:
pip install topometry
NOTE: if your Python version is newer than 3.9, you may want to avoid an existing issue with setuptools by setting the flag --use-pep517:
pip install --use-pep517 topometry
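If you script your installs, the version rule above can be expressed as a small helper. This is only a sketch: pip_install_command is a hypothetical name, and reading "newer than 3.9" as "Python 3.10 or later" is my assumption about the note above.

```python
import sys


def pip_install_command(version_info=sys.version_info):
    """Build the pip command for TopOMetry for a given Python version tuple."""
    command = [sys.executable or "python", "-m", "pip", "install"]
    # Assumption: "newer than 3.9" means Python 3.10 or later.
    if tuple(version_info[:2]) > (3, 9):
        command.append("--use-pep517")
    command.append("topometry")
    return command


# Show the command for the interpreter running this script.
print(" ".join(pip_install_command()))
```

The helper only builds the command; pass the returned list to subprocess.run if you want to execute it.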
Optional dependencies
Some optional packages can enhance the use of TopOMetry, but are not listed as hard-dependencies. These are libraries for approximate-nearest-neighbors search and libraries for graph-layout optimization.
Approximate Nearest Neighbors
TopOMetry includes topo.ann.kNN(), a utility wrapper that learns k-nearest-neighbors graphs from data using various approximate nearest-neighbors search methods. I made it this flexible so that it can be used efficiently in multiple computational settings and environments.
The optional libraries for approximate-nearest-neighbors are:
If your CPU supports advanced instructions, I recommend you install nmslib separately for the best performance:
pip install --no-binary :all: nmslib
If you don’t have any of these installed, TopOMetry will run using scikit-learn neighborhood search, which can be quite slow when analysing large datasets. NMSLib and HNSWlib are my primary recommendations for large-scale data, but other methods also work well.
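To see which neighbor-search option your environment would end up with, a check along the following lines can help. It is a sketch under assumptions: pick_ann_backend is a hypothetical helper, it only looks at the import names nmslib and hnswlib (the two libraries recommended above), and the order in which it prefers them is my choice. It does not show how TopOMetry itself selects a backend.

```python
from importlib.util import find_spec

# Import names of the two recommended libraries, in an assumed preference order.
PREFERRED_BACKENDS = ("nmslib", "hnswlib")


def pick_ann_backend(candidates=PREFERRED_BACKENDS, fallback="sklearn"):
    """Return the first importable candidate, or the fallback if none is found."""
    for module_name in candidates:
        # find_spec returns None when a top-level module is not installed.
        if find_spec(module_name) is not None:
            return module_name
    return fallback


print("Neighbor search available via:", pick_ann_backend())
```

If this prints sklearn, expect the slower scikit-learn search described above on large datasets.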
Additional layout methods
From version 2.0.0 onwards, TopOMetry does not include any graph layout algorithm as a dependency. Instead, it includes fast versions of Isomap and of the cross-entropy minimization of UMAP (MAP) for graph layout and visualization. Other layout algorithms can be used, but they are not listed as hard dependencies, and the choice of installing and using them is left to the user:
‘t-SNE’ - one of the first manifold learning methods (optionally with multicore-tsne, otherwise uses scikit-learn)
‘UMAP’ - arguably the state-of-the-art for graph layout optimization (requires installing umap-learn)
‘PaCMAP’ (Pairwise-controlled Manifold Approximation and Projection) - for balanced visualizations (requires installing pacmap)
‘TriMAP’ - dimensionality reduction using triplets (requires installing trimap)
‘IsomorphicMDE’ - MDE with preservation of nearest neighbors (requires installing pymde)
‘IsometricMDE’ - MDE with preservation of pairwise distances (requires installing pymde)
‘NCVis’ (Noise Contrastive Visualization) - a UMAP-like method with blazing fast performance (requires installing ncvis)
If you want to use them all, install them with:
pip install multicore-tsne umap-learn pacmap trimap pymde ncvis
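If you would rather install only what is missing, the sketch below maps each layout method listed above to its pip package and reports the ones that cannot be imported. The pip names come from the list above; the import names (for example MulticoreTSNE and umap) are what these packages expose to the best of my knowledge, and missing_layout_packages is a hypothetical helper, not a TopOMetry function.

```python
from importlib.util import find_spec

# Layout method -> (pip package name, import name).
LAYOUT_REQUIREMENTS = {
    "t-SNE": ("multicore-tsne", "MulticoreTSNE"),  # optional: scikit-learn is the fallback
    "UMAP": ("umap-learn", "umap"),
    "PaCMAP": ("pacmap", "pacmap"),
    "TriMAP": ("trimap", "trimap"),
    "IsomorphicMDE": ("pymde", "pymde"),
    "IsometricMDE": ("pymde", "pymde"),
    "NCVis": ("ncvis", "ncvis"),
}


def missing_layout_packages(requirements=LAYOUT_REQUIREMENTS):
    """Return pip names of packages that are not importable, without duplicates."""
    missing = []
    for pip_name, import_name in requirements.values():
        if find_spec(import_name) is None and pip_name not in missing:
            missing.append(pip_name)
    return missing


missing = missing_layout_packages()
if missing:
    print("pip install " + " ".join(missing))
else:
    print("All optional layout packages are installed.")
```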
These projection methods are handled by the topo.layout.Projector() class, and new ones are quite straightforward to add to the framework, so please open an Issue if your favorite method is not listed.
Please open an Issue in the Issue tracker if you have any trouble with installation!