License: MIT


About TopOMetry

TopOMetry is a high-level python library to explore data topology through manifold learning. It is compatible with scikit-learn, meaning most of its operators can be easily pipelined.
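As a rough illustration of what "pipelined" means in scikit-learn terms, the sketch below chains a scaler with a spectral embedding. It deliberately uses only scikit-learn's own `StandardScaler` and `SpectralEmbedding` (a Laplacian Eigenmaps implementation) as stand-ins; it does not call TopOMetry, and the dataset and parameter values are arbitrary assumptions chosen for the example.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data: 300 points sampled from a 2-D manifold embedded in 3-D.
X, _ = make_swiss_roll(n_samples=300, random_state=0)

# Stand-in pipeline: scikit-learn estimators only, NOT TopOMetry operators.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("embed", SpectralEmbedding(n_components=2, n_neighbors=10, random_state=0)),
])

# The final step has no separate transform(), so fit and embed in one call.
embedding = pipeline.fit_transform(X)
print(embedding.shape)
```

Any estimator that follows the same `fit` / `transform` conventions can, in principle, be slotted into a pipeline like this one.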

Its main idea is to approximate the Laplace-Beltrami Operator (LBO). This is done by learning properly weighted similarity graphs and their Laplacian and Diffusion operators. By definition, the eigenfunctions of these operators describe all underlying data topology in an orthonormal eigenbasis. These eigenbases are special versions of Diffusion Maps, Laplacian Eigenmaps or Kernel Eigenmaps. New topological operators are then learned from these eigenbases and can be used for clustering and graph-layout optimization (visualization).
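To make the idea concrete, here is a minimal NumPy sketch of a generic Diffusion Maps construction: a weighted similarity graph, its diffusion operator, and the leading eigenvectors. This is a textbook-style illustration under stated assumptions, not TopOMetry's implementation. The function name `diffusion_eigenbasis`, the k-nearest-neighbor bandwidth rule, and all parameter values are hypothetical choices made for the example.

```python
import numpy as np

def diffusion_eigenbasis(X, n_components=2, n_neighbors=10):
    """Toy diffusion-map eigenbasis (illustrative only, dense O(n^2) memory)."""
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Assumed bandwidth rule: distance to each point's k-th nearest neighbor
    # (column 0 of the sorted distances is the point itself).
    sigma = np.sqrt(np.sort(d2, axis=1)[:, n_neighbors])
    sigma = np.maximum(sigma, 1e-12)

    # Symmetric, adaptively scaled Gaussian similarities.
    K = np.exp(-d2 / (sigma[:, None] * sigma[None, :]))

    # Density normalization (alpha = 1 in the diffusion-maps literature),
    # the variant associated with approximating the Laplace-Beltrami operator.
    q = K.sum(axis=1)
    K = K / np.outer(q, q)

    # Symmetric conjugate S = D^{-1/2} K D^{-1/2} of the Markov matrix P = D^{-1} K.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))

    # eigh returns ascending eigenvalues; reorder to descending.
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Map back to right eigenvectors of P; these are orthonormal under the
    # degree-weighted inner product. Skip the trivial constant eigenvector.
    psi = evecs / np.sqrt(d)[:, None]
    return evals[1:n_components + 1], psi[:, 1:n_components + 1]

# Usage on a noisy circle (synthetic data, for illustration).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.column_stack([np.cos(theta), np.sin(theta)])
X = X + 0.01 * rng.normal(size=X.shape)
evals, psi = diffusion_eigenbasis(X, n_components=2)
```

The dense distance matrix keeps the sketch short but does not scale; a practical implementation would use a sparse k-nearest-neighbor graph and a sparse eigensolver.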

For more information, please see our pre-print.

TopOMetry is designed to handle large-scale data matrices containing extreme sample diversity, such as those generated from single-cell omics. It includes wrappers to deal with AnnData objects using scanpy.


Tutorials

If you haven’t already, install topometry and start using it (quick-start).

All of the tutorials are also freely available as Jupyter Notebooks in this separate repository.


When to use TopOMetry

This is a frequently asked question with a simple answer: always, unless the data tells you otherwise.

One should never assume a priori that a method will be the best for every single dataset. TopOMetry allows users to explore high-dimensional data in several ways and to compute many different representations of it. It is not claimed to be superior to every other method a priori. Instead, it allows practitioners to see for themselves which method works best for their particular use case, based on quantitative and qualitative assessments.

In all cases tested so far, TopOMetry models outperformed the classical PCA-based approach used in single-cell genomics, and in most of them they also outperformed stand-alone UMAP. However, that should never be assumed to hold without looking at the data and at how these models perform on it. The aim here is to provide users with plenty of options and let them pick the one that works best for them, so that their decisions are evidence-based.

When not to use TopOMetry

First and foremost, when you do not have enough samples to safely assume that the manifold hypothesis holds true. In addition, TopOMetry currently supports neither adding new data without recomputing decompositions nor inverse transforms. If either is critical to your production workflow, then TopOMetry is probably not the best option, and you might prefer to use UMAP or topological autoencoders. However, even in that case, you should consider running TopOMetry on some of your data to evaluate whether your current workflow is generating reliable embeddings, or to estimate the data's intrinsic dimensionality in order to build an adequate architecture.


TopOMetry classes

TopOMetry is centered around four classes of scikit-learn-like transformers:

  • Kernel - learns similarities and builds topological operators that approximate the LBO.

  • EigenDecomposition - obtains and post-processes eigenfunctions.

  • Projector - handles graph-layout optimization methods.

  • TopOGraph - orchestrates analysis by employing the above estimators and others.

The following diagram represents how TopOMetry uses these transformers to learn topological operators, their eigenfunctions as orthonormal eigenbases, topological operators of these eigenfunctions, and graph projections of these graphs or eigenbases:

[Diagram: TopOMetry at a glance]


Citation

@article {Sidarta-Oliveira2022.03.14.484134,
	author = {Sidarta-Oliveira, Davi and Velloso, Licio A},
	title = {A comprehensive dimensional reduction framework to learn single-cell phenotypic topology uncovers T cell diversity},
	elocation-id = {2022.03.14.484134},
	year = {2022},
	doi = {10.1101/2022.03.14.484134},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2022/03/17/2022.03.14.484134},
	eprint = {https://www.biorxiv.org/content/early/2022/03/17/2022.03.14.484134.full.pdf},
	journal = {bioRxiv}
}