Manik Kuchroo

Yale University

Manifold Interpolating Optimal-Transport Flows for Trajectory Inference

Jun 29, 2022

Guillaume Huguet, D. S. Magruder, Oluwadamilola Fasina, Alexander Tong, Manik Kuchroo, Guy Wolf, Smita Krishnaswamy

Abstract: Here, we present a method called Manifold Interpolating Optimal-Transport Flow (MIOFlow) that learns stochastic, continuous population dynamics from static snapshot samples taken at sporadic timepoints. MIOFlow combines dynamic models, manifold learning, and optimal transport by training neural ordinary differential equations (Neural ODEs) to interpolate between static population snapshots as penalized by optimal transport with manifold ground distance. Further, we ensure that the flow follows the geometry by operating in the latent space of an autoencoder that we call a geodesic autoencoder (GAE). In the GAE, the latent space distance between points is regularized to match a novel multiscale geodesic distance on the data manifold that we define. In terms of interpolating between populations, we show that this method is superior to normalizing flows, Schrödinger bridges, and other generative models that are designed to flow from noise to data. Theoretically, we link these trajectories with dynamic optimal transport. We evaluate our method on simulated data with bifurcations and merges, as well as scRNA-seq data from embryoid body differentiation and acute myeloid leukemia treatment.

* 19 pages, 4 tables, 13 figures
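
The distance-matching idea behind the geodesic autoencoder can be illustrated with a small NumPy sketch. This is only an illustration under assumptions: the abstract does not give the form of the regularizer or of the multiscale geodesic distance, so the squared-error penalty and the function names `pairwise_euclidean` and `distance_matching_penalty` are hypothetical, and `D_target` stands in for any precomputed manifold distance matrix.

```python
import numpy as np


def pairwise_euclidean(Z):
    """Euclidean distance between every pair of rows of Z, shape (n, n)."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_matching_penalty(Z, D_target):
    """Mean squared gap between latent distances and target distances.

    Assumption: a squared-error form, averaged over distinct pairs.
    D_target is a placeholder for a precomputed manifold distance matrix.
    """
    D_latent = pairwise_euclidean(Z)
    upper = np.triu_indices(len(Z), k=1)  # each unordered pair once
    return float(np.mean((D_latent[upper] - D_target[upper]) ** 2))
```

In an autoencoder, a term like this would be added to the reconstruction loss so that the encoder is pushed toward a latent space whose Euclidean distances mimic the target distances.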

Time-inhomogeneous diffusion geometry and topology

Mar 28, 2022

Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy

Abstract: Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic diffusion homology. We use this intrinsic topology as well as an ambient topology to study how the data changes over diffusion time. We demonstrate both homologies in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.

* 32 pages, 8 figures
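
The loop described in the abstract above, where each step first computes a diffusion operator from the current points and then applies it to them, can be sketched as follows. This is a minimal sketch under assumptions: the Gaussian kernel, the fixed bandwidth `epsilon`, and the function names are illustrative choices, and refinements such as merging nearby points or adapting the bandwidth over time are omitted.

```python
import numpy as np


def diffusion_operator(X, epsilon):
    """Row-stochastic Markov matrix built from a Gaussian kernel on X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)
    return K / K.sum(axis=1, keepdims=True)


def condense(X, epsilon=1.0, n_steps=50):
    """Run n_steps of condensation and return every intermediate point set."""
    X = np.asarray(X, dtype=float)
    history = [X.copy()]
    for _ in range(n_steps):
        P = diffusion_operator(X, epsilon)  # recomputed from the current points
        X = P @ X                           # each point moves to a weighted average
        history.append(X.copy())
    return history


def diameter(X):
    """Largest pairwise Euclidean distance in X."""
    diff = X[:, None, :] - X[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).max())
```

Because the operator is rebuilt from the moved points at every step, the process is time-inhomogeneous. Each new point is a convex combination of the previous ones, so the diameter of the point set cannot grow from one step to the next.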

Embedding Signals on Knowledge Graphs with Unbalanced Diffusion Earth Mover's Distance

Jul 26, 2021

Alexander Tong, Guillaume Huguet, Dennis Shung, Amine Natik, Manik Kuchroo, Guillaume Lajoie, Guy Wolf, Smita Krishnaswamy

Abstract: In modern relational machine learning it is common to encounter large graphs that arise via interactions or similarities between observations in many domains. Further, in many cases the target entities for analysis are actually signals on such graphs. We propose to compare and organize such datasets of graph signals by using an earth mover's distance (EMD) with a geodesic cost over the underlying graph. Typically, EMD is computed by optimizing over the cost of transporting one probability distribution to another over an underlying metric space. However, this is inefficient when computing the EMD between many signals. Here, we propose an unbalanced graph earth mover's distance that efficiently embeds the unbalanced EMD on an underlying graph into an $L^1$ space, whose metric we call unbalanced diffusion earth mover's distance (UDEMD). This leads us to an efficient nearest neighbors kernel over many signals defined on a large graph. Next, we show how this gives distances between graph signals that are robust to noise. Finally, we apply this to three tasks: organizing patients based on their clinical notes, with patients modeled as signals on the SNOMED-CT medical knowledge graph; embedding lymphoblast cells modeled as signals on a gene graph; and organizing genes modeled as signals over a large peripheral blood mononuclear cell (PBMC) graph. In each case, we show that UDEMD-based embeddings find accurate distances that are highly efficient compared to other methods.

* 17 pages, 7 figures, 2 tables

Diffusion Earth Mover's Distance and Distribution Embeddings

Feb 25, 2021

Alexander Tong, Guillaume Huguet, Amine Natik, Kincaid MacDonald, Manik Kuchroo, Ronald Coifman, Guy Wolf, Smita Krishnaswamy

Abstract: We propose a new fast method, called Diffusion Earth Mover's Distance (Diffusion EMD), for measuring distances between large numbers of related high-dimensional datasets. We model the datasets as distributions supported on a common data graph that is derived from the affinity matrix computed on the combined data. In such cases where the graph is a discretization of an underlying closed Riemannian manifold, we prove that Diffusion EMD is topologically equivalent to the standard EMD with a geodesic ground distance. Diffusion EMD can be computed in $\tilde{O}(n)$ time and is more accurate than similarly fast algorithms such as tree-based EMDs. We also show Diffusion EMD is fully differentiable, making it amenable to future uses in gradient-descent frameworks such as deep neural networks. Finally, we demonstrate an application of Diffusion EMD to single-cell data collected from 210 COVID-19 patient samples at Yale New Haven Hospital. Here, Diffusion EMD can derive distances between patients on the manifold of cells at least two orders of magnitude faster than equally accurate methods. This distance matrix between patients can be embedded into a higher-level patient manifold which uncovers structure and heterogeneity in patients. More generally, Diffusion EMD is applicable to all datasets that are massively collected in parallel in many medical and biological systems.

* 12 pages, 6 figures, 11-page supplement
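
The general idea of comparing distributions on a shared data graph through diffusion can be sketched as below. This is a simplified illustration, not the paper's construction: the abstract does not specify the embedding, so the dyadic scales, the weights controlled by `alpha`, and the function names are assumptions. The sketch also uses dense matrices, so it does not reproduce the $\tilde{O}(n)$ running time.

```python
import numpy as np


def markov_operator(X, epsilon=1.0):
    """Row-stochastic diffusion operator from a Gaussian affinity on the combined data."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)
    return K / K.sum(axis=1, keepdims=True)


def diffusion_embedding(P, mu, n_scales=4, alpha=0.5):
    """Embed each column of mu (a distribution over the points) at dyadic scales.

    Assumption: coarser scales receive larger weights; the exact weighting
    here is illustrative only.
    """
    features = []
    P_k = P  # holds P^(2^k), starting at k = 0
    for k in range(n_scales):
        diffused = P_k.T @ mu  # push every distribution through P^(2^k)
        weight = 2.0 ** (-(n_scales - 1 - k) * alpha)
        features.append(weight * diffused)
        P_k = P_k @ P_k        # square to reach the next dyadic scale
    return np.concatenate(features, axis=0).T  # (n_distributions, n_scales * n_points)


def pairwise_l1(E):
    """L1 distance between every pair of embedded distributions."""
    return np.abs(E[:, None, :] - E[None, :, :]).sum(axis=-1)
```

Once every distribution has a fixed-length embedding, all pairwise distances reduce to L1 distances between vectors, which is what makes comparing many datasets at once cheap relative to solving a transport problem per pair.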

Multimodal data visualization, denoising and clustering with integrated diffusion

Feb 12, 2021

Manik Kuchroo, Abhinav Godavarthi, Guy Wolf, Smita Krishnaswamy

Abstract: We propose a method called integrated diffusion for combining multimodal datasets, or data gathered via several different measurements on the same system, to create a joint data diffusion operator. As real-world data suffers from both local and global noise, we introduce mechanisms to optimally calculate a diffusion operator that reflects the combined information from both modalities. We show the utility of this joint operator in data denoising, visualization, and clustering, where it performs better than other methods for integrating and analyzing multimodal data. We apply our method to multi-omic data generated from blood cells, measuring both gene expression and chromatin accessibility. Our approach better visualizes the geometry of the joint data, captures known cross-modality associations, and identifies known cellular populations. More generally, integrated diffusion is broadly applicable to multimodal datasets generated in many medical and biological systems.

Coarse Graining of Data via Inhomogeneous Diffusion Condensation

Jul 10, 2019

Nathan Brugnone, Alex Gonopolskiy, Mark W. Moyle, Manik Kuchroo, David van Dijk, Kevin R. Moon, Daniel Colon-Ramos, Guy Wolf, Matthew J. Hirn, Smita Krishnaswamy

Abstract: Big data often has emergent structure that exists at multiple levels of abstraction, which are useful for characterizing complex interactions and dynamics of the observations. Here, we consider multiple levels of abstraction via a multiresolution geometry of data points at different granularities. To construct this geometry we define a time-inhomogeneous diffusion process that effectively condenses data points together to uncover nested groupings at larger and larger granularities. This inhomogeneous process creates a deep cascade of intrinsic low-pass filters in the data that are applied in sequence to gradually eliminate local variability while adjusting the learned data geometry to increasingly coarse resolutions. We provide visualizations to exhibit our method as a "continuously-hierarchical" clustering with directions of eliminated variation highlighted at each step. The utility of our algorithm is demonstrated via neuronal data condensation, where the constructed multiresolution data geometry uncovers the organization, grouping, and connectivity between neurons.

* 14 pages, 7 figures
