Yuta Hozumi

Analyzing Single Cell RNA Sequencing with Topological Nonnegative Matrix Factorization

Oct 24, 2023

Yuta Hozumi, Guo-Wei Wei

Abstract: Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to its meta-gene interpretation of resulting low-dimensional components. However, NMF approaches suffer from the lack of multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). By employing a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also utilized TNMF and rTNMF for visualization with the popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE) methods.
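To make the idea of a Laplacian-regularized NMF concrete, here is a minimal sketch using the standard library only. It is not the TNMF or rTNMF method from the paper: it swaps in a single fixed cell-cell graph Laplacian (classical graph-regularized NMF with multiplicative updates) where the paper uses persistent Laplacians. The function names, the `lam` weight, and the assumption that `adj` is a symmetric nonnegative adjacency matrix over cells are all illustrative choices.

```python
import random

def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def objective(x, w, h, adj, lam):
    """||X - WH||_F^2 + lam * Tr(H L H^T), with L = D - A (A symmetric)."""
    wh = matmul(w, h)
    fit = sum((p - q) ** 2 for xr, rr in zip(x, wh) for p, q in zip(xr, rr))
    n = len(adj)
    # Tr(H L H^T) = 1/2 * sum_ij A_ij * ||H[:, i] - H[:, j]||^2
    smooth = sum(0.5 * adj[i][j] * sum((r[i] - r[j]) ** 2 for r in h)
                 for i in range(n) for j in range(n))
    return fit + lam * smooth

def graph_nmf(x, adj, rank, lam=0.5, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative genes-by-cells matrix X into W (genes x rank)
    and H (rank x cells), encouraging neighbouring cells in `adj` to
    receive similar columns of H."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    deg = [sum(row) for row in adj]  # diagonal of the degree matrix D
    for _ in range(iters):
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)]
             for i in range(m)]
        # H <- H * (W^T X + lam * H A) / (W^T W H + lam * H D)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        ha = matmul(h, adj)
        h = [[h[r][j] * (num[r][j] + lam * ha[r][j])
              / (den[r][j] + lam * h[r][j] * deg[j] + eps)
              for j in range(n)] for r in range(rank)]
    return w, h

if __name__ == "__main__":
    # Toy data (hypothetical): 4 genes x 6 cells, two groups of three cells.
    X = [[5, 4, 5, 0, 1, 0], [4, 5, 4, 1, 0, 0],
         [0, 1, 0, 5, 4, 5], [1, 0, 0, 4, 5, 4]]
    A = [[1 if i != j and i // 3 == j // 3 else 0 for j in range(6)]
         for i in range(6)]
    W, H = graph_nmf(X, A, rank=2)
    print(round(objective(X, W, H, A, 0.5), 3))
```

The columns of `W` play the role of meta-genes and the columns of `H` give each cell's low-dimensional representation. A multiscale variant would need a regularizer built from several graphs rather than one, which this sketch does not attempt.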

K-Nearest-Neighbors Induced Topological PCA for scRNA Sequence Data Analysis

Oct 23, 2023

Sean Cottrell, Yuta Hozumi, Guo-Wei Wei

Abstract: Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a main workhorse in dimensionality reduction, lacks the ability to capture geometrical structure information embedded in the data, and previous graph Laplacian regularizations are limited by the analysis of only a single scale. We propose a topological Principal Components Analysis (tPCA) method by the combination of persistent Laplacian (PL) technique and $L_{2,1}$ norm regularization to address multiscale and multiclass heterogeneity issues in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique which addresses the many limitations of the traditional persistent homology. Rather than inducing filtration via the varying of a distance threshold, we introduce kNN-tPCA, where filtrations are achieved by varying the number of neighbors in a kNN network at each step, and find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets, and showcase that our methods outperform other unsupervised PCA enhancements from the literature, as well as popular Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF) by significant margins.

* 28 pages, 11 figures
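The kNN-induced filtration described in the abstract can be illustrated with a small sketch: instead of growing a distance threshold, the number of neighbors k is increased step by step, producing a nested family of graphs and one graph Laplacian per step. This is only an assumption-laden illustration of that idea, not the paper's kNN persistent Laplacian construction. The function names, the symmetrization rule (an edge exists if either point lists the other among its k nearest), and the index-based tie-breaking are all choices made here.

```python
import math

def knn_adjacency(points, k):
    """Symmetric 0/1 adjacency: i ~ j if j is among i's k nearest or vice versa."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Ties are broken by index so that the graphs are nested as k grows.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: (math.dist(points[i], points[j]), j))
        for j in order[:k]:
            adj[i][j] = adj[j][i] = 1
    return adj

def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A."""
    n = len(adj)
    return [[sum(adj[i]) if i == j else -adj[i][j] for j in range(n)]
            for i in range(n)]

def knn_filtration(points, max_k):
    """One Laplacian per filtration step k = 1, ..., max_k."""
    return [laplacian(knn_adjacency(points, k)) for k in range(1, max_k + 1)]

def count_components(lap):
    """Number of connected components, which equals the multiplicity of the
    zero eigenvalue of the graph Laplacian."""
    n, seen, count = len(lap), set(), 0
    for start in range(n):
        if start in seen:
            continue
        count += 1
        stack = [start]
        seen.add(start)
        while stack:
            i = stack.pop()
            for j in range(n):
                if j != i and lap[i][j] != 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
    return count

if __name__ == "__main__":
    # Two well-separated groups of three points each (hypothetical data).
    pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    for k, lap in enumerate(knn_filtration(pts, 3), start=1):
        print(k, count_components(lap))
```

Because k is an integer with a natural upper bound, the filtration has a small, data-independent set of steps, which is one way to read the abstract's remark about hyper-parameter tuning. The Laplacians at different k could then be combined into a regularizer, but how the paper weights them is not stated in the abstract and is not assumed here.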

Analyzing scRNA-seq data by CCP-assisted UMAP and t-SNE

Jun 23, 2023

Yuta Hozumi, Guo-Wei Wei

CCP: Correlated Clustering and Projection for Dimensionality Reduction

Jun 08, 2022

Yuta Hozumi, Rui Wang, Guo-Wei Wei

Abstract: Most dimensionality reduction methods employ frequency domain representations obtained from matrix diagonalization and may not be efficient for large datasets with relatively high intrinsic dimensions. To address this challenge, Correlated Clustering and Projection (CCP) offers a novel data domain strategy that does not need to solve any matrix. CCP partitions high-dimensional features into correlated clusters and then projects correlated features in each cluster into a one-dimensional representation based on sample correlations. Residue-Similarity (R-S) scores and indexes, the shape of data in Riemannian manifolds, and algebraic topology-based persistent Laplacian are introduced for visualization and analysis. Proposed methods are validated with benchmark datasets associated with various machine learning algorithms.
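The two steps named in the abstract, grouping correlated features and then collapsing each group to one coordinate per sample, can be sketched as follows. This is a rough illustration under stated assumptions, not the CCP algorithm as published: the greedy threshold-based grouping, the Gaussian kernel, the mean-distance scale, and every function name below are hypothetical stand-ins chosen to keep the example short. Note that no eigenproblem or matrix factorization is solved at any point.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0

def cluster_features(x, threshold=0.8):
    """Greedy grouping (an assumption): a feature joins the first cluster whose
    first member it correlates with at |r| >= threshold, else starts a new one."""
    cols = [list(c) for c in zip(*x)]
    clusters = []
    for f, col in enumerate(cols):
        for members in clusters:
            if abs(pearson(col, cols[members[0]])) >= threshold:
                members.append(f)
                break
        else:
            clusters.append([f])
    return clusters

def project_cluster(x, members):
    """One value per sample: the summed Gaussian similarity to all other
    samples, measured only on the features in this cluster."""
    sub = [[row[f] for f in members] for row in x]
    n = len(sub)
    dist = [[math.dist(sub[i], sub[j]) for j in range(n)] for i in range(n)]
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    scale = sum(pairs) / len(pairs) if pairs and sum(pairs) > 0 else 1.0
    return [sum(math.exp(-(dist[i][j] / scale) ** 2)
                for j in range(n) if j != i) for i in range(n)]

def ccp_sketch(x, threshold=0.8):
    """Return (feature clusters, samples-by-clusters embedding)."""
    clusters = cluster_features(x, threshold)
    columns = [project_cluster(x, m) for m in clusters]
    return clusters, [list(row) for row in zip(*columns)]

if __name__ == "__main__":
    # 4 samples x 4 features (hypothetical): features 0-1 and 2-3 are correlated pairs.
    X = [[1, 2, 1, 2], [2, 4, -1, -2], [3, 6, -1, -2], [4, 8, 1, 2]]
    clusters, embedding = ccp_sketch(X)
    print(clusters)
    print(embedding)
```

The output dimension equals the number of feature clusters, so in this sketch the threshold indirectly sets the embedding size; a clustering method that takes the number of clusters directly would give explicit control over it.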
