Abstract:Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional single-cell datasets are alignable (and therefore should even be aligned). Moreover, popular methods can substantially distort the data during alignment, making the aligned data and downstream analysis difficult to interpret. To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data. SMAI provides a statistical test to robustly determine the alignability between datasets to avoid misleading inference, and is justified by high-dimensional statistical theory. On a diverse range of real and simulated benchmark datasets, it outperforms commonly used alignment methods. Moreover, we show that SMAI improves various downstream analyses such as identification of differentially expressed genes and imputation of single-cell spatial transcriptomics, providing further biological insights. SMAI's interpretability also enables quantification and a deeper understanding of the sources of technical confounders in single-cell data.
Abstract:Dimension reduction and data visualization aim to project a high-dimensional dataset to a low-dimensional space while capturing the intrinsic structures in the data. It is an indispensable part of modern data science, and many dimensional reduction and visualization algorithms have been developed. However, different algorithms have their own strengths and weaknesses, making it critically important to evaluate their relative performance for a given dataset, and to leverage and combine their individual strengths. In this paper, we propose an efficient spectral method for assessing and combining multiple visualizations of a given dataset produced by diverse algorithms. The proposed method provides a quantitative measure -- the visualization eigenscore -- of the relative performance of the visualizations for preserving the structure around each data point. Then it leverages the eigenscores to obtain a consensus visualization, which has much improved { quality over the individual visualizations in capturing the underlying true data structure.} Our approach is flexible and works as a wrapper around any visualizations. We analyze multiple simulated and real-world datasets from diverse applications to demonstrate the effectiveness of the eigenscores for evaluating visualizations and the superiority of the proposed consensus visualization. Furthermore, we establish rigorous theoretical justification of our method based on a general statistical framework, yielding fundamental principles behind the empirical success of consensus visualization along with practical guidance.
Abstract:Deep neural network (DNN) models for computer vision are now capable of human-level object recognition. Consequently, similarities in the performance and vulnerabilities of DNN and human vision are of great interest. Here we characterize the response of the VGG-19 DNN to images of the Scintillating Grid visual illusion, in which white dots are perceived to be partially black. We observed a significant deviation from the expected monotonic relation between VGG-19 representational dissimilarity and dot whiteness in the Scintillating Grid. That is, a linear increase in dot whiteness leads to a non-linear increase and then, remarkably, a decrease (non-monotonicity) in representational dissimilarity. In control images, mostly monotonic relations between representational dissimilarity and dot whiteness were observed. Furthermore, the dot whiteness level corresponding to the maximal representational dissimilarity (i.e. onset of non-monotonic dissimilarity) matched closely with that corresponding to the onset of illusion perception in human observers. As such, the non-monotonic response in the DNN is a potential model correlate for human illusion perception.