Abstract: Randomized algorithms can be used to speed up the analysis of large datasets. In this paper, we develop a unified methodology for statistical inference via randomized sketching or projections in two of the most fundamental problems in multivariate statistical analysis: least squares and PCA. The methodology applies to fixed datasets -- i.e., is data-conditional -- and the only randomness is due to the randomized algorithm. We propose statistical inference methods for a broad range of sketching distributions, such as the subsampled randomized Hadamard transform (SRHT), Sparse Sign Embeddings (SSE) and CountSketch, sketching matrices with i.i.d. entries, and uniform subsampling. To our knowledge, no comparable methods are available for SSE and for SRHT in PCA. Our novel theoretical approach rests on showing the asymptotic normality of certain quadratic forms. As a contribution of broader interest, we show central limit theorems for quadratic forms of the SRHT, relying on a novel proof via a dyadic expansion that leverages the recursive structure of the Hadamard transform. Numerical experiments using both synthetic and empirical datasets support the efficacy of our methods, and in particular suggest that sketching methods can have better computation-estimation tradeoffs than recently proposed optimal subsampling methods.
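To make the sketching idea concrete, the following is a minimal illustrative example (not the paper's inference procedure) of sketch-and-solve least squares with a CountSketch-style sparse sign embedding; the dimensions n, d and sketch size m are assumptions chosen only for demonstration.

```python
import numpy as np

# Illustrative sketch-and-solve least squares with a CountSketch-style
# sparse sign embedding (one random +/-1 entry per column of the sketching
# matrix S). This shows only the basic sketching step such methods build on,
# not the paper's data-conditional inference procedure.
rng = np.random.default_rng(0)
n, d, m = 10_000, 20, 500          # assumed data size, features, sketch size

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Apply S implicitly: each row of [A b] gets a random sign and is added
# to one randomly chosen row of the sketched system.
rows = rng.integers(0, m, size=n)            # hash each observation to a row
signs = rng.choice([-1.0, 1.0], size=n)
SA = np.zeros((m, d))
Sb = np.zeros(m)
np.add.at(SA, rows, signs[:, None] * A)
np.add.at(Sb, rows, signs * b)

beta_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
beta_full, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(beta_sketch - beta_full))   # sketching error on this dataset
```

Here the only randomness, given the fixed (A, b), comes from the sketch itself, which is the data-conditional setting the abstract refers to.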
Abstract: Dynamic functional connectivity networks (dFCN) based on rs-fMRI have demonstrated tremendous potential for brain function analysis and brain disease classification. Recently, studies have applied deep learning techniques (i.e., convolutional neural networks, CNNs) to dFCN classification and achieved better performance than traditional machine learning methods. Nevertheless, previous deep learning methods usually perform successive convolutional operations on the input dFCNs to obtain high-order brain network aggregation features, extracting them from each sliding window through a series of splits, which may neglect non-linear correlations among different brain regions and the sequential ordering of the information. Thus, important high-order sequence information in dFCNs, which could further improve classification performance, is ignored in these studies. More recently, inspired by the great success of Transformers in natural language processing and computer vision, some work has also applied Transformers to brain disease diagnosis based on rs-fMRI data. Although a Transformer can capture non-linear correlations, it does not capture local spatial feature patterns well and, owing to its parallel computation, struggles to model the temporal dimension, even when equipped with positional encoding. To address these issues, we propose a self-attention (SA) based convolutional recurrent network (SA-CRN) learning framework for brain disease classification with rs-fMRI data. Experimental results on a public dataset (i.e., ADNI) demonstrate the effectiveness of our proposed SA-CRN method.
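To illustrate the kind of architecture the abstract describes, here is a minimal hypothetical PyTorch sketch that combines convolution over each window's connectivity matrix, self-attention across windows, and a recurrent layer over the window sequence; all layer sizes, ROI counts, and the exact ordering of components are assumptions for illustration, not the authors' reported SA-CRN design.

```python
import torch
import torch.nn as nn

class SACRNSketch(nn.Module):
    """Hypothetical convolutional + self-attention + recurrent pipeline for
    windowed dFCN inputs; dimensions and component ordering are assumed,
    not taken from the SA-CRN paper."""

    def __init__(self, n_rois=90, hidden=64, n_classes=2):
        super().__init__()
        # Convolution over each window's ROI x ROI connectivity matrix
        # to extract local spatial feature patterns.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),       # -> 8 * 8 * 8 features
            nn.Linear(8 * 8 * 8, hidden), nn.ReLU(),
        )
        # Self-attention across windows to capture non-linear correlations.
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        # Recurrent layer to model the temporal order of the windows.
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, x):                       # x: (batch, windows, roi, roi)
        b, w, r, _ = x.shape
        feats = self.conv(x.reshape(b * w, 1, r, r)).reshape(b, w, -1)
        attended, _ = self.attn(feats, feats, feats)
        _, h = self.gru(attended)               # final hidden state per subject
        return self.cls(h[-1])

logits = SACRNSketch()(torch.randn(4, 30, 90, 90))   # 4 subjects, 30 windows
```

The sketch keeps the three ingredients the abstract motivates: convolution for local spatial patterns, self-attention for non-linear inter-regional correlations, and recurrence for the temporal ordering that pure Transformer models handle less directly.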