Ruipeng Dong

Simultaneous Best Subset Selection and Dimension Reduction via Primal-Dual Iterations

Dec 03, 2022

Canhong Wen, Ruipeng Dong, Xueqin Wang, Weiyu Li, Heping Zhang

Abstract: Sparse reduced rank regression is an essential statistical learning method. In the contemporary literature, estimation is typically formulated as a nonconvex optimization problem that often yields a local optimum in numerical computation. Yet the theoretical analysis is always centered on the global optimum, resulting in a discrepancy between the statistical guarantee and the numerical computation. In this research, we offer a new algorithm to address the problem and establish an almost optimal rate for the algorithmic solution. We also demonstrate that the algorithm achieves the estimation with a polynomial number of iterations. In addition, we present a generalized information criterion to simultaneously ensure the consistency of support set recovery and rank estimation. Under the proposed criterion, we show that our algorithm can achieve the oracle reduced rank estimation with a significant probability. The numerical studies and an application to ovarian cancer genetic data demonstrate the effectiveness and scalability of our approach.

* 38 pages, 5 figures

Parallel integrative learning for large-scale multi-response regression with incomplete outcomes

Apr 11, 2021

Ruipeng Dong, Daoji Li, Zemin Zheng

Abstract: Multi-task learning is increasingly used to investigate the association structure between multiple responses and a single set of predictor variables in many applications. In the era of big data, the coexistence of incomplete outcomes, a large number of responses, and high dimensionality in predictors poses unprecedented challenges in estimation, prediction, and computation. In this paper, we propose a scalable and computationally efficient procedure, called PEER, for large-scale multi-response regression with incomplete outcomes, where both the numbers of responses and predictors can be high-dimensional. Motivated by sparse factor regression, we convert the multi-response regression into a set of univariate-response regressions, which can be efficiently implemented in parallel. Under some mild regularity conditions, we show that PEER enjoys nice sampling properties including consistency in estimation, prediction, and variable selection. Extensive simulation studies show that our proposal compares favorably with several existing methods in estimation accuracy, variable selection, and computational efficiency.

* Computational Statistics and Data Analysis, 2021
* 32 pages

Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix

Mar 17, 2020

Kun Chen, Ruipeng Dong, Wanwan Xu, Zemin Zheng

Abstract: The sparse factorization of a large matrix is fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been utilized in multivariate regression, factor analysis, biclustering, and vector time series modeling, among other areas. The appeal of this factorization owes to its power in discovering a highly interpretable latent association network, either between samples and variables or between responses and predictors. However, many existing methods are either ad hoc without a general performance guarantee, or are computationally intensive, rendering them unsuitable for large-scale studies. We formulate the statistical problem as a sparse factor regression and tackle it with a divide-and-conquer approach. In the first stage of division, we consider both sequential and parallel approaches for simplifying the task into a set of co-sparse unit-rank estimation (CURE) problems, and establish the statistical underpinnings of these commonly adopted yet poorly understood deflation methods. In the second stage of division, we innovate a contended stagewise learning technique, consisting of a sequence of simple incremental updates, to efficiently trace out the whole solution paths of CURE. Our algorithm has a much lower computational complexity than alternating convex search, and the choice of the step size enables a flexible and principled tradeoff between statistical accuracy and computational efficiency. Our work is among the first to enable stagewise learning for non-convex problems, and the idea may be applicable to many multi-convex problems. Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of our approach.
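
The sequential deflation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's CURE estimator: it assumes plain, non-sparse unit-rank steps taken from an ordinary SVD, and the function name `sequential_deflation` and the toy matrix are hypothetical. It only shows how a matrix can be peeled into unit-rank layers one at a time.

```python
import numpy as np

def sequential_deflation(C, rank):
    """Split C into `rank` unit-rank layers d_k * u_k v_k^T, one at a time.

    Assumption: each step uses the leading singular triplet of the
    residual (no sparsity penalty), unlike the CURE step in the paper.
    """
    residual = np.array(C, dtype=float)
    layers = []
    for _ in range(rank):
        # Leading singular triplet of the current residual.
        U, s, Vt = np.linalg.svd(residual, full_matrices=False)
        layer = s[0] * np.outer(U[:, 0], Vt[0])
        layers.append(layer)
        # Deflate: remove the extracted layer before the next step.
        residual = residual - layer
    return layers, residual

# Toy example: a 6 x 5 matrix that is exactly rank 3.
rng = np.random.default_rng(0)
C = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))
layers, residual = sequential_deflation(C, rank=3)
```

In this unpenalized setting the layers sum back to the original matrix; a sparse variant would replace the SVD step with a penalized unit-rank fit, which is where the statistical analysis described in the abstract comes in.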
