Abstract:We investigate the problem of estimating linear regression coefficients under a sparsity assumption when the covariates and noise are sampled from heavy-tailed distributions. Additionally, we consider the situation where the covariates and noise are not only heavy-tailed but also contaminated by outliers. Our estimators can be computed efficiently and exhibit sharp error bounds.
Abstract:This paper presents a comprehensive exploration of the theoretical properties of the Adaptive Lasso and the Transfer Lasso. The Adaptive Lasso, a well-established method, employs regularization divided by initial estimators and is characterized by asymptotic normality and variable selection consistency. In contrast, the recently proposed Transfer Lasso employs regularization of differences from initial estimators and has been shown to curtail non-asymptotic estimation error. A pivotal question thus emerges: given the distinct ways in which the Adaptive Lasso and the Transfer Lasso employ initial estimators, what benefits or drawbacks does this disparity confer on each method? This paper conducts a theoretical examination of the asymptotic properties of the Transfer Lasso, thereby elucidating how it differs from the Adaptive Lasso. Informed by this analysis, we introduce a novel method that amalgamates the strengths, and compensates for the weaknesses, of both methods. The paper concludes with validations of our theory and comparisons of the methods via simulation experiments.
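For reference, a schematic comparison of the two uses of an initial estimator $\tilde{\beta}$, written with the usual squared-error loss (the exponent $\gamma$ and the tuning parameters $\lambda, \eta$ below follow standard conventions and are assumptions here, not taken from the paper):
\[
\hat{\beta}^{\mathrm{Adaptive}} \in \operatorname*{arg\,min}_{\beta}\; \frac{1}{2n}\|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} \frac{|\beta_j|}{|\tilde{\beta}_j|^{\gamma}},
\qquad
\hat{\beta}^{\mathrm{Transfer}} \in \operatorname*{arg\,min}_{\beta}\; \frac{1}{2n}\|y - X\beta\|_2^2 + \lambda \|\beta\|_1 + \eta \|\beta - \tilde{\beta}\|_1 .
\]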
Abstract:We consider outlier-robust and sparse estimation of linear regression coefficients when the covariates and noise are sampled, respectively, from an $\mathfrak{L}$-subGaussian distribution and a heavy-tailed distribution, and, additionally, the covariates and noise are contaminated by adversarial outliers. We deal with two cases: known or unknown covariance of the covariates. In particular, in the former case, our estimator attains a nearly information-theoretically optimal error bound, which is sharper than those of earlier studies dealing with similar situations. Our analysis relies heavily on generic chaining to derive sharp error bounds.
Abstract:We propose a novel method to estimate the coefficients of a linear regression when the outputs and inputs are contaminated by malicious outliers. Our method consists of two steps: (i) construct weights $\left\{\hat{w}_i\right\}_{i=1}^n$ such that the weighted sample mean of the regression covariates robustly estimates the population mean of the regression covariates, and (ii) run Huber regression weighted by $\left\{\hat{w}_i\right\}_{i=1}^n$. When (a) the regression covariates are a sequence of i.i.d. random vectors drawn from a sub-Gaussian distribution with unknown mean and known identity covariance and (b) the absolute moment of the random noise is finite, our method attains a faster convergence rate than those of Diakonikolas, Kong and Stewart (2019) and Cherapanamjeri et al. (2020). Furthermore, our result is minimax optimal up to a constant factor. When (a) the regression covariates are a sequence of i.i.d. random vectors drawn from a heavy-tailed distribution with unknown mean and bounded kurtosis and (b) the absolute moment of the random noise is finite, our method attains a convergence rate that is again minimax optimal up to a constant factor.
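A minimal sketch of the two-step procedure, assuming scikit-learn's HuberRegressor for step (ii); the weight construction below (downweighting by distance from the coordinatewise median) is only an illustrative placeholder and is not the construction analyzed in the paper.

import numpy as np
from sklearn.linear_model import HuberRegressor

def placeholder_weights(X, quantile=0.9):
    # Illustrative placeholder for step (i): downweight samples far from the
    # coordinatewise median so that the weighted mean of the covariates is
    # less sensitive to outlying rows, then normalize the weights to sum to one.
    center = np.median(X, axis=0)
    dist = np.linalg.norm(X - center, axis=1)
    cutoff = np.quantile(dist, quantile)
    w = np.minimum(1.0, cutoff / np.maximum(dist, 1e-12))
    return w / w.sum()

def two_step_huber(X, y):
    w = placeholder_weights(X)                  # step (i): robust weights
    model = HuberRegressor()                    # step (ii): Huber regression
    model.fit(X, y, sample_weight=w * len(y))   # rescale weights to average one
    return model.coef_, model.intercept_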
Abstract:We consider robust low-rank matrix estimation when the random noise is heavy-tailed and the output is contaminated by adversarial noise. Under clearly stated conditions, we are the first to attain a fast convergence rate for low-rank matrix estimation, including compressed sensing and matrix completion, with convex estimators.
Abstract:We consider the problem of inferring causal structure from observational data, especially when the structure is sparse. This type of problem is usually formulated as inference of a directed acyclic graph (DAG) model. The linear non-Gaussian acyclic model (LiNGAM) is one of the most successful DAG models, and various estimation methods have been developed for it. However, existing methods are inefficient for two reasons: (i) the sparse structure is not always incorporated in causal order estimation, and (ii) the whole information of the data is not used in parameter estimation. To address these issues, we propose a new estimation method for a linear DAG model with non-Gaussian noise. The proposed method is based on the log-likelihood of independent component analysis (ICA) with two penalty terms related to the sparsity and the consistency condition, and it enables us to estimate the causal order and the parameters simultaneously. For stable and efficient optimization, we introduce several devices, such as a modified natural gradient. Numerical experiments show that the proposed method outperforms existing methods, including LiNGAM and NOTEARS.
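Schematically, writing $W = I - B$ for the coefficient matrix $B$ of the linear DAG and $p_j$ for the non-Gaussian noise densities, the objective is a penalized ICA negative log-likelihood of the form below; the exact forms of the two penalty terms are specified in the paper and are left abstract here.
\[
\min_{W}\; -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{p} \log p_j\!\left(w_j^{\top} x_i\right) - \log\left|\det W\right| + \lambda_1\, \mathrm{pen}_{\mathrm{sparsity}}(W) + \lambda_2\, \mathrm{pen}_{\mathrm{consistency}}(W),
\]
where $w_j^{\top}$ denotes the $j$-th row of $W$.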
Abstract:Machine learning algorithms typically require abundant data from a stationary environment. However, environments are nonstationary in many real-world applications. A critical issue is how to effectively adapt models to an ever-changing environment. We propose a method for transferring knowledge from a source domain to a target domain via $\ell_1$ regularization: we add $\ell_1$ regularization of the differences between source parameters and target parameters to an ordinary $\ell_1$ regularization. Hence, our method yields sparsity both in the estimates themselves and in the changes of the estimates. The proposed method has a tight estimation error bound under a stationary environment, and the estimate remains unchanged from the source estimate under small residuals. Moreover, the estimate is consistent with the underlying function even when the source estimate is mistaken due to nonstationarity. Empirical results demonstrate that the proposed method effectively balances stability and plasticity.
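A minimal sketch of the objective described above, assuming cvxpy as the solver; the function and variable names are illustrative, and the paper's own optimization procedure may differ.

import cvxpy as cp

def transfer_l1(X, y, beta_src, lam_sparse, lam_change):
    # Ordinary l1 regularization plus l1 regularization of the change
    # from the source-domain parameters beta_src.
    n, p = X.shape
    beta = cp.Variable(p)
    objective = (cp.sum_squares(y - X @ beta) / (2 * n)
                 + lam_sparse * cp.norm1(beta)
                 + lam_change * cp.norm1(beta - beta_src))
    cp.Problem(cp.Minimize(objective)).solve()
    return beta.value

# Example usage: beta_target = transfer_l1(X_target, y_target, beta_source, 0.1, 0.1)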
Abstract:We consider robust estimation when the outputs are adversarially contaminated. Nguyen and Tran (2012) proposed an extended Lasso for robust parameter estimation and derived the convergence rate of its estimation error. Recently, Dalalyan and Thompson (2019) gave some useful inequalities and showed a sharper convergence rate than that of Nguyen and Tran (2012). They focused on the fact that the minimization problem of the extended Lasso can be recast as minimization of the Huber loss function with an $L_1$ penalty. The distinguishing point is that the Huber loss function includes an extra tuning parameter, which differs from the conventional method. However, there is a critical mistake in the proof of Dalalyan and Thompson (2019). We repair the proof and obtain a convergence rate that is sharper than that of Nguyen and Tran (2012) when the number of outliers is large. The significance of our proof lies in the use of specific properties of the Huber function, techniques that have not been used in previous proofs.
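The equivalence invoked above can be made explicit by profiling out the per-observation shift parameters; up to scaling conventions (which may differ from those of the cited papers), for each residual $r$,
\[
\min_{\theta \in \mathbb{R}} \left\{ \tfrac{1}{2}(r - \theta)^2 + \lambda_o |\theta| \right\} = \rho_{\lambda_o}(r),
\qquad
\rho_{\lambda_o}(r) =
\begin{cases}
r^2/2, & |r| \le \lambda_o,\\
\lambda_o |r| - \lambda_o^2/2, & |r| > \lambda_o,
\end{cases}
\]
so that
\[
\min_{\beta,\,\theta}\; \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta - \theta_i\right)^2 + \frac{\lambda_o}{n}\|\theta\|_1 + \lambda\|\beta\|_1
= \min_{\beta}\; \frac{1}{n}\sum_{i=1}^{n}\rho_{\lambda_o}\!\left(y_i - x_i^{\top}\beta\right) + \lambda\|\beta\|_1,
\]
where $\lambda_o$ is the extra tuning parameter of the Huber loss.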
Abstract:Sparse regression methods such as the Lasso have achieved great success in dealing with high-dimensional data over the past several decades. However, few methods are applicable to missing data, which often occur in high-dimensional settings. Recently, CoCoLasso was proposed to deal with high-dimensional missing data, but it still suffers when the missing rate is high. In this paper, we propose a novel Lasso-type regression technique for Highly Missing data, called `HMLasso'. We use the mean-imputed covariance matrix, which is notorious in general due to its estimation bias under missingness. However, we effectively incorporate it into the Lasso by exploiting a useful connection with the pairwise covariance matrix. The resulting optimization problem can be seen as a modification of CoCoLasso weighted by the missing ratios, and it is quite effective for highly missing data. To the best of our knowledge, this is the first method that can efficiently deal with data that are both high-dimensional and highly missing. We show that the proposed method is beneficial with regard to non-asymptotic properties of the covariance matrix. Numerical experiments show that the proposed method is highly advantageous in terms of estimation error and generalization error.
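One hedged reading of the construction above (the exact weighting is specified in the paper): let $S^{\mathrm{pair}}$ denote the pairwise covariance matrix and $R$ the matrix of pairwise observation ratios, so that the mean-imputed covariance matrix is approximately $R \circ S^{\mathrm{pair}}$; a CoCoLasso-style plug-in then uses a positive-semidefinite surrogate weighted by the missing ratios,
\[
\hat{\Sigma} \in \operatorname*{arg\,min}_{\Sigma \succeq 0} \left\| W \circ \left(\Sigma - S^{\mathrm{pair}}\right) \right\|_F^2,
\qquad
\hat{\beta} \in \operatorname*{arg\,min}_{\beta}\; \tfrac{1}{2}\beta^{\top}\hat{\Sigma}\beta - \hat{\rho}^{\top}\beta + \lambda\|\beta\|_1,
\]
where $W$ is built from the observation ratios $R$ and $\hat{\rho}$ is a pairwise estimate of the covariance between the covariates and the response.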
Abstract:Stochastic gradient descent has been widely used for solving composite optimization problems in big data analyses, and many algorithms and convergence properties have been developed. The composite functions were primarily assumed to be convex, and nonconvex composite functions have gradually been adopted to obtain more desirable properties. Convergence properties have been investigated, but only when one of the composite functions is nonconvex; no convergence property is known when both composite functions are nonconvex, which we call the \textit{doubly-nonconvex} case. To overcome this difficulty, we assume a simple and weak condition, namely that the penalty function is \textit{quasiconvex}, and we then obtain convergence properties for the stochastic doubly-nonconvex composite optimization problem. The convergence rate obtained here is of the same order as in existing work. We analyze in depth the convergence rate with respect to the constant step size and mini-batch size and give the optimal convergence rate with appropriate sizes, which is superior to existing work. Experimental results illustrate that our method is superior to existing methods.
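A minimal sketch of the algorithm class analyzed above: mini-batch proximal stochastic gradient descent for a composite objective $f(x) + g(x)$, where the proximal operator of the (possibly nonconvex, quasiconvex) penalty $g$ is supplied by the caller. The step size and mini-batch size below are placeholders, not the optimal choices derived in the paper.

import numpy as np

def prox_sgd(grad_f, prox_g, x0, n_samples, step=1e-2, batch=32, iters=1000, rng=None):
    # grad_f(x, idx): stochastic gradient of the smooth part f on mini-batch idx
    # prox_g(x, t):   proximal operator of the penalty g with parameter t
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        idx = rng.choice(n_samples, size=batch, replace=False)
        x = prox_g(x - step * grad_f(x, idx), step)  # gradient step, then prox step
    return x

# Example with an l1 penalty (soft-thresholding prox) and a least-squares loss:
# grad_f = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
# prox_g = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)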