Abstract:We revisit heavy-tailed corrupted least-squares linear regression assuming to have a corrupted $n$-sized label-feature sample of at most $\epsilon n$ arbitrary outliers. We wish to estimate a $p$-dimensional parameter $b^*$ given such sample of a label-feature pair $(y,x)$ satisfying $y=\langle x,b^*\rangle+\xi$ with heavy-tailed $(x,\xi)$. We only assume $x$ is $L^4-L^2$ hypercontractive with constant $L>0$ and has covariance matrix $\Sigma$ with minimum eigenvalue $1/\mu^2>0$ and bounded condition number $\kappa>0$. The noise $\xi$ can be arbitrarily dependent on $x$ and nonsymmetric as long as $\xi x$ has finite covariance matrix $\Xi$. We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge on $(\Sigma,\Xi)$ nor the operator norm of $\Xi$. With probability at least $1-\delta$, our proposed estimator attains the statistical rate $\mu^2\Vert\Xi\Vert^{1/2}(\frac{p}{n}+\frac{\log(1/\delta)}{n}+\epsilon)^{1/2}$ and breakdown-point $\epsilon\lesssim\frac{1}{L^4\kappa^2}$, both optimal in the $\ell_2$-norm, assuming the near-optimal minimum sample size $L^4\kappa^2(p\log p + \log(1/\delta))\lesssim n$, up to a log factor. To the best of our knowledge, this is the first computationally tractable algorithm satisfying simultaneously all the mentioned properties. Our estimator is based on a two-stage Multiplicative Weight Update algorithm. The first stage estimates a descent direction $\hat v$ with respect to the (unknown) pre-conditioned inner product $\langle\Sigma(\cdot),\cdot\rangle$. The second stage estimate the descent direction $\Sigma\hat v$ with respect to the (known) inner product $\langle\cdot,\cdot\rangle$, without knowing nor estimating $\Sigma$.
Abstract:We consider high-dimensional least-squares regression when a fraction $\epsilon$ of the labels are contaminated by an arbitrary adversary. We analyze such problem in the statistical learning framework with a subgaussian distribution and linear hypothesis class on the space of $d_1\times d_2$ matrices. As such, we allow the noise to be heterogeneous. This framework includes sparse linear regression and low-rank trace-regression. For a $p$-dimensional $s$-sparse parameter, we show that a convex regularized $M$-estimator using a sorted Huber-type loss achieves the near-optimal subgaussian rate $$ \sqrt{s\log(ep/s)}+\sqrt{\log(1/\delta)/n}+\epsilon\log(1/\epsilon), $$ with probability at least $1-\delta$. For a $(d_1\times d_2)$-dimensional parameter with rank $r$, a nuclear-norm regularized $M$-estimator using the same sorted Huber-type loss achieves the subgaussian rate $$ \sqrt{rd_1/n}+\sqrt{rd_2/n}+\sqrt{\log(1/\delta)/n}+\epsilon\log(1/\epsilon), $$ again optimal up to a log factor. In a second part, we study the trace-regression problem when the parameter is the sum of a matrix with rank $r$ plus a $s$-sparse matrix assuming the "low-spikeness" condition. Unlike multivariate regression studied in previous work, the design in trace-regression lacks positive-definiteness in high-dimensions. Still, we show that a regularized least-squares estimator achieves the subgaussian rate $$ \sqrt{rd_1/n}+\sqrt{rd_2/n}+\sqrt{s\log(d_1d_2)/n} +\sqrt{\log(1/\delta)/n}. $$ Lastly, we consider noisy matrix completion with non-uniform sampling when a fraction $\epsilon$ of the sampled low-rank matrix is corrupted by outliers. If only the low-rank matrix is of interest, we show that a nuclear-norm regularized Huber-type estimator achieves, up to log factors, the optimal rate adaptively to the corruption level. The above mentioned rates require no information on $(s,r,\epsilon)$.
Abstract:We study the problem of estimating a $p$-dimensional $s$-sparse vector in a linear model with Gaussian design and additive noise. In the case where the labels are contaminated by at most $o$ adversarial outliers, we prove that the $\ell_1$-penalized Huber's $M$-estimator based on $n$ samples attains the optimal rate of convergence $(s/n)^{1/2} + (o/n)$, up to a logarithmic factor. For more general design matrices, our results highlight the importance of two properties: the transfer principle and the incoherence property. These properties with suitable constants are shown to yield the optimal rates, up to log-factors, of robust estimation with adversarial contamination.