Abstract: We consider the problem $(\mathrm{P})$ of fitting $n$ standard Gaussian random vectors in $\mathbb{R}^d$ to the boundary of a centered ellipsoid, as $n, d \to \infty$. This problem is conjectured to have a sharp feasibility transition: for any $\varepsilon > 0$, if $n \leq (1 - \varepsilon) d^2 / 4$ then $(\mathrm{P})$ has a solution with high probability, while $(\mathrm{P})$ has no solutions with high probability if $n \geq (1 + \varepsilon) d^2 / 4$. So far, only a trivial bound $n \geq d^2 / 2$ is known on the negative side, while the best results on the positive side assume $n \leq d^2 / \mathrm{polylog}(d)$. In this work, we improve over previous approaches using a key result of Bartl & Mendelson on the concentration of Gram matrices of random vectors under mild assumptions on their tail behavior. This allows us to give a simple proof that $(\mathrm{P})$ is feasible with high probability when $n \leq d^2 / C$, for a (possibly large) constant $C > 0$.
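To make the feasibility problem concrete, here is a minimal numerical sketch (ours, not from the paper): we look for a centered ellipsoid $\{x : x^\top A x = 1\}$ with $A \succeq 0$ passing through all sample points, phrased as a semidefinite feasibility program; cvxpy with an SDP-capable solver such as SCS is assumed.

```python
# Toy feasibility check for (P): fit n standard Gaussian points in R^d
# to the boundary of a centered ellipsoid {x : x^T A x = 1}, A PSD.
# Numerical, not exact: feasibility is decided up to solver tolerance.
import numpy as np
import cvxpy as cp

d, n = 10, 20  # n is below the conjectured threshold d^2 / 4 = 25
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))

A = cp.Variable((d, d), PSD=True)
constraints = [cp.quad_form(X[i], A) == 1 for i in range(n)]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)  # 'optimal' indicates a fitting ellipsoid was found
```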
Abstract: We consider the problem of estimating the mean of a random vector based on $N$ independent, identically distributed observations. We prove the existence of an estimator that has a near-optimal error in all directions in which the variance of the one-dimensional marginal of the random vector is not too small: with probability $1-\delta$, the procedure returns $\widehat{\mu}_N$ which satisfies, for every direction $u \in S^{d-1}$, \[ \langle \widehat{\mu}_N - \mu, u\rangle \le \frac{C}{\sqrt{N}} \left( \sigma(u)\sqrt{\log(1/\delta)} + \left(\mathbb{E}\|X-\mathbb{E} X\|_2^2\right)^{1/2} \right), \] where $\sigma^2(u) = \mathrm{Var}(\langle X,u\rangle)$ and $C$ is a constant. To achieve this, we require only slightly more than the existence of the covariance matrix, in the form of a certain moment-equivalence assumption. The proof relies on novel bounds for the ratio of empirical and true probabilities that hold uniformly over certain classes of random variables.
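As a concrete instantiation of the bound (our illustration, not part of the abstract): if $X$ is isotropic, so that its covariance is the identity $I_d$, then $\sigma(u) = 1$ for every $u \in S^{d-1}$ and $\mathbb{E}\|X-\mathbb{E} X\|_2^2 = d$, and the guarantee reads \[ \langle \widehat{\mu}_N - \mu, u\rangle \le \frac{C}{\sqrt{N}}\left(\sqrt{\log(1/\delta)} + \sqrt{d}\right), \] which is the familiar sub-Gaussian rate for mean estimation in $\mathbb{R}^d$.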
Abstract: We study learning problems in which the underlying class is a bounded subset of $L_p$ and the target $Y$ belongs to $L_p$. Previously, minimax sample complexity estimates were known under such boundedness assumptions only when $p=\infty$. We present a sharp sample complexity estimate that holds for any $p > 4$. It is based on a learning procedure that is suited for heavy-tailed problems.
Abstract: We survey some of the recent advances in mean estimation and regression function estimation. In particular, we describe sub-Gaussian mean estimators for possibly heavy-tailed data, both in the univariate and multivariate settings. We focus on estimators based on median-of-means techniques, but other methods such as the trimmed mean and Catoni's estimator are also reviewed. We give detailed proofs for the cornerstone results. We dedicate a section to statistical learning problems, in particular regression function estimation, in the presence of possibly heavy-tailed data.
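As a minimal sketch of the cornerstone construction (our illustration, in the univariate case): split the sample into $k$ blocks, average within each block, and report the median of the block means.

```python
import numpy as np

def median_of_means(x, k):
    """Median of k block means; k ~ log(1/delta) blocks yield
    sub-Gaussian-type deviations assuming only finite variance."""
    blocks = np.array_split(x, k)  # sample is iid, so contiguous blocks suffice
    return float(np.median([b.mean() for b in blocks]))

rng = np.random.default_rng(0)
sample = rng.pareto(2.1, size=10_000)  # heavy-tailed: finite variance, infinite third moment
print(median_of_means(sample, k=30))
```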
Abstract: We explore ways in which the covariance ellipsoid ${\cal B}=\{v \in \mathbb{R}^d : \mathbb{E} \langle X,v\rangle^2 \leq 1\}$ of a centred random vector $X$ in $\mathbb{R}^d$ can be approximated by a simple set. The data one is given for constructing the approximating set consists of $X_1,\ldots,X_N$ that are independent and distributed as $X$. We present a general method that can be used to construct such approximations and implement it for two types of approximating sets. We first construct a (random) set ${\cal K}$ defined by a union of intersections of slabs $H_{z,\alpha}=\{v \in \mathbb{R}^d : |\langle z,v\rangle| \leq \alpha\}$ (and therefore ${\cal K}$ is actually the output of a neural network with two hidden layers). The slabs are generated using $X_1,\ldots,X_N$, and under minimal assumptions on $X$ (e.g., $X$ can be heavy-tailed) it suffices that $N = c_1d \eta^{-4}\log(2/\eta)$ to ensure that $(1-\eta) {\cal K} \subset {\cal B} \subset (1+\eta){\cal K}$. In some cases (e.g., if $X$ is rotation invariant and has marginals that are well behaved in some weak sense), a smaller sample size suffices: $N = c_1d\eta^{-2}\log(2/\eta)$. We then show that if the slabs are replaced by randomly generated ellipsoids defined using $X_1,\ldots,X_N$, the same degree of approximation is true when $N \geq c_2d\eta^{-2}\log(2/\eta)$. The construction we use is based on the small-ball method.
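To fix ideas, the following toy sketch (ours; the paper's actual construction of ${\cal K}$ is more involved) tests membership in a single intersection of slabs and compares it with membership in the true covariance ellipsoid ${\cal B}$, assuming Gaussian data and a hand-picked width $\alpha$.

```python
import numpy as np

def in_slab_intersection(v, Z, alpha):
    """True iff |<z_j, v>| <= alpha for every row z_j of Z."""
    return bool(np.all(np.abs(Z @ v) <= alpha))

def in_covariance_ellipsoid(v, Sigma):
    """True iff E<X, v>^2 = v^T Sigma v <= 1."""
    return float(v @ Sigma @ v) <= 1.0

rng = np.random.default_rng(0)
d, N = 5, 2_000
Sigma = np.eye(d)
Z = rng.multivariate_normal(np.zeros(d), Sigma, size=N)  # stands in for X_1, ..., X_N
v = rng.standard_normal(d)
v /= np.sqrt(v @ Sigma @ v)  # place v on the boundary of B
print(in_covariance_ellipsoid(v, Sigma), in_slab_intersection(v, Z, alpha=3.0))
```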
Abstract: We study learning problems involving arbitrary classes of functions $F$, distributions $X$ and targets $Y$. Because proper learning procedures, i.e., procedures that are only allowed to select functions in $F$, tend to perform poorly unless the problem satisfies some additional structural property (e.g., that $F$ is convex), we consider unrestricted learning procedures that are free to choose functions outside the given class. We present a new unrestricted procedure that is optimal in a very strong sense: the required sample complexity is essentially the best one can hope for, and the estimate holds for (almost) any problem, including heavy-tailed situations. Moreover, the sample complexity coincides with what one would expect if $F$ were convex, even when $F$ is not. And if $F$ is convex, the procedure turns out to be proper. Thus, the unrestricted procedure is actually optimal in both realms, for convex classes as a proper procedure and for arbitrary classes as an unrestricted procedure.
Abstract: A regularized risk minimization procedure for regression function estimation is introduced that achieves near-optimal accuracy and confidence under general conditions, including heavy-tailed predictor and response variables. The procedure is based on median-of-means tournaments, introduced by the authors in [8]. It is shown that the new procedure outperforms standard regularized empirical risk minimization procedures such as LASSO or SLOPE in heavy-tailed problems.
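The basic primitive behind a median-of-means tournament, sketched below in our own words (the paper's regularized version adds a penalty term and a more careful matching scheme), is a block-median comparison of two candidate regressors:

```python
import numpy as np

def mom_match(f, g, X, y, k):
    """f 'defeats' g when the median of block-averaged squared-loss
    differences ell_f - ell_g is negative."""
    diffs = (f(X) - y) ** 2 - (g(X) - y) ** 2
    blocks = np.array_split(diffs, k)
    return float(np.median([b.mean() for b in blocks])) < 0
```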
Abstract: The small-ball method was introduced as a way of obtaining a high-probability, isomorphic lower bound on the quadratic empirical process, under weak assumptions on the indexing class. The key assumption was that class members satisfy a uniform small-ball estimate, that is, $\Pr(|f| \geq \kappa\|f\|_{L_2}) \geq \delta$ for given constants $\kappa$ and $\delta$. Here we extend the small-ball method and obtain a high-probability, almost-isometric (rather than isomorphic) lower bound on the quadratic empirical process. The scope of the result is considerably wider than the small-ball method: there is no need for class members to satisfy a uniform small-ball condition, and moreover, motivated by the notion of tournament learning procedures, the result is stable under a `majority vote'. As applications, we study the performance of empirical risk minimization in learning problems involving bounded subsets of $L_p$ that satisfy a Bernstein condition, and of the tournament procedure in problems involving bounded subsets of $L_\infty$.
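For intuition (our illustration, not from the paper), the uniform small-ball estimate is easy to check numerically for a standard Gaussian class member, for which $\|f\|_{L_2} = 1$:

```python
import numpy as np

# Monte Carlo check of Pr(|f| >= kappa ||f||_{L_2}) for f standard Gaussian:
# with kappa = 1/2 the small-ball probability is 2(1 - Phi(1/2)) ~ 0.617.
rng = np.random.default_rng(0)
g = rng.standard_normal(1_000_000)
print(np.mean(np.abs(g) >= 0.5))  # ~ 0.617, so (kappa, delta) = (0.5, 0.6) works
```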
Abstract: In this note we answer a question of G. Lecu\'{e} by showing that column normalization of a random matrix with iid entries need not lead to good sparse recovery properties, even if the generating random variable has a reasonable moment growth. Specifically, for every $2 \leq p \leq c_1\log d$ we construct a random vector $X \in \mathbb{R}^d$ with iid, mean-zero, variance $1$ coordinates that satisfies $\sup_{t \in S^{d-1}} \|\langle X,t\rangle\|_{L_q} \leq c_2\sqrt{q}$ for every $2\leq q \leq p$. We show that if $m \leq c_3\sqrt{p}d^{1/p}$ and $\tilde{\Gamma}:\mathbb{R}^d \to \mathbb{R}^m$ is the column-normalized matrix generated by $m$ independent copies of $X$, then with probability at least $1-2\exp(-c_4m)$, $\tilde{\Gamma}$ does not satisfy the exact reconstruction property of order $2$.
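Column normalization itself is elementary; the content of the note lies in the construction of $X$. A minimal sketch (ours, with a Gaussian stand-in for the constructed vector):

```python
import numpy as np

def column_normalize(G):
    """Rescale each column of the m x d matrix G to unit Euclidean norm."""
    return G / np.linalg.norm(G, axis=0, keepdims=True)

rng = np.random.default_rng(0)
m, d = 128, 512
Gamma = rng.standard_normal((m, d))  # m independent copies of X, stacked as rows
Gamma_tilde = column_normalize(Gamma)
print(np.allclose(np.linalg.norm(Gamma_tilde, axis=0), 1.0))  # True
```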
Abstract: We study the problem of estimating the mean of a random vector $X$ given a sample of $N$ independent, identically distributed points. We introduce a new estimator that achieves a purely sub-Gaussian performance under the sole condition that the second moment of $X$ exists. The estimator is based on a novel concept of a multivariate median.
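A closely related but simpler construction, the geometric median of block means (in the spirit of Minsker and Hsu-Sabato, and not the authors' new multivariate median), conveys the flavor of median-of-means estimation in $\mathbb{R}^d$; a minimal sketch:

```python
import numpy as np

def geometric_median(P, n_iter=100, eps=1e-10):
    """Weiszfeld iterations for the geometric median of the rows of P."""
    y = P.mean(axis=0)
    for _ in range(n_iter):
        dist = np.maximum(np.linalg.norm(P - y, axis=1), eps)
        w = 1.0 / dist
        y = (w[:, None] * P).sum(axis=0) / w.sum()
    return y

def mom_mean(X, k):
    """Geometric median of k block means of the rows of X."""
    block_means = np.array([b.mean(axis=0) for b in np.array_split(X, k)])
    return geometric_median(block_means)
```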