Abstract: We provide an alternative unified framework for conformal prediction, a framework for constructing assumption-free prediction intervals. Instead of beginning by choosing a conformity score, our framework starts with a sequence of nested sets $\{\mathcal{F}_t(x)\}_{t\in\mathcal{T}}$, for some ordered set $\mathcal{T}$, that specifies all potential prediction sets. We show that most conformity scores proposed in the literature, including several based on quantiles, straightforwardly result in nested families. We then argue that conformal prediction amounts to finding a mapping $\alpha \mapsto t(\alpha)$, meaning that it calibrates or rescales $\mathcal{T}$ to $[0,1]$. Nestedness is a natural and intuitive requirement because the optimal prediction sets (e.g., level sets of conditional densities) are also nested. We also formally prove that nested sets are universal, meaning that any conformal prediction method can be represented in our framework. Finally, to demonstrate its utility, we show how to develop the full conformal, split conformal, cross-conformal and the recent jackknife+ methods within our nested framework, thus immediately generalizing the latter two classes of methods to new settings. Specifically, we prove the validity of the leave-one-out, $K$-fold, subsampling and bootstrap variants of the latter two methods for any nested family.
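As a concrete illustration of the calibration step $\alpha \mapsto t(\alpha)$ in the split-conformal case, the sketch below makes three assumptions. First, a hypothetical point predictor `mu` has been fitted on a separate training split. Second, it uses the symmetric nested family $\mathcal{F}_t(x) = [\mu(x) - t, \mu(x) + t]$ for $t \ge 0$. Third, each calibration point receives the smallest $t$ whose set contains it, $r_i = \inf\{t : Y_i \in \mathcal{F}_t(X_i)\} = |Y_i - \mu(X_i)|$, and $t(\alpha)$ is taken to be the $\lceil (1-\alpha)(n+1) \rceil$-th smallest $r_i$. The family, the predictor and the function names are illustrative choices, not the paper's code.

```python
import math


def nested_interval(mu_x, t):
    # Assumed nested family F_t(x) = [mu(x) - t, mu(x) + t]:
    # t <= t' implies F_t(x) is contained in F_t'(x).
    return (mu_x - t, mu_x + t)


def score(mu_x, y):
    # Smallest t such that y lies in F_t(x); for this family it is |y - mu(x)|.
    return abs(y - mu_x)


def calibrate_t(mu, X_cal, Y_cal, alpha):
    # Split-conformal calibration: map the level alpha to an index t(alpha)
    # using the scores of a held-out calibration set.
    scores = sorted(score(mu(x), y) for x, y in zip(X_cal, Y_cal))
    n = len(scores)
    k = math.ceil((1 - alpha) * (n + 1))
    if k > n:
        # Too few calibration points for this alpha: only the whole line is valid.
        return math.inf
    return scores[k - 1]


def predict_set(mu, x, t_alpha):
    # Prediction set at a new point x is the member F_{t(alpha)}(x) of the family.
    return nested_interval(mu(x), t_alpha)
```

For instance, with `mu = lambda x: 0.0`, calibration responses $1, \dots, 9$ and $\alpha = 0.25$, the rank is $\lceil 0.75 \cdot 10 \rceil = 8$, so $t(\alpha) = 8$ and every prediction set is $[-8, 8]$. Swapping in a different nested family only changes `nested_interval` and `score`. The ranking step that calibrates $\mathcal{T}$ to $[0,1]$ is unchanged.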
Abstract: We consider least squares estimation in a general nonparametric regression model. The rate of convergence of the least squares estimator (LSE) for the unknown regression function is well studied when the errors are sub-Gaussian. We find upper bounds on the rates of convergence of the LSE when the errors have uniformly bounded conditional variance and only finitely many moments. We show that the interplay between the moment assumptions on the error, the metric entropy of the class of functions involved, and the "local" structure of the function class around the truth drives the rate of convergence of the LSE. We find sufficient conditions on the errors under which the rate of the LSE matches the rate of the LSE under sub-Gaussian errors. Our results are finite-sample and allow for heteroscedastic and heavy-tailed errors.
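To make the estimator concrete, the sketch below computes an LSE over one nonparametric class: the non-decreasing (isotonic) functions evaluated on a sorted design. The fit uses the pool-adjacent-violators algorithm, and the errors are heavy-tailed. The class, the Student-t errors with 3 degrees of freedom (finite variance but no finite third moment), the true function $x^2$ and all function names are illustrative assumptions, not the paper's setting. The sketch computes only an empirical error and makes no claim about rates.

```python
import random


def isotonic_lse(y):
    # Pool Adjacent Violators: least squares fit to y among non-decreasing
    # sequences (equal weights, design points assumed already sorted).
    blocks = []  # each block is [sum of values, count]
    for v in y:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean
        # (compared via cross-multiplication; counts are positive).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit


def student_t(df, rng):
    # Student-t draw: standard normal divided by sqrt(chi-square / df),
    # with chi-square(df) generated as Gamma(shape=df/2, scale=2).
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / (chi2 / df) ** 0.5


def simulate(n, df=3, seed=0):
    # Hypothetical experiment: f0(x) = x^2 on a grid in (0, 1], plus t(df) noise.
    rng = random.Random(seed)
    xs = [i / n for i in range(1, n + 1)]
    f0 = [x ** 2 for x in xs]
    ys = [f + student_t(df, rng) for f in f0]
    fit = isotonic_lse(ys)
    # Empirical squared L2 error of the LSE at the design points.
    return sum((a - b) ** 2 for a, b in zip(fit, f0)) / n
```

Calling `simulate(n)` for increasing `n` (e.g., 100, 1000, 10000) shows how the error of this particular LSE behaves under heavy-tailed noise. Changing `df` changes how many error moments are finite.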