Abstract: This paper addresses the problem of assessing the variability of predictions from deep neural networks. There is a growing literature on using and improving the predictive accuracy of deep networks, but a concomitant improvement in the quantification of their uncertainty is lacking. We provide a prediction interval network (PI-Network), a transparent, tractable modification of the standard predictive loss used to train deep networks. The PI-Network outputs three values instead of a single point estimate and optimizes a loss function inspired by quantile regression. We go beyond merely motivating the construction of these networks and provide two prediction interval methods with provable, finite-sample coverage guarantees without any assumptions on the underlying distribution from which the data are drawn; we require only that the observations are independent and identically distributed. Furthermore, our intervals adapt to heteroskedasticity and asymmetry in the conditional distribution of the response given the covariates. The first method leverages the conformal inference framework and provides average coverage. The second method provides a new, stronger guarantee by conditioning on the observed data. Lastly, unlike other prediction interval methods, our loss function does not compromise the predictive accuracy of the network. We demonstrate the ease of use of the PI-Network, as well as its improvements over other methods, on both simulated and real data. Because the PI-Network can be used with a host of deep learning methods with only minor modifications, its use should become standard practice, much like reporting standard errors along with mean estimates.
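As a rough illustration of the kind of loss the abstract describes, the sketch below implements a pinball (quantile) loss for a network that outputs three values: a lower bound, a point estimate, and an upper bound. The quantile levels, tensor shapes, and the use of PyTorch are assumptions for illustration only; the paper's exact loss is not reproduced here.

```python
import torch

def pinball_loss(pred, y, taus=(0.05, 0.5, 0.95)):
    """Quantile (pinball) loss summed over three network outputs.

    pred: tensor of shape (batch, 3) -- lower bound, point estimate, upper bound.
    y:    tensor of shape (batch,)   -- observed responses.
    The quantile levels in `taus` are illustrative choices, not the paper's.
    """
    loss = 0.0
    for i, tau in enumerate(taus):
        diff = y - pred[:, i]
        # Pinball loss: penalize under- and over-prediction asymmetrically.
        loss = loss + torch.mean(torch.maximum(tau * diff, (tau - 1.0) * diff))
    return loss

# Hypothetical usage: predictions from a network with a 3-unit output layer.
pred = torch.randn(32, 3)
y = torch.randn(32)
print(pinball_loss(pred, y))
```

Because the middle output is fit at the 0.5 level, it plays the role of the usual point estimate, while the outer two outputs track conditional quantiles and hence adapt to heteroskedasticity and asymmetry.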
Abstract: We study the distributions of the LASSO, SCAD, and thresholding estimators in finite samples and in the large-sample limit. The asymptotic distributions are derived both for the case where the estimators are tuned to perform consistent model selection and for the case where they are tuned to perform conservative model selection. Our findings complement those of Knight and Fu (2000) and Fan and Li (2001). We show that the distributions are typically highly nonnormal regardless of how the estimator is tuned, and that this property persists in large samples. The uniform convergence rate of these estimators is also obtained and is shown to be slower than 1/√n when the estimator is tuned to perform consistent model selection. An impossibility result regarding estimation of the estimators' distribution function is also provided.
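To make the nonnormality claim concrete, here is a minimal Monte Carlo sketch in a Gaussian location model, where the LASSO reduces to soft-thresholding of the sample mean. The model, sample size, and the two tuning choices (λ ∝ n^(-1/2) for conservative and λ ∝ n^(-1/4) for consistent model selection) are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def soft_threshold(x, lam):
    """LASSO estimator in the Gaussian location model: soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Monte Carlo sketch of the sampling distribution of sqrt(n)*(estimator - theta)
# for a true parameter near zero; all numbers below are illustrative.
rng = np.random.default_rng(0)
n, theta, reps = 200, 0.1, 100_000
ybar = theta + rng.standard_normal(reps) / np.sqrt(n)  # sample means
lam_conservative = 1.0 / n ** 0.5   # sqrt(n)*lam stays bounded
lam_consistent = 1.0 / n ** 0.25    # sqrt(n)*lam diverges, lam -> 0
for lam, label in [(lam_conservative, "conservative"), (lam_consistent, "consistent")]:
    est = soft_threshold(ybar, lam)
    z = np.sqrt(n) * (est - theta)
    # The point mass at zero (coefficient deleted) makes the law of z nonnormal.
    print(label, "tuning: P(estimator == 0) =", np.mean(est == 0.0))
```

The point mass at zero, visible in the printed probabilities, is one simple source of the nonnormality the abstract refers to; the paper's results cover the general penalized-regression setting rather than this toy model.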