Abstract:Neural networks have shown remarkable success, especially in overparameterized or "large" models. Despite increasing empirical evidence and intuitive understanding, a formal mathematical justification for the behavior of such models, particularly regarding overfitting, remains incomplete. In this paper, we prove that the Mean Integrated Squared Error (MISE) of neural networks with either $L^1$ or $L^2$ penalty decreases after a certain model size threshold, provided that the sample size is sufficiently large, and achieves nearly the minimax optimality in the Barron space. These results challenge conventional statistical modeling frameworks and broadens recent findings on the double descent phenomenon in neural networks. Our theoretical results also extend to deep learning models with ReLU activation functions.
Abstract:The multivariate adaptive regression spline (MARS) is one of the popular estimation methods for nonparametric multivariate regressions. However, as MARS is based on marginal splines, to incorporate interactions of covariates, products of the marginal splines must be used, which leads to an unmanageable number of basis functions when the order of interaction is high and results in low estimation efficiency. In this paper, we improve the performance of MARS by using linear combinations of the covariates which achieve sufficient dimension reduction. The special basis functions of MARS facilitate calculation of gradients of the regression function, and estimation of the linear combinations is obtained via eigen-analysis of the outer-product of the gradients. Under some technical conditions, the asymptotic theory is established for the proposed estimation method. Numerical studies including both simulation and empirical applications show its effectiveness in dimension reduction and improvement over MARS and other commonly-used nonparametric methods in regression estimation and prediction.