Abstract:This paper presents a novel approach to ensemble prediction called "Covariate-dependent Stacking" (CDST). Unlike traditional stacking methods, CDST allows model weights to vary flexibly as a function of covariates, thereby enhancing predictive performance in complex scenarios. We formulate the covariate-dependent weights through combinations of basis functions, estimate them by optimizing cross-validation, and develop an Expectation-Maximization algorithm, ensuring computational efficiency. To analyze the theoretical properties, we establish an oracle inequality regarding the expected loss to be minimized for estimating model weights. Through comprehensive simulation studies and an application to large-scale land price prediction, we demonstrate that CDST consistently outperforms conventional model averaging methods, particularly on datasets where some models fail to capture the underlying complexity. Our findings suggest that CDST is especially valuable for, but not limited to, spatio-temporal prediction problems, offering a powerful tool for researchers and practitioners in various fields of data analysis.
Abstract:The remarkable generalization performance of overparameterized models has challenged the conventional wisdom of statistical learning theory. While recent theoretical studies have shed light on this behavior in linear models or nonlinear classifiers, a comprehensive understanding of overparameterization in nonlinear regression remains lacking. This paper explores the predictive properties of overparameterized nonlinear regression within the Bayesian framework, extending the methodology of adaptive prior based on the intrinsic spectral structure of the data. We establish posterior contraction for single-neuron models with Lipschitz continuous activation functions and for generalized linear models, demonstrating that our approach achieves consistent predictions in the overparameterized regime. Moreover, our Bayesian framework allows for uncertainty estimation of the predictions. The proposed method is validated through numerical simulations and a real data application, showcasing its ability to achieve accurate predictions and reliable uncertainty estimates. Our work advances the theoretical understanding of the blessing of overparameterization and offers a principled Bayesian approach for prediction in large nonlinear models.
Abstract:In high-dimensional Bayesian statistics, several methods have been developed, including many prior distributions that lead to the sparsity of estimated parameters. However, such priors have limitations in handling the spectral eigenvector structure of data, and as a result, they are ill-suited for analyzing over-parameterized models (high-dimensional linear models that do not assume sparsity) that have been developed in recent years. This paper introduces a Bayesian approach that uses a prior dependent on the eigenvectors of data covariance matrices, but does not induce the sparsity of parameters. We also provide contraction rates of derived posterior distributions and develop a truncated Gaussian approximation of the posterior distribution. The former demonstrates the efficiency of posterior estimation, while the latter enables quantification of parameter uncertainty using a Bernstein-von Mises-type approach. These results indicate that any Bayesian method that can handle the spectrum of data and estimate non-sparse high dimensions would be possible.