Abstract: Deep neural networks (DNNs) typically involve a large number of parameters and are trained to achieve zero or near-zero training error. Despite such interpolation, they often exhibit strong generalization performance on unseen data, a phenomenon that has motivated extensive theoretical investigation. Reassuringly, existing results show that interpolation need not affect the minimax rate of convergence under the squared error loss. At the same time, DNNs are well known to be highly vulnerable to adversarial perturbations of future inputs. A natural question then arises: can interpolation also escape suboptimal performance under a future $X$-attack? In this paper, we investigate the adversarial robustness of interpolating estimators in a nonparametric regression framework. Our main finding is that interpolating estimators are necessarily suboptimal even under a subtle future $X$-attack, and that achieving a perfect fit can substantially damage their robustness. We also reveal and discuss an interesting phenomenon in the high interpolation regime, which we term the curse of sample size. Numerical experiments support our theoretical findings.
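To make the trade-off concrete, below is a minimal, self-contained sketch (the setup, estimators, and parameter values are our own illustration, not taken from the paper) contrasting an interpolating 1-nearest-neighbor fit with a smoothed $k$-NN fit when future inputs are slightly perturbed:

```python
# Hypothetical illustration: a 1-NN estimator interpolates noisy labels, so a
# tiny shift of a future input can snap the prediction onto a noise-corrupted
# training label; a k-NN average is far less sensitive to the same shift.
import numpy as np

rng = np.random.default_rng(0)
n, sigma, delta = 200, 0.5, 0.01            # sample size, noise level, attack budget

x_train = np.sort(rng.uniform(0, 1, n))
def f(x):                                   # true regression function (assumed)
    return np.sin(2 * np.pi * x)
y_train = f(x_train) + sigma * rng.normal(size=n)

def knn_predict(x, k):
    """k-NN regression; k=1 interpolates the training data exactly."""
    idx = np.argsort(np.abs(x_train[None, :] - x[:, None]), axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

x_test = rng.uniform(0, 1, 1000)
x_attack = x_test + delta * rng.choice([-1.0, 1.0], size=x_test.size)  # crude stand-in for a worst-case shift

for k in (1, int(n ** 0.5)):                # interpolating vs smoothed estimator
    clean = np.mean((knn_predict(x_test, k) - f(x_test)) ** 2)
    attacked = np.mean((knn_predict(x_attack, k) - f(x_test)) ** 2)
    print(f"k={k:3d}  clean MSE={clean:.3f}  perturbed MSE={attacked:.3f}")
```

The random $\pm\delta$ shift is only a crude stand-in for a worst-case attack, but it already exposes the extra sensitivity of the interpolating fit.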
Abstract: Model averaging (MA) and ensembling play a crucial role in statistical and machine learning practice. When multiple candidate models are considered, MA techniques can be used to weight and combine them, often resulting in improved predictive accuracy and better estimation stability compared to model selection (MS) methods. In this paper, we address two challenges in combining least squares estimators from both theoretical and practical perspectives. We first establish several oracle inequalities for least squares MA via minimizing a Mallows' $C_p$ criterion under an arbitrary candidate model set. Compared to existing studies, these oracle inequalities yield faster excess risk rates and directly imply the asymptotic optimality of the resulting MA estimators under milder conditions. Moreover, we consider candidate model construction and investigate the problem of optimal all-subset combination for least squares estimators, an important yet rarely discussed topic in the existing literature. We show that there exists a fundamental limit to achieving the optimal all-subset MA risk. To attain this limit, we propose a novel Mallows-type MA procedure based on a dimension-adaptive $C_p$ criterion. We also reveal and discuss the implicit ensembling effects of several MS procedures. We conduct several numerical experiments to support our theoretical findings and demonstrate the effectiveness of the proposed Mallows-type MA estimator.
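As a rough illustration of the combining step described above, the following sketch minimizes a Mallows' $C_p$ criterion over simplex weights for nested least squares models; the nested-candidate construction, variance estimate, and solver choice are our simplifying assumptions, not the paper's procedure:

```python
# Minimal sketch (assumed setup) of Mallows-type model averaging: choose
# weights w on the probability simplex minimizing
#   C(w) = ||y - M w||^2 + 2 * sigma2 * k'w,
# where column m of M holds model m's fitted values and k_m its size.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 100, 8
X = rng.normal(size=(n, p))
beta = 1.0 / (1.0 + np.arange(p)) ** 2               # decaying true coefficients
y = X @ beta + rng.normal(size=n)

fits, ks = [], []
for m in range(1, p + 1):                            # nested candidates: first m columns
    H = X[:, :m] @ np.linalg.pinv(X[:, :m])          # hat matrix of model m
    fits.append(H @ y)
    ks.append(m)
M, k = np.column_stack(fits), np.asarray(ks, float)

sigma2 = np.sum((y - fits[-1]) ** 2) / (n - p)       # variance from the largest model

def cp(w):
    return np.sum((y - M @ w) ** 2) + 2.0 * sigma2 * (k @ w)

res = minimize(cp, np.full(p, 1.0 / p), method="SLSQP",
               bounds=[(0.0, 1.0)] * p,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
print("MA weights:", np.round(res.x, 3))
```

In the paper's setting the criterion is minimized over an arbitrary candidate set; the nested construction here is only for brevity.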
Abstract: Recent research shows the susceptibility of machine learning models to adversarial attacks, wherein minor but maliciously chosen perturbations of the input can significantly degrade model performance. In this paper, we theoretically analyse the limits of robustness against such adversarial attacks in a nonparametric regression setting by examining the minimax rates of convergence in an adversarial sup-norm. Our work reveals that the minimax rate under adversarial attacks on the input is the sum of two terms: one represents the minimax rate in the standard setting without adversarial attacks, and the other reflects the maximum deviation of the true regression function value within the target function class when subjected to the input perturbations. The optimal rates under the adversarial setup can be achieved by a plug-in procedure constructed from a minimax optimal estimator in the corresponding standard setting. Two specific examples are given to illustrate the established minimax results.
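One schematic way to write the stated two-term decomposition, with notation we introduce only for illustration ($\delta$ the attack budget, $r_n(\mathcal{F})$ the standard minimax sup-norm rate over the class $\mathcal{F}$), is:

```latex
% Schematic rendering of the decomposition; the symbols are ours, not verbatim.
\[
  \inf_{\hat f}\,\sup_{f \in \mathcal{F}}\,
  \mathbb{E}\sup_{x}\sup_{\|x'-x\|\le\delta}\bigl|\hat f(x') - f(x)\bigr|
  \;\asymp\;
  r_n(\mathcal{F}) \;+\; \omega_{\mathcal{F}}(\delta),
  \qquad
  \omega_{\mathcal{F}}(\delta)
  := \sup_{f \in \mathcal{F}}\,\sup_{\|x'-x\|\le\delta}\bigl|f(x') - f(x)\bigr|,
\]
```

where the first term is the standard (attack-free) rate and the modulus-of-continuity term $\omega_{\mathcal{F}}(\delta)$ captures the maximal in-class deviation under input perturbations of size at most $\delta$.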