Title: Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?

Apr 21, 2022

Kaiqi Zhang, Yu-Xiang Wang

Abstract: We study the theory of neural networks (NNs) through the lens of classical nonparametric regression problems, with a focus on NNs' ability to adaptively estimate functions with heterogeneous smoothness -- a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function space and sample size. We consider a "Parallel NN" variant of deep ReLU networks and show that standard weight decay is equivalent to promoting the $\ell_p$-sparsity ($0<p<1$) of the coefficient vector of end-to-end learned function bases, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such a Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.
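The link between weight decay and a sparsity-promoting penalty can be illustrated with a small numerical check. This is a minimal sketch under simplifying assumptions that are not taken from the paper: each subnetwork is treated as a bias-free ReLU network with L layers, so rescaling one layer by a > 0 and another by 1/a leaves its function unchanged, and each layer is summarized by a single positive norm. Under those assumptions, the AM-GM inequality says the smallest sum of squared layer norms for a fixed product of norms is L times that product raised to the power 2/L, which is a penalty with exponent below 1 once L > 2. The exponent 2/L comes from this simplified calculation, and the paper's exact statement may differ. All function names and the example norms are hypothetical.

```python
import math


def weight_decay_penalty(layer_norms):
    """Sum of squared per-layer norms: the usual weight decay term."""
    return sum(c * c for c in layer_norms)


def balanced_rescaling(layer_norms):
    """Give every layer the same norm while keeping the product of norms fixed.

    Assumption: for a bias-free ReLU subnetwork this rescaling does not
    change the function the subnetwork computes (positive homogeneity).
    """
    depth = len(layer_norms)
    target = math.prod(layer_norms) ** (1.0 / depth)
    return [target] * depth


def lp_penalty(layer_norms):
    """L * a^(2/L), where a is the product of the layer norms."""
    depth = len(layer_norms)
    return depth * math.prod(layer_norms) ** (2.0 / depth)


# Hypothetical per-layer norms for one subnetwork with L = 4 layers.
norms = [0.5, 2.0, 3.0, 0.25]
balanced = balanced_rescaling(norms)

print("weight decay, original scaling:", weight_decay_penalty(norms))
print("weight decay, balanced scaling:", weight_decay_penalty(balanced))
print("L * a^(2/L):                   ", lp_penalty(norms))
```

In this toy setting the balanced scaling attains the lower bound, so minimizing weight decay over all function-preserving rescalings leaves a penalty that depends only on the product of the layer norms, raised to the power 2/L.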

* 40 pages, 8 figures

View paper on OpenReview
