Title: Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization

Jul 23, 2023

Kaiyue Wen, Zhiyuan Li, Tengyu Ma

Abstract: Despite extensive studies, the underlying reason why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models and sharpness minimization algorithms fail to generalize; and (3) perhaps most surprisingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization subtly depends on the data distributions and the model architectures, and that sharpness minimization algorithms do not only minimize sharpness to achieve better generalization. This calls for the search for other explanations for the generalization of overparameterized neural networks.
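
For readers unfamiliar with the term, here is a minimal sketch of what one step of a sharpness minimization algorithm can look like. It is illustrative only. The abstract does not name a specific algorithm, so the sketch assumes a Sharpness-Aware Minimization (SAM) style update, and it uses a toy quadratic loss rather than the two-layer ReLU networks studied in the paper. The names `loss`, `grad`, `sam_step`, `lr`, and `rho` are hypothetical choices for this example.

```python
import numpy as np

def loss(w, H):
    # Toy quadratic loss 0.5 * w^T H w; H acts as the (constant) Hessian.
    return 0.5 * w @ H @ w

def grad(w, H):
    # Gradient of the quadratic loss (H is symmetric here).
    return H @ w

def sam_step(w, H, lr=0.05, rho=0.05):
    # SAM-style step (assumed form): take the gradient at a point perturbed
    # by rho along the normalized gradient, then descend from w with it.
    g = grad(w, H)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return w  # already at a stationary point
    eps = rho * g / norm            # ascent perturbation of length rho
    g_perturbed = grad(w + eps, H)  # gradient at the perturbed point
    return w - lr * g_perturbed

# One sharp direction (curvature 10) and one flat direction (curvature 0.1).
H = np.diag([10.0, 0.1])
w = np.array([1.0, 1.0])
initial_loss = loss(w, H)

for _ in range(100):
    w = sam_step(w, H)

final_loss = loss(w, H)
print(initial_loss, final_loss)
```

On this toy problem the update simply drives the loss down. The paper's point is that on real networks the link between such updates, flatness, and generalization is more subtle than "flatter is better".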

* 34 pages, 11 figures
