Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Pierre Foret

Google Research

NeurIPS 2020 Competition: Predicting Generalization in Deep Learning

Dec 14, 2020

Yiding Jiang, Pierre Foret, Scott Yak, Daniel M. Roy, Hossein Mobahi, Gintare Karolina Dziugaite, Samy Bengio, Suriya Gunasekar, Isabelle Guyon, Behnam Neyshabur

Figure 1 for NeurIPS 2020 Competition: Predicting Generalization in Deep Learning

Abstract:Understanding generalization in deep learning is arguably one of the most important questions in deep learning. Deep learning has been successfully adopted to a large number of problems ranging from pattern recognition to complex decision making, but many recent researchers have raised many concerns about deep learning, among which the most important is generalization. Despite numerous attempts, conventional statistical learning approaches have yet been able to provide a satisfactory explanation on why deep learning works. A recent line of works aims to address the problem by trying to predict the generalization performance through complexity measures. In this competition, we invite the community to propose complexity measures that can accurately predict generalization of models. A robust and general complexity measure would potentially lead to a better understanding of deep learning's underlying mechanism and behavior of deep models on unseen data, or shed light on better generalization bounds. All these outcomes will be important for making deep learning more robust and reliable.

* 20 pages, 2 figures. Accepted for NeurIPS 2020 Competitions Track. Lead organizer: Yiding Jiang

Via

Access Paper or Ask Questions

Sharpness-Aware Minimization for Efficiently Improving Generalization

Oct 03, 2020

Pierre Foret, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

Figure 1 for Sharpness-Aware Minimization for Efficiently Improving Generalization

Figure 2 for Sharpness-Aware Minimization for Efficiently Improving Generalization

Figure 3 for Sharpness-Aware Minimization for Efficiently Improving Generalization

Figure 4 for Sharpness-Aware Minimization for Efficiently Improving Generalization

Abstract:In today's heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability. Indeed, optimizing only the training loss value, as is commonly done, can easily lead to suboptimal model quality. Motivated by the connection between geometry of the loss landscape and generalization---including a generalization bound that we prove here---we introduce a novel, effective procedure for instead simultaneously minimizing loss value and loss sharpness. In particular, our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-{10, 100}, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several. Additionally, we find that SAM natively provides robustness to label noise on par with that provided by state-of-the-art procedures that specifically target learning with noisy labels.

Via

Access Paper or Ask Questions

A Multilingual View of Unsupervised Machine Translation

Feb 21, 2020

Xavier Garcia, Pierre Foret, Thibault Sellam, Ankur P. Parikh

Figure 1 for A Multilingual View of Unsupervised Machine Translation

Figure 2 for A Multilingual View of Unsupervised Machine Translation

Figure 3 for A Multilingual View of Unsupervised Machine Translation

Figure 4 for A Multilingual View of Unsupervised Machine Translation

Abstract:We present a probabilistic framework for multilingual neural machine translation that encompasses supervised and unsupervised setups, focusing on unsupervised translation. In addition to studying the vanilla case where there is only monolingual data available, we propose a novel setup where one language in the (source, target) pair is not associated with any parallel data, but there may exist auxiliary parallel data that contains the other. This auxiliary data can naturally be utilized in our probabilistic framework via a novel cross-translation loss term. Empirically, we show that our approach results in higher BLEU scores over state-of-the-art unsupervised models on the WMT'14 English-French, WMT'16 English-German, and WMT'16 English-Romanian datasets in most directions. In particular, we obtain a +1.65 BLEU advantage over the best-performing unsupervised model in the Romanian-English direction.

* Added new reference, fixed typos

Via

Access Paper or Ask Questions