Title: The Implicit Bias of Gradient Descent on Separable Data

Mar 21, 2018

Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro

Abstract: We show that gradient descent on an unregularized logistic regression problem, for linearly separable datasets, converges to the direction of the max-margin (hard margin SVM) solution. The result generalizes also to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
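
The claim in the abstract can be illustrated numerically with a minimal sketch. This is not the authors' code: the three-point dataset, the step size `lr`, the helper names `run_gd` and `cosine`, and the hand-derived max-margin vector `w_hat` are all assumptions made for this example. It runs plain gradient descent on the unregularized logistic loss with no bias term and records, at a few checkpoints, the loss and the cosine similarity between the iterate and the hard-margin SVM direction.

```python
import numpy as np

# Hypothetical toy data: linearly separable, no bias term (predictor is w @ x).
X = np.array([[2.0, 0.0], [0.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, -1.0, -1.0])
Z = y[:, None] * X  # fold the labels in: row n is y_n * x_n

# Hard-margin SVM solution for this toy set, worked out by hand:
# minimizing ||w||^2 subject to Z @ w >= 1 gives w_hat = (0.5, 1),
# with the first two points as support vectors.
w_hat = np.array([0.5, 1.0])
assert np.all(Z @ w_hat >= 1.0)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def run_gd(Z, steps, lr=0.1, checkpoints=()):
    """Gradient descent on sum_n log(1 + exp(-z_n @ w)), starting from w = 0."""
    w = np.zeros(Z.shape[1])
    history = {}
    for t in range(1, steps + 1):
        margins = Z @ w
        # sigmoid(-m) = exp(-log(1 + exp(m))), computed stably.
        weights = np.exp(-np.logaddexp(0.0, margins))
        # The gradient is -Z.T @ weights, so a descent step adds lr * Z.T @ weights.
        w = w + lr * (Z.T @ weights)
        if t in checkpoints:
            loss = float(np.sum(np.logaddexp(0.0, -(Z @ w))))
            history[t] = (loss, cosine(w, w_hat))
    return w, history


checkpoints = (100, 1_000, 10_000, 100_000)
w, history = run_gd(Z, steps=100_000, checkpoints=checkpoints)
for t in checkpoints:
    loss, cos = history[t]
    print(f"t={t:>7}  loss={loss:.3e}  cos(w, w_hat)={cos:.5f}")
```

On this toy problem the loss should shrink by orders of magnitude between checkpoints while the cosine similarity creeps toward 1 much more slowly, which is the qualitative gap between the two convergence rates that the abstract describes. The norm of `w` keeps growing throughout, so only its direction is compared.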

* Journal version (a previous version appeared as a conference paper at ICLR). Main improvements: we proved the measure-zero case for the main theorem (with implications for the rates) and the multi-class case. Neither was covered in the previous version.

View paper on OpenReview
