Title: Guillotine Regularization: Improving Deep Networks Generalization by Removing their Head

Jun 27, 2022

Florian Bordes, Randall Balestriero, Quentin Garrido, Adrien Bardes, Pascal Vincent

Figures 1 to 4 for Guillotine Regularization: Improving Deep Networks Generalization by Removing their Head

Abstract: One unexpected technique that emerged in recent years consists of training a Deep Network (DN) with a Self-Supervised Learning (SSL) method, and using this network on downstream tasks but with its last few layers entirely removed. This usually skimmed-over trick is actually critical for SSL methods to display competitive performance. For example, on ImageNet classification, more than 30 percentage points can be gained that way. This is a little vexing, as one would hope that the network layer at which invariance is explicitly enforced by the SSL criterion during training (the last layer) would be the one to use for best generalization performance downstream. But it seems not to be, and this study sheds some light on why. This trick, which we name Guillotine Regularization (GR), is in fact a generically applicable form of regularization that has also been used to improve generalization performance in transfer learning scenarios. In this work, through theory and experiments, we formalize GR and identify the underlying reasons behind its success in SSL methods. Our study shows that the use of this trick is essential to SSL performance for two main reasons: (i) improper data augmentations to define the positive pairs used during training, and/or (ii) suboptimal selection of the hyper-parameters of the SSL loss.
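As a rough illustration of the trick described in the abstract, the following minimal sketch builds a toy network as a list of layers and then drops its last few layers before computing a downstream representation. It is not the authors' implementation: the helper names (`linear`, `relu`, `forward`, `guillotine`), the toy weights, and the split into a two-layer backbone and a two-layer projector head are all assumptions made for illustration, and no training is performed.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def linear(weights: Sequence[Sequence[float]]) -> Layer:
    """Return a bias-free linear layer computing W @ x (hypothetical helper)."""
    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return apply


def relu(x: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, xi) for xi in x]


def forward(layers: Sequence[Layer], x: Vector) -> Vector:
    """Apply the layers in order."""
    for layer in layers:
        x = layer(x)
    return x


def guillotine(layers: Sequence[Layer], num_removed: int) -> List[Layer]:
    """Drop the last `num_removed` layers (the head) and keep the trunk."""
    if not 0 <= num_removed <= len(layers):
        raise ValueError("num_removed must be between 0 and len(layers)")
    return list(layers[:len(layers) - num_removed])


# Toy network (illustrative weights): a backbone followed by a projector head.
backbone = [linear([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]), relu]
projector = [linear([[1.0, -1.0, 0.5]]), relu]
full_network = backbone + projector

x = [1.0, 2.0]
# Output of the last layer, where the SSL criterion would be applied in training.
head_output = forward(full_network, x)
# Output after removing the two head layers, as would be used downstream.
trunk_output = forward(guillotine(full_network, 2), x)
```

In this sketch, `num_removed` plays the role of "the last few layers"; how many layers to remove in practice depends on the architecture and is not specified here.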
