Sahana Upadhya

Intermediate Layers Matter in Momentum Contrastive Self Supervised Learning

Oct 27, 2021

Aakash Kaku, Sahana Upadhya, Narges Razavian

Abstract: We show that bringing the intermediate-layer representations of two augmented versions of an image closer together during self-supervised learning improves the momentum contrastive (MoCo) method. To this end, in addition to the contrastive loss, we either minimize the mean squared error between the intermediate-layer representations or push their cross-correlation matrix toward the identity matrix. Both loss objectives outperform standard MoCo or match its performance on three diverse medical imaging datasets: NIH-Chest Xrays, Breast Cancer Histopathology, and Diabetic Retinopathy. The gains of the improved MoCo are especially large in the low-labeled-data regime (e.g., 1% labeled data), with an average gain of 5% across the three datasets. We analyze the models trained with our approach via feature similarity analysis and layer-wise probing. This analysis reveals that models trained with our approach show higher feature reuse than standard MoCo and learn informative features earlier in the network. Finally, by comparing the output probability distributions of models fine-tuned on small versus large amounts of labeled data, we conclude that our proposed pre-training method leads to a lower Kolmogorov-Smirnov distance than standard MoCo. This provides additional evidence that our method learns more informative features in the pre-training phase, which can be leveraged in the low-labeled-data regime.
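The two auxiliary objectives named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the off-diagonal weight (borrowed from the Barlow Twins formulation of a cross-correlation loss), the per-feature standardization, and the way the auxiliary term is weighted and added to the contrastive loss are all assumptions made for illustration. The inputs `h_q` and `h_k` stand for intermediate-layer features of the two augmented views, flattened to shape `(batch, features)`.

```python
import numpy as np


def mse_alignment_loss(h_q, h_k):
    """Mean squared error between intermediate features of the two views."""
    return float(np.mean((h_q - h_k) ** 2))


def cross_correlation_loss(h_q, h_k, off_diag_weight=5e-3, eps=1e-8):
    """Penalty that pushes the cross-correlation matrix toward the identity.

    off_diag_weight and eps are hypothetical hyperparameters, not values
    taken from the paper.
    """
    # Standardize each feature dimension over the batch.
    z_q = (h_q - h_q.mean(axis=0)) / (h_q.std(axis=0) + eps)
    z_k = (h_k - h_k.mean(axis=0)) / (h_k.std(axis=0) + eps)

    batch_size = h_q.shape[0]
    corr = z_q.T @ z_k / batch_size  # shape: (features, features)

    diag = np.diag(corr)
    on_diag_term = np.sum((diag - 1.0) ** 2)            # diagonal should be 1
    off_diag_term = np.sum(corr ** 2) - np.sum(diag ** 2)  # rest should be 0
    return float(on_diag_term + off_diag_weight * off_diag_term)


def combined_loss(contrastive_loss, h_q, h_k, aux_weight=1.0, mode="mse"):
    """Add one auxiliary term to an already computed contrastive loss.

    aux_weight is an assumed weighting; the abstract does not specify one.
    """
    if mode == "mse":
        aux = mse_alignment_loss(h_q, h_k)
    elif mode == "cross_correlation":
        aux = cross_correlation_loss(h_q, h_k)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return contrastive_loss + aux_weight * aux
```

In this sketch, identical features for both views give a zero MSE term and a near-zero cross-correlation penalty, while unrelated features are penalized by both, which matches the stated goal of bringing the two views' intermediate representations closer together.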

* Accepted at NeurIPS 2021 (main conference)
