Title: MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

Aug 25, 2022

Xiaoyi Dong, Yinglin Zheng, Jianmin Bao, Ting Zhang, Dongdong Chen, Hao Yang, Ming Zeng, Weiming Zhang, Lu Yuan, Dong Chen (+2 more)

Figures 1-4 for MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

Abstract: This paper presents MaskCLIP, a simple yet effective framework that incorporates a newly proposed masked self-distillation into contrastive language-image pretraining. The core idea of masked self-distillation is to distill the representation of a full image into the representation predicted from a masked image. This incorporation has two vital benefits. First, masked self-distillation targets local patch representation learning, which is complementary to the vision-language contrastive objective's focus on text-related representation. Second, masked self-distillation is consistent with the vision-language contrastive objective in terms of training objective, since both use the visual encoder for feature alignment; it can therefore learn local semantics with indirect supervision from language. We provide specially designed experiments with a comprehensive analysis to validate the two benefits. Empirically, we show that MaskCLIP, when applied to various challenging downstream tasks, achieves superior results in linear probing and finetuning, as well as in zero-shot performance with the guidance of the language encoder.
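The core idea stated in the abstract (a teacher sees the full image, a student sees a masked image and predicts the teacher's representation at the masked patches) can be illustrated with a toy sketch. This is not the authors' implementation. Everything concrete here is an assumption made for illustration: the tiny linear `encode` function, the zero-vector mask token, the cosine-distance loss, and the exponential-moving-average (`ema_update`) teacher, which is a common choice in self-distillation but is not described in the abstract. All function names are hypothetical.

```python
import math
import random


def encode(weights, tokens):
    """Toy encoder (assumption): add the mean token to each token, then apply a linear map."""
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    mixed = [[t[d] + mean[d] for d in range(dim)] for t in tokens]
    return [[sum(w * x for w, x in zip(row, tok)) for row in weights] for tok in mixed]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; the loss choice is an assumption for this sketch."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm + 1e-8)


def masked_distillation_loss(student_w, teacher_w, patches, mask_ratio, rng):
    """Distill full-image teacher features into student predictions at masked patches."""
    n = len(patches)
    num_masked = max(1, int(n * mask_ratio))
    masked = sorted(rng.sample(range(n), num_masked))
    mask_token = [0.0] * len(patches[0])  # assumed placeholder for a masked patch
    corrupted = [mask_token if i in masked else p for i, p in enumerate(patches)]
    targets = encode(teacher_w, patches)   # teacher sees the full image
    preds = encode(student_w, corrupted)   # student sees the masked image
    loss = sum(cosine_distance(preds[i], targets[i]) for i in masked) / num_masked
    return loss, masked


def ema_update(teacher_w, student_w, momentum=0.99):
    """Assumed teacher update: exponential moving average of the student weights."""
    return [
        [momentum * t + (1.0 - momentum) * s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher_w, student_w)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    patches = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
    student = [[rng.gauss(0.0, 0.5) for _ in range(4)] for _ in range(4)]
    teacher = [row[:] for row in student]
    loss, masked = masked_distillation_loss(student, teacher, patches, 0.5, rng)
    print("masked patch indices:", masked)
    print("distillation loss:", loss)
    teacher = ema_update(teacher, student)
```

In a full pretraining setup this loss would be combined with the image-text contrastive loss and minimized by gradient descent on the student. The sketch omits both, since the abstract gives no details on weighting or optimization.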
