Title: Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift

Oct 08, 2023

Yihao Xue, Siddharth Joshi, Dang Nguyen, Baharan Mirzasoleiman



Abstract: Recently, multimodal contrastive learning (MMCL) approaches such as CLIP have achieved remarkable success in learning representations that are robust against distribution shift and generalize to new domains. Despite this empirical success, the mechanism behind learning such generalizable representations is not understood. In this work, we rigorously analyze this problem and uncover two mechanisms behind MMCL's robustness: intra-class contrasting, which allows the model to learn features with high variance, and inter-class feature sharing, where annotated details in one class help the model learn other classes better. Both mechanisms prevent spurious features that are over-represented in the training data from overshadowing the generalizable core features. This yields superior zero-shot classification accuracy under distribution shift. Furthermore, we theoretically demonstrate the benefits of rich captions for robustness and explore the effect of annotating different types of details in the captions. We validate our theoretical findings through experiments, including a carefully designed synthetic experiment and an experiment that trains CLIP on MS COCO and evaluates the model on distribution-shifted variants of ImageNet.
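MMCL methods such as CLIP train an image encoder and a text encoder by contrasting matched image-caption pairs against mismatched ones, and zero-shot classification then assigns an image to the class whose caption embedding is most similar. The sketch below illustrates both steps on precomputed embeddings, assuming a CLIP-style symmetric InfoNCE objective. It is not the authors' code: the function names, the temperature value of 0.07, and the use of random vectors in place of real encoder outputs are illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings (illustrative).

    Row i of img_emb and row i of txt_emb are a positive (matching) pair;
    every other image/caption pairing in the batch serves as a negative.
    The temperature value is an assumption for this sketch.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature  # (n, n) scaled cosine similarities
    diag = np.diag(logits)              # scores of the matching pairs
    # Cross-entropy with the matching index as target, image-to-text (rows)
    loss_i2t = np.mean(logsumexp(logits, axis=1) - diag)
    # ... and text-to-image (columns)
    loss_t2i = np.mean(logsumexp(logits, axis=0) - diag)
    return 0.5 * (loss_i2t + loss_t2i)


def zero_shot_predict(img_emb, class_txt_emb):
    """Predict, for each image, the class whose caption embedding is most similar."""
    sims = l2_normalize(img_emb) @ l2_normalize(class_txt_emb).T
    return sims.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 16
    # Hypothetical stand-ins for encoder outputs
    txt = rng.normal(size=(n, d))
    img_aligned = txt + 0.01 * rng.normal(size=(n, d))  # images close to their captions
    img_random = rng.normal(size=(n, d))                # images unrelated to captions
    print("aligned loss:", clip_style_loss(img_aligned, txt))
    print("random loss: ", clip_style_loss(img_random, txt))
    print("zero-shot predictions:", zero_shot_predict(img_aligned, txt))
```

Well-aligned image and caption embeddings should give a much lower loss than unrelated ones. In the zero-shot step, the class caption embeddings would in practice come from prompts such as "a photo of a {class}" passed through the trained text encoder; here the paired caption vectors simply play that role.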
