Title: Localization vs. Semantics: How Can Language Benefit Visual Representation Learning?

Dec 01, 2022

Zhuowan Li, Cihang Xie, Benjamin Van Durme, Alan Yuille

Abstract: Despite the superior performance brought by vision-and-language pretraining, it remains unclear whether learning with multi-modal data can help understand each individual modality. In this work, we investigate how language can help with visual representation learning from a probing perspective. Specifically, we compare vision-and-language and vision-only models by probing their visual representations on a broad range of tasks, in order to assess the quality of the learned representations in a fine-grained manner. Interestingly, our probing results suggest that vision-and-language models are better at label prediction tasks like object and attribute prediction, while vision-only models are stronger at dense prediction tasks that require more localized information. With further analysis using detailed metrics, our study suggests that language helps vision models learn better semantics, but not localization. Code is released at https://github.com/Lizw14/visual_probing.
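
For readers unfamiliar with probing, the general idea of comparing two frozen encoders with a linear probe can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (see their repository for that): the features are synthetic stand-ins for frozen encoder outputs, the probe is a closed-form ridge-regression classifier, and names such as `encoder_a`, `encoder_b`, `fit_linear_probe`, and `make_synthetic_features` are hypothetical.

```python
import numpy as np


def fit_linear_probe(features, labels, num_classes, l2=1e-3):
    """Fit a linear probe on frozen features via ridge regression to one-hot labels."""
    n, d = features.shape
    x = np.hstack([features, np.ones((n, 1))])  # append a bias column
    y = np.eye(num_classes)[labels]             # one-hot targets
    # Closed-form solution of (X^T X + l2 * I) W = X^T Y
    return np.linalg.solve(x.T @ x + l2 * np.eye(d + 1), x.T @ y)


def probe_accuracy(weights, features, labels):
    """Classification accuracy of the probe on held-out features."""
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    preds = (x @ weights).argmax(axis=1)
    return float((preds == labels).mean())


def make_synthetic_features(rng, labels, num_classes, dim, signal):
    """Hypothetical stand-in for a frozen encoder: scaled class centroid plus noise."""
    centroids = rng.normal(size=(num_classes, dim))
    return signal * centroids[labels] + rng.normal(size=(labels.size, dim))


rng = np.random.default_rng(0)
num_classes, dim, n_train = 5, 16, 500
labels = rng.integers(0, num_classes, size=700)

results = {}
# Assumption: "encoder_a" encodes label information more strongly than "encoder_b".
for name, signal in [("encoder_a", 3.0), ("encoder_b", 0.3)]:
    feats = make_synthetic_features(rng, labels, num_classes, dim, signal)
    weights = fit_linear_probe(feats[:n_train], labels[:n_train], num_classes)
    results[name] = probe_accuracy(weights, feats[n_train:], labels[n_train:])

for name, acc in results.items():
    print(f"{name}: held-out probe accuracy = {acc:.3f}")
```

Because the encoders stay frozen and only the small probe is trained, differences in held-out accuracy reflect how much task-relevant information each representation exposes rather than the capacity of the probe itself.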
