Title: Do Vision-Language Pretrained Models Learn Primitive Concepts?

Mar 31, 2022

Tian Yun, Usha Bhalla, Ellie Pavlick, Chen Sun

Abstract: Vision-language (VL) pretrained models have achieved impressive performance on multimodal reasoning and zero-shot recognition tasks. Many of these VL models are pretrained on unlabeled image and caption pairs from the internet. In this paper, we study whether the notion of primitive concepts, such as color and shape attributes, emerges automatically from these pretrained VL models. We propose to learn compositional derivations that map primitive concept activations into composite concepts, a task which we demonstrate to be straightforward given true primitive concept annotations. This compositional derivation learning (CompDL) framework allows us to quantitatively measure the usefulness and interpretability of the learned derivations by jointly considering the entire set of candidate primitive concepts. Our study reveals that state-of-the-art VL pretrained models learn primitive concepts that are highly useful as visual descriptors, as demonstrated by their strong performance on fine-grained visual recognition tasks, but those concepts struggle to provide interpretable compositional derivations, which highlights limitations of existing VL models. Code and models will be released.
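
To make the idea of a compositional derivation concrete, here is a minimal toy sketch. It is not the paper's CompDL implementation: the concept names, the hand-written binary "ground-truth" activations, and the choice of a multiclass perceptron are all assumptions made for illustration. It fits one weight per (composite, primitive) pair and then reads off the highest-weighted primitives for a composite concept.

```python
# Toy illustration only: concept names, data, and the perceptron learner
# are hypothetical stand-ins, not the method or data used in the paper.
PRIMITIVES = ["red", "blue", "round", "square"]
COMPOSITES = ["red ball", "blue ball", "red box", "blue box"]

# Each entry: (primitive activations, index of the composite concept).
# Activations here are hand-written binary annotations.
DATA = [
    ([1.0, 0.0, 1.0, 0.0], 0),  # red ball
    ([0.0, 1.0, 1.0, 0.0], 1),  # blue ball
    ([1.0, 0.0, 0.0, 1.0], 2),  # red box
    ([0.0, 1.0, 0.0, 1.0], 3),  # blue box
]


def scores(weights, x):
    """Linear score of every composite concept for one activation vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def predict(weights, x):
    """Index of the highest-scoring composite (first index wins ties)."""
    s = scores(weights, x)
    return max(range(len(s)), key=s.__getitem__)


def train(data, n_classes, n_features, epochs=10):
    """Multiclass perceptron: on a mistake, move the true class's weights
    toward the input and the wrongly predicted class's weights away."""
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in data:
            pred = predict(weights, x)
            if pred != y:
                for j, v in enumerate(x):
                    weights[y][j] += v
                    weights[pred][j] -= v
    return weights


def top_primitives(weights, composite_idx, k=2):
    """The k primitives with the largest learned weight for a composite."""
    row = weights[composite_idx]
    ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    return [PRIMITIVES[j] for j in ranked[:k]]


WEIGHTS = train(DATA, len(COMPOSITES), len(PRIMITIVES))
```

Calling `top_primitives(WEIGHTS, 0)` lists the primitives the toy model relies on most for "red ball". With clean annotations like these, the learned weights line up with the intended attributes, which mirrors the abstract's remark that the task is straightforward given true primitive annotations; the open question the paper studies is whether activations produced by a pretrained VL model behave the same way.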

* Under review
