Title: MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models

Jun 02, 2023

Masoud Monajatipoor, Liunian Harold Li, Mozhdeh Rouhsedaghat, Lin F. Yang, Kai-Wei Chang

Abstract: Large-scale language models have shown the ability to adapt to a new task by conditioning on a few demonstrations (i.e., in-context learning). However, in the vision-language domain, most large-scale pre-trained vision-language (VL) models do not possess the ability to conduct in-context learning. How can we enable in-context learning for VL models? In this paper, we study an interesting hypothesis: can we transfer the in-context learning ability from the language domain to the VL domain? Specifically, we first meta-train a language model to perform in-context learning on NLP tasks (as in MetaICL); then we transfer this model to perform VL tasks by attaching a visual encoder. Our experiments suggest that in-context learning ability can indeed be transferred across modalities: our model considerably improves the in-context learning capability on VL tasks and can even compensate significantly for the size of the model. On VQA, OK-VQA, and GQA, our method could outperform the baseline model while having 20 times fewer parameters.
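The second step described in the abstract (attaching a visual encoder to a meta-trained language model) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how the visual features are fed to the language model, so the sketch assumes each image is reduced to one feature vector, mapped by a linear projection into the language model's embedding space, and placed before the question tokens. All names here (`project`, `embed_text`, `build_icl_sequence`) and the toy embedding table are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def project(features: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    """Linear map from the visual feature space to the LM embedding space.

    Assumption: a single projection matrix stands in for the trainable
    bridge between the visual encoder and the language model.
    """
    return [sum(w * f for w, f in zip(row, features)) for row in weight]


def embed_text(tokens: Sequence[str], table: Dict[str, Vector], dim: int) -> List[Vector]:
    """Toy embedding lookup; unknown tokens map to a zero vector."""
    return [table.get(tok, [0.0] * dim) for tok in tokens]


def build_icl_sequence(
    demos: Sequence[Tuple[Sequence[float], Sequence[str], Sequence[str]]],
    query: Tuple[Sequence[float], Sequence[str]],
    weight: Sequence[Sequence[float]],
    table: Dict[str, Vector],
    dim: int,
) -> List[Vector]:
    """Build the input sequence for few-shot VL prompting.

    Each demonstration contributes [image vector, question tokens, answer tokens];
    the query contributes [image vector, question tokens] with the answer left
    for the language model to generate.
    """
    seq: List[Vector] = []
    for image_feats, question, answer in demos:
        seq.append(project(image_feats, weight))
        seq.extend(embed_text(list(question) + list(answer), table, dim))
    query_feats, query_question = query
    seq.append(project(query_feats, weight))
    seq.extend(embed_text(query_question, table, dim))
    return seq


# Usage with made-up numbers: 3-d visual features, 2-d LM embeddings.
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
table = {"color": [0.1, 0.2], "?": [0.0, 1.0], "red": [1.0, 0.0], "blue": [0.5, 0.5]}
demos = [
    ([1.0, 2.0, 3.0], ["color", "?"], ["red"]),
    ([0.0, 1.0, 0.0], ["color", "?"], ["blue"]),
]
query = ([2.0, 0.0, 1.0], ["color", "?"])
sequence = build_icl_sequence(demos, query, weight, table, dim=2)
```

In this layout the language model sees the demonstrations in the same "input, output, input, output, ..., input" pattern it was meta-trained on for text tasks, which is the intuition behind transferring in-context learning across modalities.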
