Title: Visual In-Context Learning for Large Vision-Language Models

Feb 18, 2024

Yucheng Zhou, Xiang Li, Qianning Wang, Jianbing Shen

Abstract: In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition. Our approach retrieves images via a "Retrieval & Rerank" paradigm, summarizes images with task intent and task-specific visual parsing, and composes language-based demonstrations that reduce the token count and alleviate the cross-modal interaction problem. Experimental evaluations on five visual reasoning datasets demonstrate the effectiveness of our method. Moreover, our extensive experiments use information flow analysis to explain why the method is effective, and investigate the impact of the length and position of demonstrations on LVLMs. The use of in-context unlearning further shows promise in resetting specific model knowledge without retraining.
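The three stages named in the abstract (retrieve, rerank on summaries, compose text-only demonstrations) can be illustrated with a toy sketch. This is not the authors' implementation: the `Demo` class, the helper names, the cosine and token-overlap scores, and the prompt layout are all assumptions made for illustration. In the paper's setting, image embeddings and intent-oriented summaries would come from learned models rather than from hand-written vectors and strings.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    """Hypothetical demonstration record (not from the paper)."""
    image_vec: List[float]  # stand-in for an image embedding
    summary: str            # stand-in for an intent-oriented image summary
    label: str              # ground-truth answer for the demonstration


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec: List[float], pool: List[Demo], k: int) -> List[Demo]:
    """Stage 1 (assumed): keep the k demos whose image vectors are closest."""
    ranked = sorted(pool, key=lambda d: cosine(query_vec, d.image_vec), reverse=True)
    return ranked[:k]


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets; a toy stand-in for a reranker."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rerank(query_summary: str, candidates: List[Demo], n: int) -> List[Demo]:
    """Stage 2 (assumed): reorder candidates by summary similarity, keep n."""
    ranked = sorted(
        candidates,
        key=lambda d: token_overlap(query_summary, d.summary),
        reverse=True,
    )
    return ranked[:n]


def compose_prompt(demos: List[Demo], query_summary: str, question: str) -> str:
    """Stage 3 (assumed): build a text-only prompt from summaries and labels."""
    parts = [
        f"Image summary: {d.summary}\nQuestion: {question}\nAnswer: {d.label}"
        for d in demos
    ]
    parts.append(f"Image summary: {query_summary}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)
```

Because the demonstrations are composed from text summaries rather than raw images, the prompt carries no image tokens for the demonstrations, which is the token-saving idea the abstract describes. A call sequence would be `retrieve(...)`, then `rerank(...)` on the result, then `compose_prompt(...)` with the query's own summary.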

* 13 pages, 7 figures
