Title: What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration

Oct 27, 2024

Libo Qin, Qiguang Chen, Hao Fei, Zhi Chen, Min Li, Wanxiang Che

Abstract: Recent rapid advances in Multi-Modal In-Context Learning (MM-ICL) have achieved notable success, delivering superior performance across various tasks without requiring additional parameter tuning. However, the underlying rules governing the effectiveness of MM-ICL remain under-explored. To fill this gap, this work investigates the research question: "What factors affect the performance of MM-ICL?" To this end, we conduct extensive experiments on the three core steps of MM-ICL (demonstration retrieval, demonstration ordering, and prompt construction) using 6 vision large language models and 20 strategies. Our findings highlight (1) the necessity of a multi-modal retriever for demonstration retrieval, (2) the importance of intra-demonstration ordering over inter-demonstration ordering, and (3) the enhancement of task comprehension through introductory instructions in prompts. We hope this study can serve as a foundational guide for optimizing MM-ICL strategies in future research.
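
To make the three steps named in the abstract concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. The `Demo` class, the hand-written two-dimensional embeddings, the equal weighting of image and text similarity, and the `<image:...>` placeholder are all assumptions made for illustration; a real setup would use learned encoders and a vision large language model.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Demo:
    image_ref: str          # placeholder for an image, e.g. a file name (hypothetical)
    text: str               # question paired with the image
    answer: str
    image_vec: list[float]  # toy image embedding, assumed precomputed
    text_vec: list[float]   # toy text embedding, assumed precomputed


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, pool, k):
    """Step 1: score each demo on BOTH modalities and keep the top k."""
    def score(d):
        # Equal weighting is an assumption, not a value from the paper.
        return 0.5 * cosine(query.image_vec, d.image_vec) \
             + 0.5 * cosine(query.text_vec, d.text_vec)
    return sorted(pool, key=score, reverse=True)[:k]


def order(demos, most_similar_last=True):
    """Step 2, inter-demonstration: input is most-similar-first."""
    return list(reversed(demos)) if most_similar_last else list(demos)


def render(demo, image_first=True, with_answer=True):
    """Step 2, intra-demonstration: modality order inside one demo."""
    parts = [f"<image:{demo.image_ref}>", demo.text]
    if not image_first:
        parts.reverse()
    if with_answer:
        parts.append(f"Answer: {demo.answer}")
    return "\n".join(parts)


def build_prompt(query, demos, instruction):
    """Step 3: introductory instruction, then demos, then the query."""
    blocks = [instruction]
    blocks += [render(d) for d in demos]
    blocks.append(render(query, with_answer=False) + "\nAnswer:")
    return "\n\n".join(blocks)


# Usage with made-up data.
pool = [
    Demo("cat.png", "What animal is shown?", "cat", [1.0, 0.0], [1.0, 0.0]),
    Demo("car.png", "What vehicle is shown?", "car", [0.0, 1.0], [0.0, 1.0]),
    Demo("dog.png", "What animal is shown?", "dog", [0.9, 0.1], [1.0, 0.0]),
]
query = Demo("fox.png", "What animal is shown?", "", [0.8, 0.2], [1.0, 0.0])
selected = order(retrieve(query, pool, k=2))
prompt = build_prompt(query, selected, "Answer each question about the image.")
print(prompt)
```

Each function exposes one knob corresponding to a factor the abstract discusses: which modalities the retriever scores, how demonstrations are ordered relative to each other and internally, and whether the prompt opens with an instruction.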

* Accepted at NeurIPS 2024
