Title: MMHQA-ICL: Multimodal In-context Learning for Hybrid Question Answering over Text, Tables and Images

Sep 09, 2023

Weihao Liu, Fangyu Lei, Tongxu Luo, Jiahe Lei, Shizhu He, Jun Zhao, Kang Liu

Abstract: In the real world, knowledge often exists in a multimodal and heterogeneous form. Question answering over hybrid data types, including text, tables, and images (MMHQA), is a challenging task. Recently, with the rise of large language models (LLMs), in-context learning (ICL) has become the most popular way to solve QA problems. We propose the MMHQA-ICL framework for addressing this problem, which includes a stronger heterogeneous data retriever and an image caption module. Most importantly, we propose a Type-specific In-context Learning Strategy for MMHQA, enabling LLMs to bring their strong capabilities to bear on this task. We are the first to use an end-to-end LLM prompting method for this task. Experimental results demonstrate that our framework outperforms all baselines and methods trained on the full dataset, achieving state-of-the-art results under the few-shot setting on the MultimodalQA dataset.
