Title: Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models

Dec 04, 2023

Bingshuai Liu, Chenyang Lyu, Zijun Min, Zhanyu Wang, Jinsong Su, Longyue Wang

Abstract: The advancement of Large Language Models (LLMs) has drawn substantial attention to the Chain of Thought (CoT) approach, primarily because it improves LLM performance on tasks requiring complex reasoning. CoT approaches also matter when LLMs are applied to multi-modal tasks, such as multi-modal question answering. However, the selection of optimal CoT demonstration examples for multi-modal reasoning remains underexplored because of the inherent complexity of multi-modal examples. In this paper, we introduce a novel approach that addresses this challenge by using retrieval mechanisms to dynamically and automatically select demonstration examples based on cross-modal similarities. This method aims to refine the CoT reasoning process in multi-modal scenarios by providing LLMs with more relevant and informative examples. Furthermore, we employ a stratified sampling method that categorises demonstration examples into groups based on their types and retrieves examples from each group separately to promote diversity among the demonstration examples. Through a series of experiments, we demonstrate that our approach significantly improves the performance of LLMs, achieving state-of-the-art results in multi-modal reasoning tasks. Specifically, our methods demonstrate significant advancements on the ScienceQA dataset. Our ChatGPT-based method outperforms Chameleon (ChatGPT) by 2.74% with an accuracy of 82.67%, and our GPT-4-based method surpasses Chameleon (GPT-4) by 0.89%, achieving 87.43% accuracy under the same setting. Moreover, our best-performing methods show a 6.05% increase over Chameleon for ChatGPT-based models and a 4.57% increase for GPT-4-based models.

* Work in progress
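
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cross_modal_score`, `stratified_retrieve`), the record fields (`text_emb`, `image_emb`, `type`), and the choice to average four cosine similarities are all assumptions made for illustration. It also assumes that text and image embeddings are precomputed and live in a shared vector space, so that text-to-image comparisons are meaningful.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cross_modal_score(query, demo):
    """Hypothetical scoring rule: mean of the four text/image similarity pairs."""
    pairs = [
        (query["text_emb"], demo["text_emb"]),    # text  -> text
        (query["image_emb"], demo["image_emb"]),  # image -> image
        (query["text_emb"], demo["image_emb"]),   # text  -> image
        (query["image_emb"], demo["text_emb"]),   # image -> text
    ]
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def stratified_retrieve(query, pool, k_per_group=1):
    """Group the demonstration pool by type, then keep the top-k matches per group."""
    groups = defaultdict(list)
    for demo in pool:
        groups[demo["type"]].append(demo)

    selected = []
    for group_type in sorted(groups):  # sorted for a deterministic order
        ranked = sorted(
            groups[group_type],
            key=lambda d: cross_modal_score(query, d),
            reverse=True,
        )
        selected.extend(ranked[:k_per_group])
    return selected


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, invented purely for this example.
    pool = [
        {"id": "a", "type": "physics", "text_emb": [1, 0], "image_emb": [1, 0]},
        {"id": "b", "type": "physics", "text_emb": [0, 1], "image_emb": [0, 1]},
        {"id": "c", "type": "biology", "text_emb": [1, 0], "image_emb": [0, 1]},
        {"id": "d", "type": "biology", "text_emb": [0, 1], "image_emb": [0, 1]},
    ]
    query = {"text_emb": [1, 0], "image_emb": [1, 0]}
    print([d["id"] for d in stratified_retrieve(query, pool)])
```

Retrieving per group rather than from the whole pool means a single dominant type cannot crowd out the others, which is the diversity goal the abstract attributes to stratified sampling. The selected examples would then be placed in the prompt as CoT demonstrations.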
