Title: Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings

May 03, 2023

Daniel Rose, Vaishnavi Himakunthala, Andy Ouyang, Ryan He, Alex Mei, Yujie Lu, Michael Saxon, Chinmay Sonar, Diba Mirza, William Yang Wang

Abstract: Recent advances in large language models make it possible to elicit chain-of-thought reasoning, which lets models decompose problems in a human-like fashion. Although this paradigm improves the multi-step reasoning ability of language models, it is limited by being unimodal and has been applied mainly to question-answering tasks. We claim that incorporating visual augmentation into reasoning is essential, especially for complex, imaginative tasks. Consequently, we introduce VCoT, a novel method that leverages chain-of-thought prompting with vision-language grounding to recursively bridge the logical gaps within sequential data. Our method uses visual guidance to generate synthetic multimodal infillings that add consistent and novel information. These infillings reduce the logical gaps for downstream tasks that can benefit from temporal reasoning, and they also provide interpretability into models' multi-step reasoning. We apply VCoT to the Visual Storytelling and WikiHow summarization datasets and demonstrate through human evaluation that VCoT offers novel and consistent synthetic data augmentation that beats chain-of-thought baselines and can be used to enhance downstream performance.
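The abstract describes bridging logical gaps between consecutive elements of a sequence by recursively generating infillings. The following is only a minimal structural sketch of that recursive idea, not the authors' implementation. The gap test (`has_gap`), the infill generator (`make_infill`) and the recursion budget (`max_depth`) are all hypothetical stand-ins. In VCoT these roles would presumably be played by multimodal model calls, which the abstract does not specify. The toy usage replaces them with integer arithmetic so that the recursion is easy to follow.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def infill_between(a: T, b: T,
                   has_gap: Callable[[T, T], bool],
                   make_infill: Callable[[T, T], T],
                   depth: int, max_depth: int) -> List[T]:
    """Return the infilled steps strictly between a and b (hypothetical sketch)."""
    # Stop when the pair is judged coherent or the recursion budget is spent.
    if depth >= max_depth or not has_gap(a, b):
        return []
    # Hypothetical: in a multimodal setting this would produce a bridging
    # step (e.g. text plus an image); here it is an arbitrary callable.
    mid = make_infill(a, b)
    # Recurse on both halves, since the new step may leave smaller gaps.
    return (infill_between(a, mid, has_gap, make_infill, depth + 1, max_depth)
            + [mid]
            + infill_between(mid, b, has_gap, make_infill, depth + 1, max_depth))


def bridge_sequence(steps: Sequence[T],
                    has_gap: Callable[[T, T], bool],
                    make_infill: Callable[[T, T], T],
                    max_depth: int = 2) -> List[T]:
    """Insert recursive infillings between every pair of consecutive steps."""
    if not steps:
        return []
    bridged = [steps[0]]
    for a, b in zip(steps, steps[1:]):
        bridged.extend(infill_between(a, b, has_gap, make_infill, 0, max_depth))
        bridged.append(b)
    return bridged


if __name__ == "__main__":
    # Toy stand-ins: a "gap" is a jump larger than 1, and the "infill" is the midpoint.
    result = bridge_sequence([0, 4, 5],
                             has_gap=lambda a, b: b - a > 1,
                             make_infill=lambda a, b: (a + b) // 2)
    print(result)  # [0, 1, 2, 3, 4, 5]
```

Capping the recursion depth keeps the number of generated steps bounded even if the gap test never reports a pair as coherent. With `max_depth=1`, the same toy input `[0, 4]` yields only one infill, `[0, 2, 4]`.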
