Title: Chain of Thought Prompt Tuning in Vision Language Models

Apr 16, 2023

Jiaxin Ge, Hongyin Luo, Siyuan Qian, Yulu Gan, Jie Fu, Shanghang Zhan

Abstract: Language-Image Pre-training has demonstrated promising results on zero-shot and few-shot downstream tasks by prompting visual models with natural language prompts. However, most recent studies use only a single prompt for tuning, neglecting the inherent step-by-step cognitive reasoning process that humans conduct in complex task settings, for example, when processing images from unfamiliar domains. Chain of Thought is a simple and effective approximation of the human reasoning process and has proven useful for natural language processing (NLP) tasks. Based on this cognitive intuition, we believe that conducting effective reasoning is also an important problem in visual tasks, and that a chain of thought could be a solution to this problem. In this work, we propose a novel chain of thought prompt tuning for vision-language modeling. Extensive experiments show that our method not only generalizes better in image classification tasks, has greater transferability beyond a single dataset, and has stronger domain generalization performance, but also performs much better in image-text retrieval and visual question answering, which require more reasoning capabilities. We are the first to successfully adapt chain-of-thought prompting that combines visual and textual embeddings. We will release our code.
