Title: Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning

Oct 08, 2024

Ayush Singh, Mansi Gupta, Shivank Garg, Abhinav Kumar, Vansh Agrawal

Abstract: Vision-Language Models (VLMs) have transformed tasks that require visual and reasoning abilities, such as image retrieval and Visual Question Answering (VQA). Despite this success, VLMs struggle with tasks involving geometric reasoning, algebraic problem-solving, and counting. These limitations stem from difficulty integrating multiple modalities effectively and interpreting geometry-related tasks accurately. Various works claim that adding a captioning pipeline before VQA tasks improves performance. We applied this pipeline to tasks involving geometry, algebra, and counting, and found that the captioning results do not generalize: in particular, larger VLMs trained primarily on downstream QnA tasks show random performance on math-related challenges. As an alternative, we present task-based prompting, which enriches the prompt with task-specific guidance. This approach shows promise and is more effective than direct captioning methods on math-heavy problems.
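
The three setups the abstract contrasts (direct VQA, caption-then-VQA, and task-based prompting) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the guidance strings, the prompt templates, the function names, and the `vlm(image, prompt)` callable are all assumptions made for the example.

```python
from typing import Any, Callable

# Hypothetical guidance strings, for illustration only; not taken from the paper.
TASK_GUIDANCE = {
    "geometry": "Identify the shapes, labeled points, lengths, and angles before answering.",
    "algebra": "Write down the equations shown in the image and solve them step by step.",
    "counting": "List each distinct object of interest once, then report the total.",
}

# Assumed interface: a VLM is any callable taking (image, prompt) and returning text.
VLM = Callable[[Any, str], str]


def direct_prompt(question: str) -> str:
    """Baseline: ask the question with no extra context."""
    return f"Question: {question}\nAnswer:"


def caption_prompt(question: str, caption: str) -> str:
    """Captioning pipeline: prepend a generated image description."""
    return f"Image description: {caption}\nQuestion: {question}\nAnswer:"


def task_prompt(question: str, task: str) -> str:
    """Task-based prompting: prepend guidance specific to the task type."""
    return f"Task guidance: {TASK_GUIDANCE[task]}\nQuestion: {question}\nAnswer:"


def answer_with_caption(vlm: VLM, image: Any, question: str) -> str:
    """Two-stage pipeline: caption the image first, then answer using the caption."""
    caption = vlm(image, "Describe this image in detail.")
    return vlm(image, caption_prompt(question, caption))


def answer_with_task_guidance(vlm: VLM, image: Any, question: str, task: str) -> str:
    """Single-stage pipeline: answer directly with a task-enriched prompt."""
    return vlm(image, task_prompt(question, task))
```

Note that the captioning variant calls the model twice per question, while the task-based variant calls it once and only changes the prompt text.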
