Title: A Picture May Be Worth a Hundred Words for Visual Question Answering

Jun 25, 2021

Yusuke Hirota, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Ittetsu Taniguchi, Takao Onoye

Abstract: How far can we go with textual representations for understanding pictures? In image understanding, it is essential to use concise but detailed image representations. Deep visual features extracted by vision models such as Faster R-CNN are widely used in many tasks, especially visual question answering (VQA). However, conventional deep visual features may struggle to convey all the details in an image the way humans can. Meanwhile, with recent progress in language models, descriptive text may offer an alternative. This paper examines the effectiveness of textual representations for image understanding in the specific context of VQA. We propose to take description-question pairs as input, instead of deep visual features, and feed them into a language-only Transformer model, which simplifies the pipeline and reduces the computational cost. We also experiment with data augmentation techniques to increase the diversity of the training set and avoid learning statistical biases. Extensive evaluations show that textual representations require only about a hundred words to compete with deep visual features on both VQA 2.0 and VQA-CP v2.
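The abstract does not say exactly how a description and a question are combined before entering the language-only Transformer. As an illustration only, the sketch below assumes a BERT-style sentence-pair layout (`[CLS] description [SEP] question [SEP]`) and a plain whitespace word cap of about a hundred words. The helper `build_text_input` and its `max_words` parameter are hypothetical names, not the authors' code.

```python
from typing import List


def build_text_input(descriptions: List[str], question: str, max_words: int = 100) -> str:
    """Pair image descriptions with a question as one text sequence.

    Hypothetical helper: the [CLS]/[SEP] layout mirrors the common BERT
    sentence-pair convention and is an assumption, not the paper's spec.
    """
    # Flatten all description sentences into one list of whitespace-separated words.
    words: List[str] = []
    for sentence in descriptions:
        words.extend(sentence.split())
    # Keep only the first max_words words (roughly "a hundred words" by default).
    description = " ".join(words[:max_words])
    # Description first, question second, as a sentence-pair input.
    return f"[CLS] {description} [SEP] {question} [SEP]"


if __name__ == "__main__":
    text = build_text_input(
        ["a man rides a red bike.", "a dog runs beside him."],
        "what color is the bike?",
    )
    print(text)
```

Running the example prints `[CLS] a man rides a red bike. a dog runs beside him. [SEP] what color is the bike? [SEP]`. A real model would apply its own subword tokenizer, so the limit would more likely be counted in tokens than in words; the word-level cap here just keeps the sketch dependency-free.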
