Title: Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering

Jan 24, 2018

Zhe Wang, Xiaoyi Liu, Liangjian Chen, Limin Wang, Yu Qiao, Xiaohui Xie, Charless Fowlkes

Abstract: Visual question answering (VQA) is of significant interest due to its potential to be a strong test of image understanding systems and to probe the connection between language and vision. Despite much recent progress, general VQA is far from a solved problem. In this paper, we focus on the VQA multiple-choice task, and provide some good practices for designing an effective VQA model that can capture language-vision interactions and perform joint reasoning. We explore mechanisms of incorporating part-of-speech (POS) tag guided attention, convolutional n-grams, triplet attention interactions between the image, question and candidate answer, and structured learning for triplets based on image-question pairs. We evaluate our models on two popular datasets: Visual7W and VQA Real Multiple Choice. Our final model achieves the state-of-the-art performance of 68.2% on Visual7W, and a very competitive performance of 69.6% on the test-standard split of VQA Real Multiple Choice.

* 8 pages, 5 figures, state-of-the-art VQA system; https://github.com/wangzheallen/STL-VQA
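
To make the multiple-choice setup in the abstract concrete, here is a minimal sketch of scoring (image, question, candidate answer) triplets and normalizing the scores across all candidates that share one image-question pair. This is one plausible reading of "structured learning for triplets based on image-question pairs", not the authors' implementation. The function names, the toy feature vectors, and the scoring rule (an element-wise product of image and question features, followed by a dot product with the answer features) are all assumptions made for illustration; the released code at the repository above is the authoritative reference.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_triplet(image_vec, question_vec, answer_vec):
    """Hypothetical triplet score (an assumption, not the paper's model):
    fuse image and question element-wise, then take a dot product with the answer."""
    joint = [i * q for i, q in zip(image_vec, question_vec)]
    return sum(j * a for j, a in zip(joint, answer_vec))


def predict(image_vec, question_vec, candidates):
    """Score every candidate answer for one image-question pair and
    normalize across the group, so candidates compete with each other."""
    scores = [score_triplet(image_vec, question_vec, a) for a in candidates]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


def group_loss(probs, correct_index):
    """Cross-entropy over the candidate group for one image-question pair."""
    return -math.log(probs[correct_index])


# Toy features (made up for illustration only).
image = [1.0, 0.5]
question = [2.0, 1.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]

best, probs = predict(image, question, candidates)
loss = group_loss(probs, correct_index=0)
print(best, [round(p, 3) for p in probs], round(loss, 3))
```

Normalizing within the candidate group, rather than classifying each triplet independently as correct or incorrect, means the loss depends on how the correct answer ranks against its competitors for the same image and question.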
