Visual Question Answering (VQA) has emerged as one of the most challenging tasks in artificial intelligence due to its multi-modal nature. However, most existing VQA methods are incapable of handling Knowledge-based Visual Question Answering (KB-VQA), which requires external knowledge beyond the visible content to answer questions about a given image. To address this issue, we propose a novel framework that enables the model to answer more general questions and better exploits external knowledge by generating Multiple Clues for Reasoning with Memory Neural Networks (MCR-MemNN). Specifically, a well-defined detector is adopted to predict image-question-related relation phrases, each of which delivers two complementary clues for retrieving supporting facts from an external knowledge base (KB). These facts are then encoded into a continuous embedding space using a content-addressable memory. Afterwards, the mutual interactions between the visual-semantic representation and the supporting facts stored in memory are captured to distill the most relevant information across the three modalities (i.e., image, question, and KB). Finally, the optimal answer is predicted by choosing the supporting fact with the highest score. We conduct extensive experiments on two widely-used benchmarks. The experimental results well justify the effectiveness of MCR-MemNN, as well as its superiority over other KB-VQA methods.
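
To make the memory read-out described above concrete, the following is a minimal, hypothetical sketch (not the authors' released implementation) of a content-addressable memory: supporting facts retrieved from the KB are projected into keys and values, a query derived from the fused visual-semantic representation attends over them, and each fact receives a score, with the answer taken as the highest-scoring fact. All module names, dimensions, and the scoring head are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContentAddressableMemory(nn.Module):
    """Illustrative memory read-out over KB supporting facts (assumed design)."""

    def __init__(self, fact_dim: int, query_dim: int, hidden_dim: int = 512):
        super().__init__()
        # Project encoded KB facts into memory keys and values.
        self.key_proj = nn.Linear(fact_dim, hidden_dim)
        self.val_proj = nn.Linear(fact_dim, hidden_dim)
        # Project the fused visual-semantic representation into a query.
        self.query_proj = nn.Linear(query_dim, hidden_dim)
        # Score each supporting fact against the memory read-out.
        self.scorer = nn.Linear(2 * hidden_dim, 1)

    def forward(self, facts: torch.Tensor, vis_sem: torch.Tensor) -> torch.Tensor:
        """
        facts:   (batch, num_facts, fact_dim)  encoded supporting facts
        vis_sem: (batch, query_dim)            fused image-question feature
        returns: (batch, num_facts)            one score per supporting fact
        """
        keys = self.key_proj(facts)                        # (B, N, H)
        values = self.val_proj(facts)                      # (B, N, H)
        query = self.query_proj(vis_sem).unsqueeze(1)      # (B, 1, H)

        # Content-based addressing: soft attention of the query over keys.
        attn = F.softmax((query * keys).sum(-1), dim=-1)   # (B, N)
        read = (attn.unsqueeze(-1) * values).sum(1)        # (B, H)

        # Score each fact jointly with the memory read-out; argmax = answer.
        fused = torch.cat([read.unsqueeze(1).expand_as(values), values], -1)
        return self.scorer(fused).squeeze(-1)              # (B, N)


if __name__ == "__main__":
    mem = ContentAddressableMemory(fact_dim=300, query_dim=1024)
    facts = torch.randn(2, 20, 300)      # 20 candidate supporting facts
    vis_sem = torch.randn(2, 1024)       # fused visual-semantic feature
    scores = mem(facts, vis_sem)
    answer_idx = scores.argmax(dim=-1)   # index of the predicted fact
    print(scores.shape, answer_idx)
```

This sketch only illustrates the final scoring stage; the relation-phrase detector, the two complementary clues used for KB retrieval, and the visual-semantic fusion are assumed to be computed upstream.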