Title: Multimodal Dual Attention Memory for Video Story Question Answering

Sep 21, 2018

Kyung-Min Kim, Seong-Ho Choi, Jin-Hwa Kim, Byoung-Tak Zhang

Abstract: We propose a video story question-answering (QA) architecture, Multimodal Dual Attention Memory (MDAM). The key idea is to use a dual attention mechanism with late fusion. MDAM first uses self-attention to learn the latent concepts in scene frames and captions. Given a question, MDAM then applies a second attention over these latent concepts. Multimodal fusion is performed only after both attention processes (late fusion). Using this processing pipeline, MDAM learns to infer a high-level vision-language joint representation from an abstraction of the full video content. We evaluate MDAM on the PororoQA and MovieQA datasets, which have large-scale QA annotations on cartoon videos and movies, respectively. On both datasets, MDAM achieves new state-of-the-art results, outperforming the runner-up models by significant margins. Ablation studies confirm that the dual attention mechanism combined with late fusion performs best. We also perform qualitative analysis by visualizing the inference mechanisms of MDAM.
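
The pipeline described in the abstract (self-attention within each modality, a second question-guided attention, then late fusion) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the scaled dot-product attention, the element-wise product used for fusion, the function names, and the toy feature vectors are all assumptions chosen to keep the example short.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, rows):
    # Combine row vectors using the given attention weights.
    dim = len(rows[0])
    return [sum(w * row[k] for w, row in zip(weights, rows)) for k in range(dim)]

def self_attention(rows):
    # First attention: every frame (or caption) attends to all others
    # in the same modality. Scaled dot-product scoring is an assumption.
    scale = math.sqrt(len(rows[0]))
    out = []
    for query in rows:
        weights = softmax([dot(query, key) / scale for key in rows])
        out.append(weighted_sum(weights, rows))
    return out

def question_attention(question, memory):
    # Second attention: the question selects relevant memory slots.
    scale = math.sqrt(len(question))
    weights = softmax([dot(question, slot) / scale for slot in memory])
    return weighted_sum(weights, memory), weights

def late_fusion(visual, textual):
    # Late fusion: modalities are combined only after both attention steps.
    # The element-wise product is a stand-in, not the paper's fusion module.
    return [v * t for v, t in zip(visual, textual)]

def answer(question, frames, captions, candidates):
    frame_memory = self_attention(frames)
    caption_memory = self_attention(captions)
    visual, frame_w = question_attention(question, frame_memory)
    textual, caption_w = question_attention(question, caption_memory)
    fused = late_fusion(visual, textual)
    scores = [dot(fused, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, frame_w, caption_w

# Hypothetical 3-dimensional features; real inputs would be learned embeddings.
frames = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.4, 0.4, 0.4]]
captions = [[0.9, 0.1, 0.3], [0.1, 0.8, 0.2]]
question = [1.0, 0.0, 0.0]
candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

best, frame_w, caption_w = answer(question, frames, captions, candidates)
print("chosen candidate index:", best)
print("question attention over frames:", frame_w)
```

The returned attention weights are the kind of quantity one could visualize for a qualitative analysis. An early-fusion variant would instead combine frame and caption features before any attention is applied.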

* Accepted for ECCV 2018
