Video Question Answering (QA) is an important task for understanding the temporal structure of videos. We observe that video QA has three unique attributes compared with image QA: (1) it deals with long sequences of images that carry richer information not only in quantity but also in variety; (2) motion and appearance information are usually correlated, and each can provide useful attention cues to the other; (3) different questions require different numbers of frames to infer the answer. Based on these observations, we propose a motion-appearance co-memory network for video QA. Our network is built on concepts from the Dynamic Memory Network (DMN) and introduces new mechanisms for video QA. Specifically, there are three salient aspects: (1) a co-memory attention mechanism that utilizes cues from both motion and appearance to generate attention; (2) a temporal conv-deconv network that generates multi-level contextual facts; (3) a dynamic fact ensemble method that constructs temporal representations dynamically for different questions. We evaluate our method on the TGIF-QA dataset, and it significantly outperforms the state-of-the-art on all four tasks of TGIF-QA.
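To make the co-memory attention idea concrete, below is a minimal, hypothetical PyTorch sketch of one attention step: each modality's facts are scored using the question together with the *other* modality's memory, so motion cues guide appearance attention and vice versa. The class and variable names (CoMemoryAttention, score_a, m_a, m_m) and the concatenation-based fusion are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a co-memory attention step (not the authors' code).
# F_a, F_m: appearance / motion facts of shape (T, D); q: question embedding (D,);
# m_a, m_m: current appearance / motion memories (D,).
import torch
import torch.nn as nn

class CoMemoryAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # Each fact is scored from [fact, question, other-modality memory].
        self.score_a = nn.Linear(3 * dim, 1)  # attention over appearance facts
        self.score_m = nn.Linear(3 * dim, 1)  # attention over motion facts

    def forward(self, F_a, F_m, q, m_a, m_m):
        T = F_a.size(0)
        # Cross-modal cue: motion memory guides appearance attention, and vice versa.
        cue_a = torch.cat([F_a, q.expand(T, -1), m_m.expand(T, -1)], dim=1)
        cue_m = torch.cat([F_m, q.expand(T, -1), m_a.expand(T, -1)], dim=1)
        alpha_a = torch.softmax(self.score_a(cue_a), dim=0)  # (T, 1) weights
        alpha_m = torch.softmax(self.score_m(cue_m), dim=0)
        # Attended context vectors, which would be used to update each memory.
        c_a = (alpha_a * F_a).sum(dim=0)
        c_m = (alpha_m * F_m).sum(dim=0)
        return c_a, c_m

# Toy usage with random features.
T, D = 16, 256
attn = CoMemoryAttention(D)
c_a, c_m = attn(torch.randn(T, D), torch.randn(T, D),
                torch.randn(D), torch.randn(D), torch.randn(D))
```

In this sketch the cross-modal interaction is expressed simply by concatenating the other modality's memory into the scoring input; the paper's actual attention and memory-update equations should be consulted for the precise formulation.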