
Title: Discovering Spatio-Temporal Rationales for Video Question Answering

Jul 22, 2023

Yicong Li, Junbin Xiao, Chun Feng, Xiang Wang, Tat-Seng Chua


Abstract: This paper strives to solve complex video question answering (VideoQA), which features long videos containing multiple objects and events at different times. To tackle this challenge, we highlight the importance of identifying question-critical temporal moments and spatial objects from the vast amount of video content. Towards this, we propose Spatio-Temporal Rationalization (STR), a differentiable selection module that adaptively collects question-critical moments and objects using cross-modal interaction. The discovered video moments and objects then serve as grounded rationales to support answer reasoning. Based on STR, we further propose TranSTR, a Transformer-style neural network architecture that takes STR as its core and additionally introduces a novel answer interaction mechanism to coordinate STR for answer decoding. Experiments on four datasets show that TranSTR achieves new state-of-the-art (SoTA) results. In particular, on NExT-QA and Causal-VidQA, which feature complex VideoQA, it significantly surpasses the previous SoTA by 5.8% and 6.8%, respectively. We then conduct extensive studies to verify the importance of STR as well as the proposed answer interaction mechanism. With the success of TranSTR and our comprehensive analysis, we hope this work can spark more future efforts in complex VideoQA. Code will be released at https://github.com/yl3800/TranSTR.
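The abstract describes STR as a differentiable selection module driven by cross-modal interaction, but it does not say how the discrete selection is made differentiable. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes dot-product relevance scores between frame (or object) features and a question embedding, and swaps in a perturbed top-k estimator (averaging hard top-k choices over Gaussian-perturbed scores), which is one known way to relax a discrete selection. All function names, parameters, and the toy data are hypothetical. Only the forward pass is shown, in pure Python; in a real model the gradients would come from an autograd framework.

```python
import random
from typing import List

Vector = List[float]


def cross_modal_scores(frames: List[Vector], question: Vector) -> Vector:
    """Hypothetical scorer: dot product of each frame/object feature with the question embedding."""
    return [sum(f_i * q_i for f_i, q_i in zip(f, question)) for f in frames]


def hard_topk_mask(scores: Vector, k: int) -> Vector:
    """0/1 indicator of the k highest scores; ties go to the lower index (stable sort)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    mask = [0.0] * len(scores)
    for i in order[:k]:
        mask[i] = 1.0
    return mask


def perturbed_topk_mask(scores: Vector, k: int, sigma: float = 0.05,
                        num_samples: int = 500, seed: int = 0) -> Vector:
    """Monte Carlo estimate of E[hard_topk(scores + sigma * Z)], Z ~ N(0, I).

    The expectation varies smoothly with the scores, which is what allows this
    kind of selection to be trained with gradients (assumed technique, not
    confirmed as STR's). Every sample picks exactly k items, so the soft mask
    always sums to k.
    """
    rng = random.Random(seed)
    acc = [0.0] * len(scores)
    for _ in range(num_samples):
        noisy = [s + sigma * rng.gauss(0.0, 1.0) for s in scores]
        for i, m in enumerate(hard_topk_mask(noisy, k)):
            acc[i] += m
    return [a / num_samples for a in acc]


def pool_selected(frames: List[Vector], mask: Vector, k: int) -> Vector:
    """Mask-weighted average of the features: a toy summary of the selected 'rationale'."""
    dim = len(frames[0])
    return [sum(m * f[j] for f, m in zip(frames, mask)) / k for j in range(dim)]


if __name__ == "__main__":
    # Hypothetical toy input: 4 frames with 2-D features and a 2-D question embedding.
    frames = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.8]]
    question = [1.0, 0.0]
    scores = cross_modal_scores(frames, question)
    mask = perturbed_topk_mask(scores, k=2)
    print(scores, mask, pool_selected(frames, mask, k=2))
```

The noise scale `sigma` controls the trade-off: a small value gives an almost hard selection (here frames 0 and 2 are picked essentially every time), while a larger value spreads weight over more frames and yields a smoother, more informative signal for learning the scores.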

* Accepted to ICCV 2023
