Title: Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos

Mar 16, 2020

Yijun Song, Jingwen Wang, Lin Ma, Zhou Yu, Jun Yu

Abstract: The task of temporally grounding textual queries in videos is to localize the one video segment that semantically corresponds to a given query. Most existing approaches rely on segment-sentence pairs (temporal annotations) for training, which are usually unavailable in real-world scenarios. In this work we present an effective weakly-supervised model, named Multi-Level Attentional Reconstruction Network (MARN), which relies only on video-sentence pairs during training. The proposed method leverages the idea of attentional reconstruction and directly scores the candidate segments with the learnt proposal-level attentions. Moreover, a second branch that learns clip-level attention is used to refine the proposals at both the training and testing stages. We develop a novel proposal sampling mechanism that leverages intra-proposal information to learn better proposal representations, and we adopt 2D convolution to exploit inter-proposal clues for learning a reliable attention map. Experiments on the Charades-STA and ActivityNet-Captions datasets demonstrate the superiority of MARN over existing weakly-supervised methods.
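
To make the idea of scoring candidate segments with proposal-level attention more concrete, here is a minimal sketch in plain Python. It is not the MARN architecture: there is no reconstruction loss, no clip-level branch, no proposal sampling, and no learned 2D convolution. It only assumes that a video is a list of clip feature vectors, that the query is a single vector of the same dimension, and that each candidate segment is a (start, end) pair of clip indices. All function names (`enumerate_proposals`, `mean_pool`, `score_proposals`, `ground_query`) are hypothetical, and mean pooling with a dot-product score stands in for whatever learned components the paper uses.

```python
import math


def enumerate_proposals(num_clips):
    """List every (start, end) clip-index pair with start <= end.

    These pairs form the upper triangle of a 2D start-by-end map, which is
    the kind of layout a 2D convolution could operate on.
    """
    return [(s, e) for s in range(num_clips) for e in range(s, num_clips)]


def mean_pool(clip_feats, start, end):
    """Average the clip features in [start, end] (assumed stand-in for a
    learned proposal representation)."""
    span = clip_feats[start:end + 1]
    dim = len(span[0])
    return [sum(f[d] for f in span) / len(span) for d in range(dim)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score_proposals(clip_feats, query_vec):
    """Return all proposals and a softmax attention weight for each one."""
    proposals = enumerate_proposals(len(clip_feats))
    logits = []
    for s, e in proposals:
        pooled = mean_pool(clip_feats, s, e)
        # Dot-product similarity between proposal and query (an assumption).
        logits.append(sum(p * q for p, q in zip(pooled, query_vec)))
    return proposals, softmax(logits)


def ground_query(clip_feats, query_vec):
    """Pick the proposal with the highest attention weight."""
    proposals, attn = score_proposals(clip_feats, query_vec)
    best = max(range(len(proposals)), key=lambda i: attn[i])
    return proposals[best], attn[best]


if __name__ == "__main__":
    # Toy input: 4 clips with 2-dimensional features, and one query vector.
    clips = [[0.0, 0.0], [0.5, 0.0], [2.0, 0.0], [0.0, 1.0]]
    query = [1.0, 0.0]
    segment, weight = ground_query(clips, query)
    print(segment, weight)
```

In a weakly-supervised setting the attention weights would be trained indirectly, for example through a reconstruction objective on the query as the abstract describes, rather than read off a fixed similarity as in this toy version. Mean pooling also favours short segments, which is one reason a real model would need a richer proposal representation.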
