Title: Tripping through time: Efficient Localization of Activities in Videos

Apr 25, 2019

Meera Hahn, Asim Kadav, James M. Rehg, Hans Peter Graf

Abstract: Localizing moments in untrimmed videos via language queries is a new and interesting task that requires the ability to accurately ground language in video. Previous works have approached this task by processing the entire video, often more than once, to localize relevant activities. In the real-world applications that this task lends itself to, such as surveillance, efficiency is a pivotal trait of a system. In this paper, we present TripNet, an end-to-end system that uses a gated attention architecture to model fine-grained textual and visual representations in order to align text and video content. Furthermore, TripNet uses reinforcement learning to efficiently localize relevant activity clips in long videos by learning how to intelligently skip around the video. In our evaluation on the Charades-STA, ActivityNet Captions, and TACoS datasets, we find that TripNet achieves high accuracy and saves processing time by only looking at 32-41% of the entire video.
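To make the idea of "skipping around the video" concrete, the following is a minimal sketch, not TripNet itself. It assumes a fixed-size candidate window, a hand-written jump rule standing in for the learned reinforcement-learning policy, and a hypothetical `score` function standing in for the text-video alignment model. The names `localize_by_skipping` and `overlap_score` are illustrative and do not come from the paper. The sketch only shows how the fraction of frames actually observed can be tracked.

```python
from typing import Callable, Set, Tuple


def overlap_score(target: Tuple[int, int]) -> Callable[[int, int], float]:
    """Hypothetical scorer: fraction of the window that overlaps a known target.

    A real system would estimate relevance from text and video features;
    using the ground-truth span here is purely for illustration.
    """
    t_start, t_end = target

    def score(start: int, end: int) -> float:
        overlap = max(0, min(end, t_end) - max(start, t_start))
        return overlap / max(1, end - start)

    return score


def localize_by_skipping(
    num_frames: int,
    window: int,
    score: Callable[[int, int], float],
    stop_threshold: float = 0.5,
    max_steps: int = 50,
) -> Tuple[Tuple[int, int], float]:
    """Move a window through the video, observing only the frames inside it.

    Returns the final window and the fraction of all frames that were observed.
    """
    start = 0
    end = min(window, num_frames)
    seen: Set[int] = set()
    for _ in range(max_steps):
        end = min(start + window, num_frames)
        seen.update(range(start, end))  # only these frames are "processed"
        s = score(start, end)
        if s >= stop_threshold:
            break  # confident enough: stop searching
        # Stand-in policy (an assumption, not learned): take a coarse jump
        # when nothing relevant is visible, a fine step when partly aligned.
        stride = 2 * window if s == 0 else max(1, window // 2)
        if start + stride >= num_frames:
            break
        start += stride
    return (start, end), len(seen) / num_frames
```

For a 100-frame video with a 10-frame window and a target at frames 40 to 50, this toy search visits the windows starting at frames 0, 20, and 40, so it observes 30% of the frames before stopping. That figure is a property of this toy setup only and is unrelated to the 32-41% reported in the abstract.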
