Video grounding aims to locate the timestamps within an untrimmed video that best match a query description. Prevalent methods can be divided into moment-level and clip-level frameworks. Moment-level approaches directly predict the probability of each individual moment being a boundary from a global perspective, and they usually perform better at coarse grounding. Clip-level approaches, on the other hand, aggregate the moments within different time windows into proposals and then select the one most similar to the query, which gives them an advantage in fine-grained grounding. In this paper, we propose a multi-level unified framework that enhances performance by leveraging the merits of both moment-level and clip-level methods. Moreover, a novel generation-guided paradigm is adopted at both levels: a multi-modal generator produces implicit boundary features and clip features, which are then treated as queries by a discriminator to compute boundary scores. This generation-guided solution reformulates video grounding from a matching task between two distinct modalities into a cross-modal attention task, departing from previous frameworks and obtaining notable gains. The proposed Generation-guided Multi-level Unified network (GMU) surpasses previous methods and achieves state-of-the-art results on benchmarks with disparate features, e.g., Charades-STA and ActivityNet Captions.
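To make the generation-guided idea concrete, the following is a minimal PyTorch-style sketch (not the authors' implementation; all module names, shapes, and the use of learned start/end queries are assumptions). A generator fuses the video and text features into implicit boundary features, and a discriminator attends over the moment-level video features with those generated features as queries, so that the cross-modal attention weights act as boundary scores.

```python
# Hypothetical sketch of generation-guided grounding; names and sizes are assumed.
import torch
import torch.nn as nn


class GenerationGuidedGrounding(nn.Module):
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        # Generator: learned start/end queries attend over the fused multi-modal
        # memory to produce implicit boundary features.
        self.boundary_queries = nn.Parameter(torch.randn(2, dim))  # start / end
        self.generator = nn.TransformerDecoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        # Discriminator: cross-modal attention that scores every moment
        # against each generated boundary feature.
        self.discriminator = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, video_feats, text_feats):
        # video_feats: (B, T, dim) moment-level features; text_feats: (B, L, dim).
        b = video_feats.size(0)
        queries = self.boundary_queries.unsqueeze(0).expand(b, -1, -1)   # (B, 2, dim)
        memory = torch.cat([video_feats, text_feats], dim=1)             # multi-modal
        boundary_feats = self.generator(queries, memory)                 # (B, 2, dim)
        # Attention weights over moments serve as boundary scores here.
        _, attn = self.discriminator(boundary_feats, video_feats, video_feats)
        start_scores, end_scores = attn[:, 0], attn[:, 1]                # (B, T) each
        return start_scores, end_scores


# Usage with random tensors standing in for pre-extracted features.
model = GenerationGuidedGrounding()
video = torch.randn(4, 128, 256)   # 4 videos, 128 moments each
text = torch.randn(4, 20, 256)     # 4 sentence queries, 20 tokens each
start, end = model(video, text)
print(start.shape, end.shape)      # torch.Size([4, 128]) torch.Size([4, 128])
```

In this sketch the generated boundary features, rather than the raw text, drive the attention over video moments, which is the sense in which the task becomes a cross-modal attention problem instead of a direct two-modality matching problem.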