Title: Decoupled Spatial Temporal Graphs for Generic Visual Grounding

Mar 18, 2021

Qianyu Feng, Yunchao Wei, Mingming Cheng, Yi Yang

Abstract: Visual grounding is a long-standing problem in vision-language understanding due to its diversity and complexity. Current practices concentrate mostly on performing visual grounding in still images or well-trimmed video clips. This work, in contrast, investigates a more general setting, generic visual grounding, which aims to mine all the objects satisfying the given expression and is more challenging yet more practical in real-world scenarios. Importantly, grounding results are expected to accurately localize targets in both space and time. However, it is tricky to make trade-offs between appearance and motion features, and in real scenarios models tend to fail to distinguish distractors with similar attributes. Motivated by these considerations, we propose a simple yet effective approach, named DSTG, which commits to 1) decomposing the spatial and temporal representations to collect all-sided cues for precise grounding; 2) enhancing discriminativeness against distractors and temporal consistency with a contrastive learning routing strategy. We further construct a new video dataset, GVG, that consists of challenging referring cases with far-ranging videos. Experiments demonstrate the superiority of DSTG over state-of-the-art methods on the Charades-STA, ActivityNet-Caption and GVG datasets. Code and dataset will be made available.
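
The abstract mentions a contrastive learning strategy for separating the target from distractors with similar attributes, but it does not give the loss. As a rough illustration only, the sketch below shows a generic InfoNCE-style contrastive loss in plain Python. It is not the authors' DSTG implementation: the function name `info_nce_loss`, the `temperature` parameter, and the toy feature vectors are all assumptions made for this example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(query, positive, distractors, temperature=0.1):
    """Generic InfoNCE-style loss (hypothetical helper, not from the paper).

    The loss is small when `query` (e.g. an expression embedding) is more
    similar to `positive` (the referred object) than to every distractor.
    """
    q = normalize(query)
    candidates = [positive] + list(distractors)
    logits = [dot(q, normalize(c)) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-softmax probability of the positive (index 0).
    return log_sum - logits[0]


# Toy 2-D features: distractors that resemble the target raise the loss.
query = [1.0, 0.0]
target = [0.9, 0.1]
easy_loss = info_nce_loss(query, target, [[0.0, 1.0], [-1.0, 0.0]])
hard_loss = info_nce_loss(query, target, [[0.9, 0.15], [1.0, 0.05]])
```

In this toy setup, `hard_loss` exceeds `easy_loss` because the similar-looking distractors compete with the target for probability mass, which is the kind of confusion a contrastive objective is meant to penalize.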
