Title: Target Adaptive Context Aggregation for Video Scene Graph Generation

Aug 18, 2021

Yao Teng, Limin Wang, Zhifeng Li, Gangshan Wu

Abstract: This paper addresses the challenging task of video scene graph generation (VidSGG), which can serve as a structured video representation for high-level understanding tasks. We present a new detect-to-track paradigm for this task by decoupling the context modeling for relation prediction from the complicated low-level entity tracking. Specifically, we design an efficient method for frame-level VidSGG, termed Target Adaptive Context Aggregation Network (TRACE), with a focus on capturing spatio-temporal context information for relation recognition. Our TRACE framework streamlines the VidSGG pipeline with a modular design and presents two unique blocks: Hierarchical Relation Tree (HRTree) construction and Target-adaptive Context Aggregation. More specifically, our HRTree first provides an adaptive structure for organizing possible relation candidates efficiently and guides the context aggregation module to effectively capture spatio-temporal structure information. Then, we obtain a contextualized feature representation for each relation candidate and build a classification head to recognize its relation category. Finally, we provide a simple temporal association strategy to track the results detected by TRACE and yield the video-level VidSGG. We perform experiments on two VidSGG benchmarks, ImageNet-VidVRD and Action Genome, and the results demonstrate that TRACE achieves state-of-the-art performance. The code and models are available at https://github.com/MCG-NJU/TRACE.
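
The abstract does not spell out how the temporal association step works. As a rough illustration only, the sketch below assumes one plausible approach: greedily linking frame-level relation detections across consecutive frames when their triplet labels match and both the subject and object boxes overlap enough (IoU). The names `FrameRelation`, `RelationTrack`, `associate`, and the `iou_thresh` parameter are hypothetical and are not taken from the paper or the TRACE repository, which should be consulted for the actual strategy.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class FrameRelation:
    """Hypothetical frame-level output: one (subject, predicate, object) detection."""
    frame: int
    triplet: Tuple[str, str, str]
    subj_box: Box
    obj_box: Box
    score: float


@dataclass
class RelationTrack:
    """Hypothetical video-level relation instance: linked frame-level detections."""
    triplet: Tuple[str, str, str]
    items: List[FrameRelation] = field(default_factory=list)


def associate(detections: List[FrameRelation], iou_thresh: float = 0.5) -> List[RelationTrack]:
    """Greedy linking (an assumed strategy, not the authors' method).

    A detection extends a track only if the track ended on the previous frame,
    the triplet labels match, and both boxes overlap by at least iou_thresh.
    """
    tracks: List[RelationTrack] = []
    # Process frames in order; within a frame, higher-scoring detections pick first.
    for det in sorted(detections, key=lambda d: (d.frame, -d.score)):
        best, best_iou = None, iou_thresh
        for track in tracks:
            last = track.items[-1]
            if track.triplet != det.triplet or last.frame != det.frame - 1:
                continue
            # Require both the subject and the object box to stay consistent.
            pair_iou = min(iou(last.subj_box, det.subj_box), iou(last.obj_box, det.obj_box))
            if pair_iou >= best_iou:
                best, best_iou = track, pair_iou
        if best is None:
            tracks.append(RelationTrack(det.triplet, [det]))
        else:
            best.items.append(det)
    return tracks
```

Because a track must have ended on the immediately preceding frame, this sketch splits a relation whenever a frame is missed; a real system would likely tolerate short gaps and merge overlapping tracks.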

* ICCV 2021 camera-ready version
