Raphael Ruschel

Temporally Consistent Dynamic Scene Graphs: An End-to-End Approach for Action Tracklet Generation

Dec 03, 2024

Raphael Ruschel, Md Awsafur Rahman, Hardik Prajapati, Suya You, B. S. Manjunath

Abstract: Understanding video content is pivotal for advancing real-world applications like activity recognition, autonomous systems, and human-computer interaction. While scene graphs are adept at capturing spatial relationships between objects in individual frames, extending these representations to capture dynamic interactions across video sequences remains a significant challenge. To address this, we present TCDSG, Temporally Consistent Dynamic Scene Graphs, an innovative end-to-end framework that detects, tracks, and links subject-object relationships across time, generating action tracklets: temporally consistent sequences of entities and their interactions. Our approach leverages a novel bipartite matching mechanism, enhanced by adaptive decoder queries and feedback loops, ensuring temporal coherence and robust tracking over extended sequences. This method not only establishes a new benchmark by achieving over 60% improvement in temporal recall@k on the Action Genome, OpenPVSG, and MEVA datasets but also pioneers the augmentation of MEVA with persistent object ID annotations for comprehensive tracklet generation. By seamlessly integrating spatial and temporal dynamics, our work sets a new standard in multi-frame video analysis, opening new avenues for high-impact applications in surveillance, autonomous navigation, and beyond.
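
As a rough illustration of what bipartite matching means when linking entities across frames, the sketch below assigns each box from a previous frame to exactly one box in the current frame so that the total overlap is maximal. It is not the TCDSG mechanism: the abstract describes matching enhanced by adaptive decoder queries and feedback loops, whereas this example assumes plain bounding boxes and uses IoU as a stand-in score. The function names `iou` and `match_frames` and the sample boxes are hypothetical, and the exhaustive search is only practical for a handful of boxes.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_frames(prev_boxes, curr_boxes):
    """Return, for each previous box, the index of its matched current box.

    The assignment is one-to-one and maximizes the summed IoU. It is found
    by trying every possible assignment, so it only suits tiny inputs.
    Assumes there are at least as many current boxes as previous boxes.
    """
    if len(prev_boxes) > len(curr_boxes):
        raise ValueError("expected len(prev_boxes) <= len(curr_boxes)")
    best_score, best = -1.0, ()
    for perm in permutations(range(len(curr_boxes)), len(prev_boxes)):
        score = sum(iou(prev_boxes[i], curr_boxes[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best = score, perm
    return list(best)


# Made-up boxes: two tracked entities, three detections in the next frame.
prev_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
curr_boxes = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
matches = match_frames(prev_boxes, curr_boxes)
print(matches)
```

Here the first tracked box is linked to the second detection and the second tracked box to the first, while the third detection stays unmatched and could start a new tracklet. For realistic numbers of boxes, a polynomial-time assignment solver such as the Hungarian algorithm would replace the exhaustive loop.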

BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling

Oct 16, 2023

Raphael Ruschel, A. S. M. Iftekhar, B. S. Manjunath, Suya You

Abstract: The increasing complexity of modern deep neural network models and the expanding sizes of datasets necessitate the development of optimized and scalable training methods. In this white paper, we address the challenge of efficiently training neural network models using sequences of varying sizes. We propose a novel training scheme that enables efficient distributed data-parallel training on sequences of different sizes with minimal overhead. Using this scheme, we reduced the amount of padding by more than 100x without deleting a single frame, which improved both training time and Recall in our experiments.
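
To make the padding problem concrete, the sketch below compares two ways of batching sequences of different lengths: padding every sequence to the longest one, and packing whole sequences into fixed-size blocks so that no frame is dropped. The abstract does not describe how BLoad works internally, so this is not the authors' scheme; it assumes a generic greedy first-fit-decreasing packing, and the function names, frame counts, and block size are all hypothetical.

```python
def pad_to_max(lengths):
    """Padding frames needed if every sequence is padded to the longest one."""
    longest = max(lengths)
    return sum(longest - n for n in lengths)


def pack_into_blocks(lengths, block_size):
    """Greedy first-fit-decreasing packing of whole sequences into blocks.

    Returns a list of blocks, each a list of sequence indices. Sequences are
    never split or truncated, so every frame is kept.
    """
    if max(lengths) > block_size:
        raise ValueError("block_size must be at least the longest sequence")
    blocks = []  # sequence indices held by each block
    free = []    # remaining capacity of each block
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        for b, room in enumerate(free):
            if lengths[i] <= room:
                blocks[b].append(i)
                free[b] -= lengths[i]
                break
        else:
            # No existing block has room, so open a new one.
            blocks.append([i])
            free.append(block_size - lengths[i])
    return blocks


def packed_padding(lengths, blocks, block_size):
    """Padding frames needed to fill every block up to block_size."""
    return sum(block_size - sum(lengths[i] for i in block) for block in blocks)


# Made-up frame counts for seven video clips.
lengths = [3, 10, 4, 6, 7, 2, 8]
block_size = max(lengths)
blocks = pack_into_blocks(lengths, block_size)
naive = pad_to_max(lengths)
packed = packed_padding(lengths, blocks, block_size)
print(naive, packed)
```

With these toy lengths, padding to the longest clip wastes 30 frames, while the packed blocks happen to fill completely. A model trained on packed blocks would also need to know where one sequence ends and the next begins, which this sketch leaves out.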

DDS: Decoupled Dynamic Scene-Graph Generation Network

Jan 18, 2023

A S M Iftekhar, Raphael Ruschel, Satish Kumar, Suya You, B. S. Manjunath

Abstract: Scene-graph generation involves creating a structural representation of the relationships between objects in a scene by predicting subject-object-relation triplets from input data. However, existing methods show poor performance in detecting triplets outside of a predefined set, primarily due to their reliance on dependent feature learning. To address this issue, we propose DDS -- a decoupled dynamic scene-graph generation network -- that consists of two independent branches that can disentangle extracted features. The key innovation of the current paper is the decoupling of the features representing the relationships from those of the objects, which enables the detection of novel object-relationship combinations. The DDS model is evaluated on three datasets and outperforms previous methods by a significant margin, especially in detecting previously unseen triplets.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
