Title: LiGAR: LiDAR-Guided Hierarchical Transformer for Multi-Modal Group Activity Recognition

Oct 28, 2024

Naga Venkata Sai Raviteja Chappa, Khoa Luu

Abstract: Group Activity Recognition (GAR) remains challenging in computer vision due to the complex nature of multi-agent interactions. This paper introduces LiGAR, a LiDAR-Guided Hierarchical Transformer for Multi-Modal Group Activity Recognition. LiGAR uses LiDAR data as a structural backbone to guide the processing of visual and textual information, enabling robust handling of occlusions and complex spatial arrangements. Our framework incorporates a Multi-Scale LiDAR Transformer, Cross-Modal Guided Attention, and an Adaptive Fusion Module to integrate multi-modal data effectively at different semantic levels. LiGAR's hierarchical architecture captures group activities at various granularities, from individual actions to scene-level dynamics. Extensive experiments on the JRDB-PAR, Volleyball, and NBA datasets demonstrate LiGAR's superior performance, achieving state-of-the-art results with improvements of up to 10.6% in F1-score on JRDB-PAR and 5.9% in Mean Per Class Accuracy on the NBA dataset. Notably, LiGAR maintains high performance even when LiDAR data is unavailable during inference, showcasing its adaptability. Our ablation studies highlight the significant contribution of each component and the effectiveness of our multi-modal, multi-scale approach in advancing group activity recognition.
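For readers unfamiliar with the Mean Per Class Accuracy metric cited above, the following is a minimal sketch of how it is commonly computed: the average of per-class recall, so that rare activity classes count as much as frequent ones. The abstract does not spell out the authors' evaluation protocol, so this definition is an assumption, and the function names and toy labels below are hypothetical illustrations, not data from the paper.

```python
from collections import defaultdict


def mean_per_class_accuracy(y_true, y_pred):
    """Average of per-class recall over the classes present in y_true.

    Assumed (common) definition; not taken from the paper itself.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # samples per ground-truth class
    correct = defaultdict(int)  # correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def overall_accuracy(y_true, y_pred):
    """Plain fraction of correct predictions, for comparison."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced toy labels (not from any dataset in the paper).
y_true = ["pass"] * 8 + ["shot"] * 2
y_pred = ["pass"] * 8 + ["pass", "shot"]

print(overall_accuracy(y_true, y_pred))         # 9 of 10 correct
print(mean_per_class_accuracy(y_true, y_pred))  # mean of 8/8 and 1/2
```

On this toy example, overall accuracy is 0.9 while mean per-class accuracy is 0.75, which illustrates why the class-averaged metric is often preferred for imbalanced activity labels.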

* 14 pages, 4 figures, 10 tables
