Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xianglong Zeng

AV-DTEC: Self-Supervised Audio-Visual Fusion for Drone Trajectory Estimation and Classification

Dec 22, 2024

Zhenyuan Xiao, Yizhuo Yang, Guili Xu, Xianglong Zeng, Shenghai Yuan

Figure 1 for AV-DTEC: Self-Supervised Audio-Visual Fusion for Drone Trajectory Estimation and Classification

Figure 2 for AV-DTEC: Self-Supervised Audio-Visual Fusion for Drone Trajectory Estimation and Classification

Figure 3 for AV-DTEC: Self-Supervised Audio-Visual Fusion for Drone Trajectory Estimation and Classification

Figure 4 for AV-DTEC: Self-Supervised Audio-Visual Fusion for Drone Trajectory Estimation and Classification

Abstract:The increasing use of compact UAVs has created significant threats to public safety, while traditional drone detection systems are often bulky and costly. To address these challenges, we propose AV-DTEC, a lightweight self-supervised audio-visual fusion-based anti-UAV system. AV-DTEC is trained using self-supervised learning with labels generated by LiDAR, and it simultaneously learns audio and visual features through a parallel selective state-space model. With the learned features, a specially designed plug-and-play primary-auxiliary feature enhancement module integrates visual features into audio features for better robustness in cross-lighting conditions. To reduce reliance on auxiliary features and align modalities, we propose a teacher-student model that adaptively adjusts the weighting of visual features. AV-DTEC demonstrates exceptional accuracy and effectiveness in real-world multi-modality data. The code and trained models are publicly accessible on GitHub \url{https://github.com/AmazingDay1/AV-DETC}.

* Submitted to ICRA 2025

Via

Access Paper or Ask Questions