Huanran Hu

TAME: Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification

Dec 17, 2024

Zhenyuan Xiao, Huanran Hu, Guili Xu, Junwei He

Abstract: The increasing prevalence of compact UAVs has introduced significant risks to public safety, while traditional drone detection systems are often bulky and costly. To address these challenges, we present TAME, the Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification. This innovative anti-UAV detection model leverages a parallel selective state-space model to simultaneously capture and learn both the temporal and spectral features of audio, effectively analyzing propagation of sound. To further enhance temporal features, we introduce a Temporal Feature Enhancement Module, which integrates spectral features into temporal data using residual cross-attention. This enhanced temporal information is then employed for precise 3D trajectory estimation and classification. Our model sets a new standard of performance on the MMUAD benchmarks, demonstrating superior accuracy and effectiveness. The code and trained models are publicly available on GitHub at https://github.com/AmazingDay1/TAME.

MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

Aug 08, 2024

Yanqi Dai, Huanran Hu, Lei Wang, Shengjie Jin, Xu Chen, Zhiwu Lu

Abstract: Recently, Role-Playing Agents (RPAs) have garnered increasing attention for their potential to deliver emotional value and facilitate sociological research. However, existing studies are primarily confined to the textual modality, unable to simulate humans' multimodal perceptual capabilities. To bridge this gap, we introduce the concept of Multimodal Role-Playing Agents (MRPAs), and propose a comprehensive framework, MMRole, for their development and evaluation, which comprises a personalized multimodal dataset and a robust evaluation method. Specifically, we construct a large-scale, high-quality dataset, MMRole-Data, consisting of 85 characters, 11K images, and 14K single or multi-turn dialogues. Additionally, we present a robust evaluation method, MMRole-Eval, encompassing eight metrics across three dimensions, where a reward model is trained to score MRPAs with the constructed ground-truth data for comparison. Moreover, we develop the first specialized MRPA, MMRole-Agent. Extensive evaluation results demonstrate the improved performance of MMRole-Agent and highlight the primary challenges in developing MRPAs, emphasizing the need for enhanced multimodal understanding and role-playing consistency. The data, code, and models will be available at https://github.com/YanqiDai/MMRole.
