Luca Jiang-Tao Yu

Unfolding Target Detection with State Space Model

Oct 30, 2024

Luca Jiang-Tao Yu, Chenshu Wu

Abstract: Target detection is a fundamental task in radar sensing, serving as the precursor to any further processing for various applications. Numerous detection algorithms have been proposed. Classical methods based on signal processing, e.g., the most widely used CFAR, are challenging to tune and sensitive to environmental conditions. Deep learning-based methods can be more accurate and robust, yet usually lack interpretability and physical relevance. In this paper, we introduce a novel method that combines signal processing and deep learning by unfolding the CFAR detector with a state space model architecture. By retaining the CFAR pipeline yet turning its sophisticated configurations into trainable parameters, our method achieves high detection performance without manual parameter tuning, while preserving model interpretability. We implement a lightweight model of only 260K parameters and conduct real-world experiments for human target detection using FMCW radars. The results highlight the remarkable performance of the proposed method, outperforming CFAR and its variants by 10X in detection rate and false alarm rate. Our code is open-sourced here: https://github.com/aiot-lab/NeuroDet.
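
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of classical cell-averaging CFAR on a 1-D power profile. It is not the authors' unfolded model; the function name `ca_cfar` and the parameters `num_train`, `num_guard`, and `pfa` are illustrative assumptions, and the threshold factor assumes exponentially distributed (square-law detected) noise. These hand-set values are the kind of configuration the abstract describes turning into trainable parameters.

```python
def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    """Cell-averaging CFAR over a 1-D sequence of power values.

    num_train: training cells on EACH side of the cell under test (assumed name)
    num_guard: guard cells on EACH side of the cell under test (assumed name)
    pfa:       desired probability of false alarm (assumed name)
    Returns the indices of cells declared as detections.
    """
    n_train_total = 2 * num_train
    # Threshold factor for CA-CFAR under exponential noise statistics.
    alpha = n_train_total * (pfa ** (-1.0 / n_train_total) - 1.0)
    half = num_train + num_guard
    detections = []
    # Skip the edges, where a full training window is not available.
    for i in range(half, len(power) - half):
        lead = power[i - half : i - num_guard]            # cells before the guard band
        lag = power[i + num_guard + 1 : i + half + 1]     # cells after the guard band
        noise = (sum(lead) + sum(lag)) / n_train_total    # local noise estimate
        if power[i] > alpha * noise:
            detections.append(i)
    return detections


# Flat noise floor with one strong cell at index 30.
profile = [1.0] * 64
profile[30] = 50.0
hits = ca_cfar(profile)
```

The window sizes and false alarm rate must be chosen by hand here, which illustrates why the abstract calls classical CFAR challenging to tune.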

USpeech: Ultrasound-Enhanced Speech with Minimal Human Effort via Cross-Modal Synthesis

Oct 29, 2024

Luca Jiang-Tao Yu, Running Zhao, Sijie Ji, Edith C. H. Ngai, Chenshu Wu

Abstract: Speech enhancement is crucial in human-computer interaction, especially for ubiquitous devices. Ultrasound-based speech enhancement has emerged as an attractive choice because of its superior ubiquity and performance. However, inevitable interference from unexpected and unintended sources during audio-ultrasound data acquisition makes existing solutions rely heavily on human effort for data collection and processing. This leads to significant data scarcity that limits the full potential of ultrasound-based speech enhancement. To address this, we propose USpeech, a cross-modal ultrasound synthesis framework for speech enhancement with minimal human effort. At its core is a two-stage framework that establishes correspondence between visual and ultrasonic modalities by leveraging audible audio as a bridge. This approach overcomes challenges from the lack of paired video-ultrasound datasets and the inherent heterogeneity between video and ultrasound data. Our framework incorporates contrastive video-audio pre-training to project modalities into a shared semantic space and employs an audio-ultrasound encoder-decoder for ultrasound synthesis. We then present a speech enhancement network that enhances speech in the time-frequency domain and recovers the clean speech waveform via a neural vocoder. Comprehensive experiments show USpeech achieves remarkable performance using synthetic ultrasound data comparable to physical data, significantly outperforming state-of-the-art ultrasound-based speech enhancement baselines. USpeech is open-sourced at https://github.com/aiot-lab/USpeech/.
