Sanha Lee

A Novel Deep Learning Framework for Efficient Multichannel Acoustic Feedback Control

May 21, 2025

Yuan-Kuei Wu, Juan Azcarreta, Kashyap Patel, Buye Xu, Jung-Suk Lee, Sanha Lee, Ashutosh Pandey

Abstract: This study presents a deep-learning framework for controlling multichannel acoustic feedback in audio devices. Traditional digital signal processing methods struggle to converge when dealing with highly correlated noise such as feedback. We introduce a Convolutional Recurrent Network that efficiently combines spatial and temporal processing, significantly improving speech enhancement performance with lower computational demands. Our approach uses three training methods, In-a-Loop Training, Teacher Forcing, and a Hybrid strategy with a Multichannel Wiener Filter, to optimize performance in complex acoustic environments. This scalable framework offers a robust solution for real-world applications and marks a significant advance in Acoustic Feedback Control technology.

* Accepted by Interspeech 2025
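The difference between In-a-Loop Training and Teacher Forcing comes down to what the loudspeaker replays during training. The toy loop below is a minimal sketch under stated assumptions, not the authors' method: it uses one microphone and one loudspeaker (the paper's setting is multichannel), an FIR feedback path with at least one sample of delay, a fixed loudspeaker `gain`, and a hypothetical pass-through `no_control` in place of a trained network. In the in-a-loop mode the loudspeaker replays the model's own output, so errors recirculate; with teacher forcing it replays the clean target, so the microphone signal no longer depends on the model.

```python
from typing import Callable, List


def simulate_feedback_loop(
    speech: List[float],
    feedback_path: List[float],
    suppress: Callable[[float], float],
    gain: float = 1.0,
    teacher_forcing: bool = False,
) -> List[float]:
    """Toy single-channel acoustic feedback loop (illustrative only).

    mic[n]         = speech[n] + sum_k feedback_path[k] * loudspeaker[n - 1 - k]
    out[n]         = suppress(mic[n])
    loudspeaker[n] = gain * (speech[n] if teacher_forcing else out[n])
    """
    loudspeaker: List[float] = []
    outputs: List[float] = []
    for n, s in enumerate(speech):
        # Past loudspeaker samples leak back into the microphone
        # through the FIR feedback path, with at least one sample of delay.
        fb = sum(
            h * loudspeaker[n - 1 - k]
            for k, h in enumerate(feedback_path)
            if n - 1 - k >= 0
        )
        mic = s + fb
        out = suppress(mic)
        outputs.append(out)
        # Teacher forcing replays the clean target; in-a-loop replays the model output.
        loudspeaker.append(gain * (s if teacher_forcing else out))
    return outputs


def no_control(x: float) -> float:
    """Hypothetical stand-in for a feedback-control model: passes the mic through."""
    return x


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
in_loop = simulate_feedback_loop(impulse, [0.9], no_control)
forced = simulate_feedback_loop(impulse, [0.9], no_control, teacher_forcing=True)
print([round(v, 4) for v in in_loop])  # [1.0, 0.9, 0.81, 0.729, 0.6561]
print([round(v, 4) for v in forced])   # [1.0, 0.9, 0.0, 0.0, 0.0]
```

With no suppression and a loop gain of 0.9, the in-a-loop output keeps ringing while the teacher-forced output stops after the first echo. Passing `gain=1.2` (loop gain 1.08) makes the in-a-loop output grow from sample to sample, which is the howling that feedback control has to prevent. In this toy, a model trained only with teacher forcing never sees microphone signals shaped by its own outputs.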

All Neural Low-latency Directional Speech Extraction

Jul 05, 2024

Ashutosh Pandey, Sanha Lee, Juan Azcarreta, Daniel Wong, Buye Xu

Abstract: We introduce a novel all-neural model for low-latency directional speech extraction. The model uses direction-of-arrival (DOA) embeddings from a predefined spatial grid, which are transformed and fused into a recurrent-neural-network-based speech extraction model. This enables the model to effectively extract speech arriving from a specified DOA. Unlike previous methods that relied on hand-crafted directional features, the proposed model learns DOA embeddings from scratch using a speech enhancement loss, making it suitable for low-latency scenarios. Additionally, it operates at a high frame rate and takes a DOA with each input frame, which allows it to adapt quickly to changing scenes in highly dynamic real-world scenarios. We provide extensive evaluations demonstrating the model's efficacy in directional speech extraction, its robustness to DOA mismatch, and its capability to adapt quickly to abrupt changes in DOA.

* Accepted for publication at INTERSPEECH 2024
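To make the per-frame DOA conditioning concrete, here is a minimal sketch of a lookup into a table holding one embedding per direction of a predefined azimuth grid. It is not the authors' architecture: the uniform grid spacing `step_deg`, nearest-neighbour quantization with wrap-around at 360 degrees, and fusion by element-wise multiplication are all assumptions (the abstract says only that the embeddings are "transformed and fused" into the recurrent model), and the names `DoaEmbeddingTable`, `fuse`, and `condition_frames` are hypothetical. The embedding values are random placeholders standing in for vectors that, per the abstract, are learned from scratch with a speech enhancement loss.

```python
import random
from typing import List, Sequence


class DoaEmbeddingTable:
    """Hypothetical per-direction embedding table over a uniform azimuth grid."""

    def __init__(self, step_deg: float, dim: int, seed: int = 0) -> None:
        # Assumes step_deg divides 360 so the grid is uniform around the circle.
        self.step_deg = step_deg
        self.num_directions = round(360 / step_deg)
        rng = random.Random(seed)
        # Placeholder values; a trained model would learn these vectors.
        self.table = [
            [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(self.num_directions)
        ]

    def index(self, azimuth_deg: float) -> int:
        """Nearest grid index, wrapping at 360 degrees (exact ties round to even)."""
        return round((azimuth_deg % 360) / self.step_deg) % self.num_directions

    def lookup(self, azimuth_deg: float) -> List[float]:
        return self.table[self.index(azimuth_deg)]


def fuse(frame_features: Sequence[float], doa_embedding: Sequence[float]) -> List[float]:
    """Element-wise multiplicative fusion (an assumed choice) of features and embedding."""
    if len(frame_features) != len(doa_embedding):
        raise ValueError("feature and embedding sizes must match")
    return [f * e for f, e in zip(frame_features, doa_embedding)]


def condition_frames(
    frames: Sequence[Sequence[float]],
    doas_deg: Sequence[float],
    table: DoaEmbeddingTable,
) -> List[List[float]]:
    """Fuse each frame with the embedding of its own DOA, so the target can change per frame."""
    return [fuse(frame, table.lookup(doa)) for frame, doa in zip(frames, doas_deg)]


table = DoaEmbeddingTable(step_deg=10.0, dim=4)
print(table.num_directions)  # 36
print(table.index(354.0), table.index(-3.0), table.index(91.0))  # 35 0 9

# Two frames whose target direction jumps from 0 to 180 degrees between frames.
frames = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
conditioned = condition_frames(frames, [0.0, 180.0], table)
```

Because a DOA is supplied with every frame, switching the target direction in this sketch is simply a different table lookup on the next frame, which mirrors the abstract's point about adapting quickly to abrupt DOA changes.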