Title: Spectral oversubtraction? An approach for speech enhancement after robot ego speech filtering in semi-real-time

Sep 10, 2024

Yue Li, Koen V. Hindriks, Florian A. Kunneman


Abstract: Spectral subtraction, widely used for its simplicity, has been employed to address the Robot Ego Speech Filtering (RESF) problem: detecting the content of human interruptions in a robot's single-channel microphone recordings while the robot itself is speaking. However, this approach suffers from oversubtraction in the fundamental frequency range (FFR), which degrades speech content recognition. To address this, we propose a Two-Mask Conformer-based Metric Generative Adversarial Network (CMGAN) that enhances the detected speech and improves recognition results. Our model compensates for oversubtracted FFR values using high-frequency information and long-term features, and then denoises the new spectrogram. In addition, we introduce an incremental processing method that enables semi-real-time processing of streaming audio input with a network trained on long, fixed-length input. Evaluations on two datasets, including one with unseen noise, demonstrate significant improvements in recognition accuracy and confirm the effectiveness of the proposed two-mask approach and incremental processing, enhancing the robustness of the proposed RESF pipeline in real-world human-robot interaction (HRI) scenarios.
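To make the oversubtraction problem concrete, here is a toy illustration of generic power spectral subtraction with an over-subtraction factor and a spectral floor, applied to a single frame. It is a minimal sketch, not the authors' RESF pipeline or their CMGAN model. It assumes that a magnitude-spectrum estimate of the robot's own speech is available as the reference to subtract; the helper name spectral_subtract, the parameters alpha and beta, and the 4-bin toy spectra are all hypothetical.

```python
import math
from typing import List, Sequence


def spectral_subtract(
    mixture_mag: Sequence[float],
    reference_mag: Sequence[float],
    alpha: float = 2.0,
    beta: float = 0.01,
) -> List[float]:
    """Power spectral subtraction on one STFT frame (hypothetical helper).

    mixture_mag:   magnitude spectrum of the microphone frame (robot + human).
    reference_mag: assumed estimate of the robot's own speech magnitude spectrum.
    alpha:         over-subtraction factor; alpha > 1 removes more than the estimate.
    beta:          spectral floor as a fraction of the reference power.
    """
    if len(mixture_mag) != len(reference_mag):
        raise ValueError("spectra must have the same number of frequency bins")
    enhanced = []
    for y, n in zip(mixture_mag, reference_mag):
        residual = y * y - alpha * n * n  # subtract the scaled reference power
        floor = beta * n * n              # result is never allowed below this floor
        enhanced.append(math.sqrt(max(residual, floor)))
    return enhanced


# Toy 4-bin spectra (hypothetical); bin 0 stands in for the FFR.
human = [0.8, 0.0, 0.5, 0.3]
robot = [1.0, 0.0, 0.2, 0.0]
# Assume uncorrelated sources, so their powers add in the mixture.
mixture = [math.hypot(h, r) for h, r in zip(human, robot)]
enhanced = spectral_subtract(mixture, robot)
print([round(v, 3) for v in enhanced])
```

In this toy run, the human energy in bin 0, where the robot's voice is strongest, drops from 0.8 to about 0.1, while bin 3, where only the human is active, passes through unchanged. This shows one simple way oversubtraction can arise when the subtracted reference overlaps the target; the paper's two-mask CMGAN is aimed at compensating for this kind of loss in the FFR, but its actual mechanism is not reproduced here.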

* 6 pages, 2 figures, submitted to 2025 IEEE ICASSP
