Xiaofeng Ge

Dynamic Acoustic Compensation and Adaptive Focal Training for Personalized Speech Enhancement

Nov 22, 2022

Xiaofeng Ge, Jiangyu Han, Haixin Guan, Yanhua Long

Abstract: Recently, more and more personalized speech enhancement (PSE) systems with excellent performance have been proposed. However, two critical issues still limit the performance and generalization ability of these models: 1) acoustic environment mismatch between the noisy test speech and the target speaker's enrollment speech; 2) hard sample mining and learning. In this paper, dynamic acoustic compensation (DAC) is proposed to alleviate the environment mismatch by intercepting noise or environmental acoustic segments from the noisy speech and mixing them with the clean enrollment speech. To better exploit the hard samples in the training data, we propose an adaptive focal training (AFT) strategy that assigns adaptive loss weights to hard and non-hard samples during training. A time-frequency multi-loss training is further introduced to improve and generalize our previous work, sDPCCN, for PSE. The effectiveness of the proposed methods is examined on the DNS4 Challenge dataset. Results show that DAC brings large improvements across multiple evaluation metrics, and that AFT significantly reduces the hard sample rate and produces a clear MOS improvement.
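
As a rough illustration of the two ideas in this abstract, the Python sketch below mixes a noise segment into clean enrollment speech at a chosen SNR and weights per-sample losses by difficulty. It is not the authors' implementation: the function names, the `snr_db` parameter, the tiling of the noise segment, the fixed `hard_threshold`, and the `hard_weight` value are all assumptions made for illustration. The abstract does not say how noise segments are located or how the adaptive weights are computed.

```python
import math


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def compensate_enrollment(enroll, noise_seg, snr_db):
    """Mix a noise segment into clean enrollment speech at snr_db (in dB).

    Assumption: noise_seg is a list of noise-only samples already cut
    from the noisy test speech; how it is found is not shown here.
    """
    # Tile the noise segment so that it covers the whole enrollment utterance.
    reps = -(-len(enroll) // len(noise_seg))  # ceiling division
    noise = (list(noise_seg) * reps)[:len(enroll)]
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        # Silent segment: nothing to mix in.
        return list(enroll)
    # Scale the noise so that 20*log10(rms(enroll) / rms(scaled noise)) == snr_db.
    gain = rms(enroll) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(enroll, noise)]


def focal_weighted_loss(losses, hard_threshold, hard_weight=2.0):
    """Weighted mean of per-sample losses, up-weighting hard samples.

    Assumption: a sample counts as "hard" when its loss exceeds
    hard_threshold; a fixed weight stands in for the adaptive rule.
    """
    weights = [hard_weight if loss > hard_threshold else 1.0 for loss in losses]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)
```

In this sketch the compensated enrollment signal would replace the clean one as input to the speaker encoder, so that enrollment and test speech share similar acoustic conditions.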

PercepNet+: A Phase and SNR Aware PercepNet for Real-Time Speech Enhancement

Mar 04, 2022

Xiaofeng Ge, Jiangyu Han, Yanhua Long, Haixin Guan

Abstract: PercepNet, a recent extension of RNNoise, an efficient, high-quality, real-time full-band speech enhancement technique, has shown promising performance in various public deep noise suppression tasks. This paper proposes a new approach, named PercepNet+, that further extends PercepNet with four significant improvements. First, we introduce a phase-aware structure to bring phase information into PercepNet, by adding complex features and complex subband gains as the deep network's input and output, respectively. Then, a signal-to-noise ratio (SNR) estimator and an SNR-switched post-processing are specially designed to alleviate the over-attenuation (OA) that appears in high-SNR conditions with the original PercepNet. Moreover, the GRU layer is replaced by TF-GRU to model both temporal and frequency dependencies. Finally, we propose to combine losses on the complex subband gain, SNR, and pitch filtering strength with an OA loss in a multi-objective learning manner to further improve speech enhancement performance. Experimental results show that the proposed PercepNet+ significantly outperforms the original PercepNet in terms of both PESQ and STOI, without a large increase in model size.

* This article was submitted to Interspeech 2022
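
The SNR-switched post-processing and the multi-objective loss described in the abstract can be pictured with the minimal Python sketch below. Everything in it is an assumption for illustration: the `post_filter` stand-in (squaring the gains), the `threshold_db` value, the per-frame switching rule, and the weighted-sum form of the loss are hypothetical and are not taken from the paper.

```python
def post_filter(gains):
    """Illustrative stand-in for a post-filter (hypothetical).

    Squaring gains in [0, 1] attenuates low-gain bands further
    and leaves bands with gain 1.0 unchanged.
    """
    return [g * g for g in gains]


def snr_switched_postprocess(frame_gains, frame_snr_db, threshold_db=10.0):
    """Apply the post-filter only to frames whose estimated SNR is low.

    High-SNR frames keep their original gains, which is one simple way
    to avoid attenuating speech that is already fairly clean.
    """
    out = []
    for gains, snr in zip(frame_gains, frame_snr_db):
        if snr >= threshold_db:
            out.append(list(gains))          # skip post-processing
        else:
            out.append(post_filter(gains))   # suppress residual noise
    return out


def multi_objective_loss(terms, weights):
    """Weighted sum of named loss terms (e.g. gain, SNR, pitch, OA)."""
    return sum(weights[name] * value for name, value in terms.items())
```

A call such as `snr_switched_postprocess(gains, snrs)` takes one list of subband gains and one estimated SNR value per frame; in a real system the SNR values would come from the model's SNR estimator.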
