Title: Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks

Dec 02, 2020

Felix Grezes, Zhaoheng Ni, Viet Anh Trinh, Michael Mandel

Abstract: Recent work has shown that deep recurrent neural networks using the LSTM architecture can achieve strong single-channel speech enhancement by estimating time-frequency masks. However, these models do not naturally generalize to multi-channel inputs from varying microphone configurations. In contrast, spatial clustering techniques can achieve such generalization but lack a strong signal model. Our work proposes a combination of the two approaches. By using LSTMs to enhance spatial clustering-based time-frequency masks, we achieve both the signal modeling performance of multiple single-channel LSTM-DNN speech enhancers and the signal separation performance and generality of multi-channel spatial clustering. We compare our proposed system to several baselines on the CHiME-3 dataset. We evaluate the quality of the audio from each system using SDR from the BSS_eval toolkit and PESQ. We evaluate the intelligibility of the output of each system using word error rate from a Kaldi automatic speech recognizer.
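
For readers unfamiliar with time-frequency masking, the following is a minimal sketch of the general idea, not the authors' system. It assumes oracle access to speech and noise magnitudes to build a ratio mask, whereas the paper obtains its masks from spatial clustering and LSTMs. The function names (`ideal_ratio_mask`, `apply_mask`, `simple_sdr_db`) are hypothetical, and the SDR shown is a plain energy ratio rather than the BSS_eval definition.

```python
import math

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Per-bin ratio mask in [0, 1]: speech energy / (speech + noise energy).

    Assumption: oracle speech and noise magnitudes are available.
    """
    return [
        [s * s / (s * s + n * n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_mag, noise_mag)
    ]

def apply_mask(mixture_spec, mask):
    """Scale each complex time-frequency bin of the mixture by its mask value."""
    return [
        [m * x for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(mixture_spec, mask)
    ]

def simple_sdr_db(reference, estimate, eps=1e-12):
    """Plain signal-to-distortion energy ratio in dB (not the BSS_eval SDR)."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (error + eps))

# Toy spectrogram: 2 frames x 2 frequency bins, magnitudes only.
speech = [[1.0, 0.0], [2.0, 1.0]]
noise = [[0.0, 1.0], [2.0, 1.0]]
# Toy mixture: speech and noise assumed in phase, so magnitudes add.
mixture = [
    [complex(s + n, 0.0) for s, n in zip(s_row, n_row)]
    for s_row, n_row in zip(speech, noise)
]

mask = ideal_ratio_mask(speech, noise)
enhanced = apply_mask(mixture, mask)
```

In this toy case, bins dominated by speech get a mask value near 1, bins dominated by noise get a value near 0, and bins with equal speech and noise energy get 0.5.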
