Andrew Storus

Handling Background Noise in Neural Speech Generation

Feb 23, 2021

Tom Denton, Alejandro Luebs, Felicia S. C. Lim, Andrew Storus, Hengchin Yeh, W. Bastiaan Kleijn, Jan Skoglund

Abstract: Recent advances in neural-network-based generative modeling of speech have shown great potential for speech coding. However, the performance of such models drops when the input is not clean speech, e.g., in the presence of background noise, preventing their use in practical applications. In this paper we examine the reason and discuss methods to overcome this issue. Applying a denoising preprocessing stage when extracting features, and targeting clean speech during training, is shown to be the best-performing strategy.

* 5 pages, 3 figures, presented at the Asilomar Conference on Signals, Systems, and Computers 2020

Generative Speech Coding with Predictive Variance Regularization

Feb 18, 2021

W. Bastiaan Kleijn, Andrew Storus, Michael Chinen, Tom Denton, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Hengchin Yeh

Abstract: The recent emergence of machine-learning-based generative models for speech suggests that a significant reduction in bit rate for speech codecs is possible. However, the performance of generative models deteriorates significantly with the distortions present in real-world input signals. We argue that this deterioration is due to the sensitivity of the maximum likelihood criterion to outliers and the ineffectiveness of modeling a sum of independent signals with a single autoregressive model. We introduce predictive-variance regularization to reduce the sensitivity to outliers, resulting in a significant increase in performance. We show that noise reduction to remove unwanted signals can significantly increase performance. We provide extensive subjective performance evaluations that show that our system based on generative modeling provides state-of-the-art coding performance at 3 kb/s for real-world speech signals at reasonable computational complexity.
