Abstract: Recently, neural vocoders have been widely used in speech synthesis tasks, including text-to-speech and voice conversion. However, when there is a data distribution mismatch between training and inference, neural vocoders trained on real data often produce degraded voice quality in unseen scenarios. In this paper, we train three commonly used neural vocoders, namely WaveNet, WaveRNN, and WaveGlow, separately on five different datasets. To study the robustness of neural vocoders, we evaluate the models using acoustic features from seen/unseen speakers, seen/unseen languages, a text-to-speech model, and a voice conversion model. We find that WaveNet is more robust than WaveRNN, especially when the training and testing data are mismatched. Through our experiments, we show that WaveNet is more suitable for text-to-speech applications, and WaveRNN is more suitable for voice conversion applications. Furthermore, we present subjective human evaluation results that can serve as a valuable reference for future studies.