Title: Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects

Aug 02, 2018

Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa

Abstract: We investigated the impact of noisy linguistic features on the performance of a neural network-based Japanese speech synthesis system that uses a WaveNet vocoder. We compared an ideal system, which uses manually corrected linguistic features (including phoneme and prosodic information) in both the training and test sets, against several other systems that use corrupted linguistic features. Both subjective and objective results demonstrate that corrupted linguistic features, especially those in the test set, had a statistically significant effect on the ideal system's performance because of the mismatch between the training and test conditions. Interestingly, while an utterance-level Turing test showed that listeners had difficulty differentiating synthetic speech from natural speech, it further indicated that adding noise to the linguistic features in the training set can partially reduce the effect of the mismatch, regularize the model, and help the system perform better when the linguistic features of the test set are noisy.

* Accepted for Interspeech 2018
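
To make the idea of "adding noise to the linguistic features in the training set" concrete, here is a minimal sketch of one possible corruption scheme. The abstract does not say how the authors corrupted their features, so everything here is an assumption for illustration: the function name `corrupt_accent_labels`, the integer encoding of pitch-accent labels, and the choice of uniform random substitution at a fixed `noise_rate`.

```python
import random


def corrupt_accent_labels(labels, noise_rate, num_classes, rng):
    """Return a copy of `labels` with some entries replaced at random.

    Hypothetical helper, not the paper's procedure: each label is swapped,
    with probability `noise_rate`, for a different class drawn uniformly
    from the remaining `num_classes - 1` classes.
    """
    corrupted = []
    for label in labels:
        if rng.random() < noise_rate:
            # Pick any class except the original so the label really changes.
            alternatives = [c for c in range(num_classes) if c != label]
            corrupted.append(rng.choice(alternatives))
        else:
            corrupted.append(label)
    return corrupted


# Example: corrupt roughly 30% of an assumed per-mora accent label sequence.
rng = random.Random(0)
clean_labels = [0, 1, 1, 0, 2, 0, 1, 0]
noisy_labels = corrupt_accent_labels(clean_labels, noise_rate=0.3,
                                     num_classes=3, rng=rng)
```

Under this sketch, training on `noisy_labels` instead of `clean_labels` would expose the model to the kind of annotation errors it may meet at test time, which is the matched-condition effect the abstract describes.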
