Title: Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language

Dec 16, 2022

Yusuke Yasuda, Tomoki Toda

Abstract: End-to-end text-to-speech synthesis (TTS) can generate highly natural synthetic speech from raw text. However, rendering the correct pitch accents is still a challenging problem for end-to-end TTS. To tackle the challenge of rendering correct pitch accents in Japanese end-to-end TTS, we adopt PnG BERT, a self-supervised pretrained model in the character and phoneme domain for TTS. We investigate the effects of features captured by PnG BERT on Japanese TTS by modifying the fine-tuning condition to determine the conditions helpful for inferring pitch accents. We manipulate the content of PnG BERT features from being text-oriented to speech-oriented by changing the number of fine-tuned layers during TTS. In addition, we teach PnG BERT pitch accent information by fine-tuning with tone prediction as an additional downstream task. Our experimental results show that the features of PnG BERT captured by pretraining contain information helpful for inferring pitch accent, and PnG BERT outperforms baseline Tacotron on accent correctness in a listening test.

* IEEE Journal of Selected Topics in Signal Processing (Volume: 16, Issue: 6, October 2022)
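
The two fine-tuning conditions named in the abstract (varying the number of fine-tuned layers, and adding tone prediction as an extra task) can be illustrated with a small sketch. This is not the authors' code. The `Layer` class, the choice to fine-tune the top layers rather than the bottom ones, the six-layer encoder, and the weighted-sum loss with `tone_weight` are all assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for one encoder layer of a pretrained model."""
    name: str
    trainable: bool = True


def set_finetuned_layers(layers, num_finetuned):
    """Mark only the last `num_finetuned` layers as trainable.

    Assumption: the fine-tuned layers are the ones closest to the output.
    Frozen layers keep their pretrained (text-oriented) features, while
    trainable layers can adapt toward the speech task.
    """
    if not 0 <= num_finetuned <= len(layers):
        raise ValueError("num_finetuned must be between 0 and len(layers)")
    cutoff = len(layers) - num_finetuned
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return layers


def joint_loss(tts_loss, tone_loss, tone_weight=1.0):
    """Combine the TTS loss with an auxiliary tone-prediction loss.

    Assumption: a simple weighted sum; `tone_weight` is a hypothetical
    hyperparameter, not a value taken from the paper.
    """
    return tts_loss + tone_weight * tone_loss


# Example: a hypothetical 6-layer encoder with only the top 2 layers fine-tuned.
encoder = [Layer(f"encoder.{i}") for i in range(6)]
set_finetuned_layers(encoder, num_finetuned=2)
trainable_names = [layer.name for layer in encoder if layer.trainable]
```

Setting `num_finetuned` to 0 keeps the pretrained features fixed, and raising it toward the full layer count lets more of the model adapt to speech, which is one way to read the abstract's "text-oriented to speech-oriented" manipulation.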
