Title: Guided-TTS: Text-to-Speech with Untranscribed Speech

Dec 07, 2021

Heeseung Kim, Sungwon Kim, Sungroh Yoon

Abstract: Most neural text-to-speech (TTS) models require <speech, transcript> paired data from the desired speaker for high-quality speech synthesis, which limits the usage of large amounts of untranscribed data for training. In this work, we present Guided-TTS, a high-quality TTS model that learns to generate speech from untranscribed speech data. Guided-TTS combines an unconditional diffusion probabilistic model with a separately trained phoneme classifier for text-to-speech. By modeling the unconditional distribution for speech, our model can utilize the untranscribed data for training. For text-to-speech synthesis, we guide the generative process of the unconditional DDPM via phoneme classification to produce mel-spectrograms from the conditional distribution given transcript. We show that Guided-TTS achieves comparable performance with the existing methods without any transcript for LJSpeech. Our results further show that a single speaker-dependent phoneme classifier trained on multispeaker large-scale data can guide unconditional DDPMs for various speakers to perform TTS.
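
The guidance idea in the abstract can be illustrated with a toy sketch. This is not the Guided-TTS model: it swaps the mel-spectrogram DDPM for a 1-D mixture of two unit-variance Gaussians (means -2 and +2, an assumption made purely for illustration), swaps the phoneme classifier for the closed-form class posterior of that mixture, and uses plain Langevin dynamics instead of a diffusion sampler. All function names are hypothetical. What it shows is the general classifier-guidance identity: adding the gradient of log p(y | x) to the unconditional score yields the score of the conditional distribution p(x | y).

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def unconditional_score(x):
    # d/dx log p(x) for the toy density p(x) = 0.5*N(-2, 1) + 0.5*N(+2, 1).
    # Stands in for the unconditional generative model (assumption).
    p1 = sigmoid(4.0 * x)  # posterior weight of the +2 component
    return -x + 2.0 * (2.0 * p1 - 1.0)


def classifier_log_grad(x, y):
    # d/dx log p(y | x); for this mixture p(y=1 | x) = sigmoid(4x).
    # Stands in for the separately trained classifier (assumption).
    p1 = sigmoid(4.0 * x)
    return 4.0 * (1.0 - p1) if y == 1 else -4.0 * p1


def guided_score(x, y, scale=1.0):
    # Classifier guidance: unconditional score plus scaled classifier gradient.
    return unconditional_score(x) + scale * classifier_log_grad(x, y)


def sample(y, steps=500, step_size=0.01, rng=None):
    # Unadjusted Langevin dynamics driven by the guided score.
    if rng is None:
        rng = random.Random(0)
    x = rng.gauss(0.0, 1.0)
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        x += step_size * guided_score(x, y) + math.sqrt(2.0 * step_size) * noise
    return x


def sample_mean(y, n=200, seed=0):
    # Average of n guided samples for class y.
    rng = random.Random(seed)
    return sum(sample(y, rng=rng) for _ in range(n)) / n
```

With `scale=1.0`, `guided_score(x, 1)` reduces algebraically to `-(x - 2)`, the score of the +2 component alone, so `sample_mean(1)` should land near +2 and `sample_mean(0)` near -2, even though the unconditional score never saw a label.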
