Title: Adapting TTS models For New Speakers using Transfer Learning

Oct 12, 2021

Paarth Neekhara, Jason Li, Boris Ginsburg

Abstract: Training neural text-to-speech (TTS) models for a new speaker typically requires several hours of high-quality speech data. Prior work on voice cloning attempts to address this challenge by adapting pre-trained multi-speaker TTS models to a new voice using a few minutes of speech from the new speaker. However, publicly available large multi-speaker datasets are often noisy, which yields TTS models that are not suitable for use in products. We address this challenge by proposing transfer-learning guidelines for adapting high-quality single-speaker TTS models to a new speaker using only a few minutes of speech data. We conduct an extensive study using different amounts of data for the new speaker and evaluate the synthesized speech in terms of naturalness and voice/style similarity to the target speaker. We find that fine-tuning a single-speaker TTS model on just 30 minutes of data can yield performance comparable to that of a model trained from scratch on more than 27 hours of data, for both male and female target speakers.

* Submitted to ICASSP 2022
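The abstract's central claim is that starting from a well-trained model lets a small amount of new-speaker data go much further than training from scratch. The toy sketch below illustrates that general transfer-learning effect with a linear least-squares model in NumPy. It is not the authors' TTS architecture or fine-tuning recipe: the "source" and "target" weight vectors, the data sizes, the learning rate, and the helper names `gradient_descent` and `mse` are all hypothetical stand-ins chosen only for illustration.

```python
import numpy as np

def gradient_descent(w0, X, y, lr=0.05, steps=500):
    """Full-batch gradient descent on mean squared error, starting from w0."""
    w = w0.copy()
    n = X.shape[0]
    for _ in range(steps):
        grad = 2.0 / n * X.T @ (X @ w - y)  # gradient of mean((Xw - y)^2)
        w -= lr * grad
    return w

def mse(w, X, y):
    """Mean squared prediction error of weights w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(0)
d = 50                                           # toy number of parameters (hypothetical)
w_source = rng.normal(size=d)                    # stand-in for the original single speaker
w_target = w_source + 0.1 * rng.normal(size=d)   # stand-in for a related new speaker

# Abundant source data: "pre-train" from scratch.
X_src = rng.normal(size=(5000, d))
y_src = X_src @ w_source
w_pretrained = gradient_descent(np.zeros(d), X_src, y_src)

# Scarce target data: fewer examples than parameters.
X_small = rng.normal(size=(20, d))
y_small = X_small @ w_target

# Held-out target data used only for evaluation.
X_test = rng.normal(size=(1000, d))
y_test = X_test @ w_target

w_scratch = gradient_descent(np.zeros(d), X_small, y_small)    # train from scratch
w_finetuned = gradient_descent(w_pretrained, X_small, y_small) # fine-tune (warm start)

scratch_err = mse(w_scratch, X_test, y_test)
finetuned_err = mse(w_finetuned, X_test, y_test)
print(f"from scratch: {scratch_err:.3f}  fine-tuned: {finetuned_err:.3f}")
```

In this toy setting, the small dataset only constrains 20 of the 50 parameter directions. Training from scratch leaves the remaining directions at zero, while fine-tuning inherits them from the pre-trained weights, so the fine-tuned model's held-out error is far lower. This is one simple intuition for why a pre-trained model can adapt from limited data; the paper's actual guidelines and evaluation are specific to TTS.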
