Title: Can we use Common Voice to train a Multi-Speaker TTS system?

Oct 12, 2022

Sewade Ogun, Vincent Colotte, Emmanuel Vincent

Abstract: Training of multi-speaker text-to-speech (TTS) systems relies on curated datasets based on high-quality recordings or audiobooks. Such datasets often lack speaker diversity and are expensive to collect. As an alternative, recent studies have leveraged the availability of large, crowdsourced automatic speech recognition (ASR) datasets. A major problem with such datasets is the presence of noisy and/or distorted samples, which degrade TTS quality. In this paper, we propose to automatically select high-quality training samples using a non-intrusive mean opinion score (MOS) estimator, WV-MOS. We show the viability of this approach for training a multi-speaker GlowTTS model on the Common Voice English dataset. Our approach improves the overall quality of generated utterances by 1.26 MOS points with respect to training on all the samples and by 0.35 MOS points with respect to training on the LibriTTS dataset. This opens the door to automatic TTS dataset curation for a wider range of languages.

* To appear in Proc. SLT 2022, Jan 09-12, 2023, Doha, Qatar
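
The sample-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Sample` record, the `select_high_quality` helper, the toy scores, and the threshold of 4.0 are all hypothetical, and the MOS estimator is passed in as a plain callable rather than calling the real WV-MOS model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Sample:
    """One training utterance (hypothetical record layout)."""
    path: str      # location of the audio clip
    speaker: str   # speaker identifier
    text: str      # transcript


def select_high_quality(
    samples: Iterable[Sample],
    estimate_mos: Callable[[str], float],
    threshold: float,
) -> List[Sample]:
    """Keep only samples whose estimated MOS reaches the threshold.

    `estimate_mos` stands in for a non-intrusive MOS estimator: it needs
    only the degraded audio, not a clean reference signal.
    """
    return [s for s in samples if estimate_mos(s.path) >= threshold]


# Toy scores standing in for estimator output (assumed values, not real data).
toy_scores = {"a.wav": 4.3, "b.wav": 2.1, "c.wav": 3.9}

samples = [
    Sample("a.wav", "spk1", "hello"),
    Sample("b.wav", "spk1", "good morning"),
    Sample("c.wav", "spk2", "thank you"),
]

# The threshold is an illustrative assumption, not a value from the paper.
selected = select_high_quality(samples, toy_scores.__getitem__, threshold=4.0)
print([s.path for s in selected])  # prints ['a.wav']
```

In practice the threshold trades data quantity against quality: a stricter cut-off leaves cleaner audio but fewer utterances and possibly fewer speakers, so it would likely need tuning per dataset.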
