Title: Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition

Mar 29, 2022

Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson

Abstract: An unsupervised text-to-speech synthesis (TTS) system learns to generate the speech waveform corresponding to any written sentence in a language by observing (1) a collection of untranscribed speech waveforms in that language and (2) a collection of texts written in that language, without access to any transcribed speech. Developing such a system can significantly improve the availability of speech technology for languages without a large amount of parallel speech and text data. This paper proposes an unsupervised TTS system that leverages recent advances in unsupervised automatic speech recognition (ASR). Our unsupervised system can achieve performance comparable to the supervised system in seven languages with about 10-20 hours of speech each. A careful study of the effect of text units and vocoders has also been conducted to better understand which factors may affect unsupervised TTS performance. The samples generated by our models can be found at https://cactuswiththoughts.github.io/UnsupTTS-Demo.

* submitted to INTERSPEECH
