Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Sep 04, 2023

Qiushi Zhu, Yu Gu, Rilin Chen, Chao Weng, Yuchen Hu, Lirong Dai, Jie Zhang

Figure 1 for Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Figure 2 for Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Figure 3 for Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Figure 4 for Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Share this with someone who'll enjoy it:

Abstract:Benefiting from the development of deep learning, text-to-speech (TTS) techniques using clean speech have achieved significant performance improvements. The data collected from real scenes often contains noise and generally needs to be denoised by speech enhancement models. Noise-robust TTS models are often trained using the enhanced speech, which thus suffer from speech distortion and background noise that affect the quality of the synthesized speech. Meanwhile, it was shown that self-supervised pre-trained models exhibit excellent noise robustness on many speech tasks, implying that the learned representation has a better tolerance for noise perturbations. In this work, we therefore explore pre-trained models to improve the noise robustness of TTS models. Based on HiFi-GAN, we first propose a representation-to-waveform vocoder, which aims to learn to map the representation of pre-trained models to the waveform. We then propose a text-to-representation FastSpeech2 model, which aims to learn to map text to pre-trained model representations. Experimental results on the LJSpeech and LibriTTS datasets show that our method outperforms those using speech enhancement methods in both subjective and objective metrics. Audio samples are available at: https://zqs01.github.io/rep2wav.

* 5 pages,2 figures

View paper on

Share this with someone who'll enjoy it:

Title:Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Paper and Code