Title: ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis

Mar 26, 2022

Jinlong Xue, Yayue Deng, Yichen Han, Ya Li, Jianqing Sun, Jiaen Liang

Abstract: In recent years, neural-network-based methods for multi-speaker text-to-speech (TTS) synthesis have made significant progress. However, the speaker encoder models used in current methods still cannot capture enough speaker information. In this paper, we focus on accurate speaker encoder modeling and propose an end-to-end method that generates high-quality speech with better speaker similarity for both seen and unseen speakers. The proposed architecture consists of three separately trained components: a speaker encoder based on the state-of-the-art ECAPA-TDNN model, which is derived from the speaker verification task; a FastSpeech2-based synthesizer; and a HiFi-GAN vocoder. A comparison among different speaker encoder models shows that our proposed method achieves better naturalness and similarity. To evaluate our synthesized speech efficiently, we are the first to adopt deep-learning-based automatic MOS evaluation methods to assess our results, and these methods show great potential for automatic speech quality assessment.
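The abstract names an ECAPA-TDNN speaker encoder feeding a FastSpeech2-based synthesizer but does not say how the utterance-level speaker embedding is formed or injected. The following is a minimal sketch under two stated assumptions, not the authors' implementation: (1) frame-level features are pooled with a simplified, single-score attentive statistics pooling (ECAPA-TDNN's own pooling is channel- and context-dependent), and (2) the embedding is linearly projected and added to every phoneme-level encoder state, a common conditioning choice in multi-speaker FastSpeech2 setups. The function names `attentive_stats_pooling` and `condition_on_speaker` and all toy weights are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per time frame (or per phoneme)


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_stats_pooling(frames: Matrix, w: Vector, b: float) -> Vector:
    """Pool T frame-level feature vectors into one utterance-level vector.

    Hypothetical simplified variant: each frame gets one scalar score
    tanh(w . h_t + b); softmax weights then give a weighted mean and a
    weighted standard deviation per channel, which are concatenated.
    """
    scores = [math.tanh(sum(wi * hi for wi, hi in zip(w, h)) + b) for h in frames]
    alpha = softmax(scores)
    dim = len(frames[0])
    mean = [sum(a * h[c] for a, h in zip(alpha, frames)) for c in range(dim)]
    second = [sum(a * h[c] ** 2 for a, h in zip(alpha, frames)) for c in range(dim)]
    # Floor the variance so floating-point error never yields sqrt of a negative.
    std = [math.sqrt(max(s - m * m, 1e-12)) for s, m in zip(second, mean)]
    return mean + std  # length 2 * dim


def condition_on_speaker(encoder_out: Matrix, spk_emb: Vector, proj: Matrix) -> Matrix:
    """Assumed conditioning: project the speaker embedding to the
    synthesizer's hidden size, then add it to every encoder state."""
    spk_hidden = [sum(p * e for p, e in zip(row, spk_emb)) for row in proj]
    return [[x + s for x, s in zip(state, spk_hidden)] for state in encoder_out]


if __name__ == "__main__":
    # Toy reference utterance: 2 frames with 2 channels -> 4-dim embedding.
    emb = attentive_stats_pooling([[0.0, 0.0], [2.0, 2.0]], [0.0, 0.0], 0.0)
    # Toy phoneme encoder output: 2 phonemes, hidden size 2.
    proj = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    print(condition_on_speaker([[0.0, 0.0], [1.0, 1.0]], emb, proj))
```

Because the same projected vector is added at every phoneme position, the synthesizer sees a constant speaker cue across the whole utterance; concatenation followed by a linear layer is an equally plausible alternative that this sketch does not cover.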

* 5 pages, 2 figures, submitted to Interspeech 2022
