Title: An analysis on the effects of speaker embedding choice in non auto-regressive TTS

Jul 19, 2023

Adriana Stan, Johannah O'Mahony

Abstract: In this paper we present a first attempt at understanding how a non-autoregressive, factorised multi-speaker speech synthesis architecture exploits the information present in different sets of speaker embeddings. We analyse whether jointly learning the representations, and initialising them from pretrained models, yield any quality improvements for target speaker identities. In a separate analysis, we investigate how the different sets of embeddings affect the network's core speech abstraction (i.e. zero-conditioned) in terms of speaker identity and representation learning. We show that, regardless of the embedding set and learning strategy used, the network handles various speaker identities equally well, with barely noticeable variations in output speech quality, and that speaker leakage within the core structure of the synthesis system is inevitable under the standard training procedures adopted thus far.

* Accepted for publication at ISCA Speech Synthesis Workshop 2023
