Title: Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training

Jan 20, 2022

J. Yang, Lei He

Figures 1-4 for Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training

Abstract: Cross-lingual speech synthesis produces speech in multiple languages for a monoglot speaker. Normally, only data from monoglot speakers is available for model training, so the speaker similarity between the synthesized cross-lingual speech and the speaker's native-language recordings is relatively low. Building on a multilingual transformer text-to-speech model, this paper studies a multi-task learning framework to improve cross-lingual speaker similarity. To improve speaker similarity further, joint training with a speaker classifier is proposed. A scheme similar to parallel scheduled sampling is used so that introducing joint training does not break the transformer's parallel training mechanism, and the model can still be trained efficiently. With multi-task learning and speaker classifier joint training, subjective and objective evaluations show that cross-lingual speaker similarity improves consistently for speakers both seen and unseen in the training set.
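The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes a two-pass scheme in the spirit of parallel scheduled sampling: a first teacher-forced pass yields predicted frames, which are then mixed with ground-truth frames to form the decoder inputs for a second parallel pass. It also assumes the joint objective is a weighted sum of a synthesis loss and a speaker classification loss. The names `mix_decoder_inputs`, `joint_loss`, `sample_prob`, and `speaker_weight` are hypothetical and do not come from the paper.

```python
import random


def mix_decoder_inputs(gold_frames, predicted_frames, sample_prob, rng):
    """Build second-pass decoder inputs without autoregressive decoding.

    For each time step, use the first-pass prediction with probability
    `sample_prob`, otherwise keep the ground-truth frame. (Assumed scheme,
    not confirmed by the paper.)
    """
    if len(gold_frames) != len(predicted_frames):
        raise ValueError("gold and predicted sequences must have equal length")
    mixed = []
    for gold, pred in zip(gold_frames, predicted_frames):
        mixed.append(pred if rng.random() < sample_prob else gold)
    return mixed


def joint_loss(tts_loss, speaker_ce_loss, speaker_weight=0.1):
    """Weighted sum of synthesis loss and speaker classifier loss.

    The weighting form and the default value are illustrative assumptions.
    """
    return tts_loss + speaker_weight * speaker_ce_loss


if __name__ == "__main__":
    rng = random.Random(0)
    gold = [[0.0], [0.0], [0.0], [0.0]]   # toy ground-truth frames
    pred = [[1.0], [1.0], [1.0], [1.0]]   # toy first-pass predictions
    print(mix_decoder_inputs(gold, pred, sample_prob=0.5, rng=rng))
    print(joint_loss(tts_loss=1.0, speaker_ce_loss=2.0, speaker_weight=0.5))
```

Because the mixed inputs are built from a completed first pass, every time step of the second pass can still be computed in parallel, which is the property the abstract says the proposed scheme preserves.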
