Title: Latent linguistic embedding for cross-lingual text-to-speech and voice conversion

Oct 08, 2020

Hieu-Thi Luong, Junichi Yamagishi

Abstract: The recently proposed voice cloning system NAUTILUS can clone unseen voices using untranscribed speech, so we investigate the feasibility of using it to develop a unified cross-lingual TTS/VC system. Cross-lingual speech generation is the scenario in which speech utterances are generated with the voices of target speakers in a language they did not originally speak. This type of system does not simply clone the voice of the target speaker; it essentially creates a new voice that can be considered better than the original under a specific framing. By using a well-trained English latent linguistic embedding to create a cross-lingual TTS and VC system for several German, Finnish, and Mandarin speakers included in the Voice Conversion Challenge 2020, we show that our method not only creates cross-lingual VC with high speaker similarity but can also be used seamlessly for cross-lingual TTS without any extra steps. However, the subjective evaluations of perceived naturalness seemed to vary between target speakers, which is one aspect for future improvement.

* Accepted to Voice Conversion Challenge 2020 Online Workshop
