Title: Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers

Jul 02, 2022

Liumeng Xue, Shan Yang, Na Hu, Dan Su, Lei Xie

Abstract: Building a voice conversion system for noisy target speakers, such as users who provide noisy samples or speech data found on the Internet, is challenging because training the model on contaminated speech noticeably degrades conversion performance. In this paper, we build on our recently proposed Glow-WaveGAN and propose a noise-independent speech representation learning approach for high-quality voice conversion for noisy target speakers. First, we learn a latent feature space in which the target distribution modeled by the conversion model is exactly the distribution modeled by the waveform generator. On this premise, we further make the latent feature noise-invariant. To do so, we introduce a noise-controllable WaveGAN: its encoder learns a noise-independent acoustic representation directly from the waveform, and its decoder performs noise control in the hidden space through a FiLM module. For the conversion model, we use a flow-based model to learn the distribution of the noise-independent but speaker-related latent features from phoneme posteriorgrams. Experimental results demonstrate that the proposed model achieves high speech quality and speaker similarity in voice conversion for noisy target speakers.

* Accepted by INTERSPEECH 2022
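The abstract says noise control happens in the decoder's hidden space through a FiLM (feature-wise linear modulation) module. As a rough illustration of what FiLM does in general, here is a minimal pure-Python sketch: a conditioning vector is mapped to a per-channel scale `gamma` and shift `beta`, which are then applied to every frame of the hidden features. The conditioning vector is assumed here to be a hypothetical noise-control embedding (for example, 0 for clean and 1 for noisy output); the source does not say how the condition is encoded, and the function names, shapes, and weights below are illustrative only, not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def linear(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine map y = W x + b, with W stored as rows (out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def film(hidden: Matrix, cond: Vector,
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> Matrix:
    """Feature-wise linear modulation: out[t][k] = gamma[k] * hidden[t][k] + beta[k].

    hidden: frames x channels decoder activations (illustrative layout).
    cond:   hypothetical noise-control embedding.
    gamma and beta are predicted from cond and shared across all frames.
    """
    gamma = linear(w_gamma, b_gamma, cond)
    beta = linear(w_beta, b_beta, cond)
    if hidden and len(gamma) != len(hidden[0]):
        raise ValueError("gamma/beta size must match the number of channels")
    return [[g * h + b for h, g, b in zip(frame, gamma, beta)] for frame in hidden]


# Toy weights: with cond = [0.0] the module is an identity map;
# with cond = [1.0] it rescales channel 0 and shifts channel 1.
W_GAMMA, B_GAMMA = [[0.5], [0.0]], [1.0, 1.0]
W_BETA, B_BETA = [[0.0], [1.0]], [0.0, 0.0]

hidden = [[1.0, 2.0], [3.0, 4.0]]
clean_out = film(hidden, [0.0], W_GAMMA, B_GAMMA, W_BETA, B_BETA)
noisy_out = film(hidden, [1.0], W_GAMMA, B_GAMMA, W_BETA, B_BETA)
print(clean_out)  # [[1.0, 2.0], [3.0, 4.0]]
print(noisy_out)  # [[1.5, 3.0], [4.5, 5.0]]
```

The appeal of FiLM in this setting is that the same hidden representation can be rendered differently depending only on the condition, so the encoder's output need not carry the noise information itself. In a real model, `gamma` and `beta` would come from learned layers rather than the hand-set toy weights used here.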