Title: vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders

Sep 03, 2024

Yiwei Guo, Zhihan Li, Junjie Li, Chenpeng Du, Hankun Wang, Shuai Wang, Xie Chen, Kai Yu

Abstract: We propose a new speech discrete token vocoder, vec2wav 2.0, which advances voice conversion (VC). We use discrete tokens from speech self-supervised models as the content features of the source speech, and treat VC as a prompted vocoding task. To compensate for the loss of speaker timbre in the content tokens, vec2wav 2.0 uses WavLM features to provide strong timbre-dependent information. A novel adaptive Snake activation function is proposed to better incorporate timbre into the waveform reconstruction process. In this way, vec2wav 2.0 learns to alter the speaker timbre appropriately given different reference prompts. Moreover, no supervised data is required to train vec2wav 2.0 effectively. Experimental results demonstrate that vec2wav 2.0 outperforms all baselines by a considerable margin in terms of audio quality and speaker similarity in any-to-any VC. Ablation studies verify the effectiveness of the proposed techniques. In addition, vec2wav 2.0 achieves competitive cross-lingual VC even when trained only on a monolingual corpus. Thus, vec2wav 2.0 shows that timbre can potentially be manipulated by speech token vocoders alone, pushing the frontiers of VC and speech synthesis.
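
The abstract does not give the form of the adaptive Snake activation, so the following is only an illustrative sketch, not the authors' method. It uses the standard Snake activation, snake(x) = x + sin²(αx)/α, and assumes that "adaptive" means the frequency α is predicted per channel from a timbre embedding. The function `adaptive_snake`, its `weights` and `biases` parameters, and the exponential used to keep α positive are all hypothetical choices made for this example.

```python
import math


def snake(x, alpha):
    """Standard Snake activation: x + sin^2(alpha * x) / alpha."""
    return x + math.sin(alpha * x) ** 2 / alpha


def adaptive_snake(frames, timbre, weights, biases):
    """Hypothetical timbre-conditioned Snake (an assumption, not the paper's formula).

    frames:  list of channels, each a list of hidden feature values over time
    timbre:  list of floats, a timbre embedding taken from the reference prompt
    weights: one row of projection weights per channel (assumed parameters)
    biases:  one bias per channel (assumed parameters)
    """
    out = []
    for channel, w_row, b in zip(frames, weights, biases):
        # Project the timbre embedding to one frequency per channel;
        # exp keeps alpha strictly positive.
        alpha = math.exp(sum(w * t for w, t in zip(w_row, timbre)) + b)
        out.append([snake(v, alpha) for v in channel])
    return out


# Toy usage: two channels, two time steps, a 2-dimensional timbre embedding.
frames = [[0.0, 0.5], [1.0, -1.0]]
timbre = [0.2, -0.3]
weights = [[0.5, -0.1], [0.3, 0.4]]
biases = [0.0, 0.1]
converted = adaptive_snake(frames, timbre, weights, biases)
```

Under this assumption, changing the reference prompt changes `timbre`, which changes α and therefore the periodic component the activation adds in each channel. With zero weights and biases, α is 1 and the function reduces to the ordinary Snake activation.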

* 5 pages, 4 figures
