Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

Jun 25, 2024

Joun Yeop Lee, Myeonghun Jeong, Minchan Kim, Ji-Hyun Lee, Hoon-Young Cho, Nam Soo Kim

Figure 1 for High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

Figure 2 for High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

Figure 3 for High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

Figure 4 for High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

Share this with someone who'll enjoy it:

Abstract:We propose a novel two-stage text-to-speech (TTS) framework with two types of discrete tokens, i.e., semantic and acoustic tokens, for high-fidelity speech synthesis. It features two core components: the Interpreting module, which processes text and a speech prompt into semantic tokens focusing on linguistic contents and alignment, and the Speaking module, which captures the timbre of the target voice to generate acoustic tokens from semantic tokens, enriching speech reconstruction. The Interpreting stage employs a transducer for its robustness in aligning text to speech. In contrast, the Speaking stage utilizes a Conformer-based architecture integrated with a Grouped Masked Language Model (G-MLM) to boost computational efficiency. Our experiments verify that this innovative structure surpasses the conventional models in the zero-shot scenario in terms of speech quality and speaker similarity.

* Accepted by Interspeech2024

View paper on

Share this with someone who'll enjoy it:

Title:High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

Paper and Code