Title: Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis

Jun 04, 2024

Kun Zhou, Shengkui Zhao, Yukun Ma, Chong Zhang, Hao Wang, Dianwen Ng, Chongjia Ni, Nguyen Trung Hieu, Jia Qi Yip, Bin Ma

Abstract: Recent language model-based text-to-speech (TTS) frameworks demonstrate scalability and in-context learning capabilities. However, they suffer from robustness issues due to the accumulation of errors in speech unit predictions during autoregressive language modeling. In this paper, we propose a phonetic enhanced language modeling method to improve the performance of TTS models. We leverage self-supervised representations that are phonetically rich as the training target for the autoregressive language model. Subsequently, a non-autoregressive model is employed to predict discrete acoustic codecs that contain fine-grained acoustic details. The TTS model focuses solely on linguistic modeling during autoregressive training, thereby reducing the error propagation that occurs in non-autoregressive training. Both objective and subjective evaluations validate the effectiveness of our proposed method.

* Accepted by Interspeech 2024
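
The two-stage design described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_next_token` and `toy_to_codec` are hypothetical stand-ins for the trained autoregressive language model and the non-autoregressive codec predictor, and the token values are arbitrary. The sketch only shows the control flow: stage 1 generates phonetic tokens one at a time, conditioning on its own earlier outputs, while stage 2 maps every position to acoustic codec tokens independently.

```python
from typing import Callable, List

EOS = -1  # hypothetical end-of-sequence marker


def autoregressive_decode(
    text: List[int],
    next_token: Callable[[List[int], List[int]], int],
    max_len: int,
) -> List[int]:
    """Stage 1: emit phonetic tokens one by one.

    Each step sees the previously generated tokens, so an early
    mistake can influence every later prediction.
    """
    tokens: List[int] = []
    for _ in range(max_len):
        tok = next_token(text, tokens)
        if tok == EOS:
            break
        tokens.append(tok)
    return tokens


def non_autoregressive_decode(
    phonetic_tokens: List[int],
    to_codec: Callable[[int, int], List[int]],
) -> List[List[int]]:
    """Stage 2: predict codec tokens for all frames independently.

    No output depends on another output, so all positions could be
    computed in parallel.
    """
    return [to_codec(i, t) for i, t in enumerate(phonetic_tokens)]


def toy_next_token(text: List[int], prefix: List[int]) -> int:
    # Hypothetical model: each input symbol lasts two frames.
    idx = len(prefix) // 2
    return EOS if idx >= len(text) else text[idx] + 100


def toy_to_codec(position: int, token: int) -> List[int]:
    # Hypothetical model: two codebook entries per frame.
    return [token * 2, token * 2 + 1]


phonetic = autoregressive_decode([1, 2, 3], toy_next_token, max_len=50)
codec = non_autoregressive_decode(phonetic, toy_to_codec)
print(phonetic)
print(codec[0])
```

In a real system both stand-ins would be neural networks, and the first stage would be trained on discretized self-supervised speech representations, as the abstract states.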
