Title: Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation

Aug 01, 2024

Xinhan Di, Zihao Chen, Yunming Liang, Junjie Zheng, Yihua Wang, Chaofan Ding

Abstract: Large-scale text-to-speech (TTS) models have made significant progress recently. However, they still fall short in generating Chinese dialectal speech. To address this, we propose Bailing-TTS, a family of large-scale TTS models capable of generating high-quality Chinese dialectal speech. Bailing-TTS serves as a foundation model for Chinese dialectal speech generation. First, continual semi-supervised learning is proposed to facilitate the alignment of text tokens and speech tokens. Second, Chinese dialectal representation learning is developed using a specific transformer architecture and a multi-stage training process. With this novel network architecture and the corresponding training strategy, Bailing-TTS generates Chinese dialectal speech from text effectively and efficiently. Experiments demonstrate that Bailing-TTS generates Chinese dialectal speech approaching human-like spontaneous representation. Readers are encouraged to listen to the demos at https://c9412600.github.io/bltts_tech_report/index.html.
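The abstract does not describe how the continual semi-supervised learning is carried out. As a purely illustrative sketch, assuming a generic confidence-threshold self-training (pseudo-labeling) scheme rather than the authors' actual procedure, the loop below shows one common way unlabeled speech can be turned into additional text-speech training pairs over successive rounds. The names `continual_pseudo_label` and `toy_train`, the confidence heuristic, and the threshold values are all hypothetical, and plain strings stand in for audio clips.

```python
from typing import Callable, List, Sequence, Tuple

# A (clip, transcript) training pair. In a real TTS pipeline the clip would be
# audio or its speech tokens; plain strings stand in for it in this toy sketch.
Pair = Tuple[str, str]
# A trained model maps a clip to (predicted transcript, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def continual_pseudo_label(
    labeled: Sequence[Pair],
    unlabeled: Sequence[str],
    train: Callable[[List[Pair]], Model],
    threshold: float = 0.9,
    max_rounds: int = 3,
) -> Tuple[List[Pair], List[str]]:
    """Generic self-training loop (hypothetical; NOT the Bailing-TTS procedure).

    Each round retrains on every pair collected so far, pseudo-labels the
    remaining unlabeled clips, and keeps only confident predictions.
    """
    train_set: List[Pair] = list(labeled)
    pool: List[str] = list(unlabeled)
    for _ in range(max_rounds):
        model = train(train_set)  # retrain on labeled + accepted pseudo-labels
        accepted: List[Pair] = []
        rejected: List[str] = []
        for clip in pool:
            text, conf = model(clip)
            if conf >= threshold:
                accepted.append((clip, text))
            else:
                rejected.append(clip)
        pool = rejected  # low-confidence clips wait for a later round
        if not accepted:  # no new pairs this round: stop early
            break
        train_set.extend(accepted)
    return train_set, pool


def toy_train(train_set: List[Pair]) -> Model:
    """Hypothetical stand-in for real training: confidence grows with the
    amount of training data and shrinks with clip length."""
    skill = 2 * len(train_set)

    def model(clip: str) -> Tuple[str, float]:
        return clip.upper(), skill / (skill + len(clip))

    return model


if __name__ == "__main__":
    pairs, leftover = continual_pseudo_label(
        labeled=[("a", "A"), ("b", "B")],
        unlabeled=["ccc", "eeeee", "x" * 20],
        train=toy_train,
        threshold=0.5,
    )
    print(pairs)     # the two labeled pairs plus pseudo-labels for "ccc" and "eeeee"
    print(leftover)  # only the 20-character clip remains unlabeled
```

In this sketch, "ccc" is accepted in the first round, and the resulting larger training set lets "eeeee" pass the threshold in the second round. Retraining between rounds is what makes the loop continual here. The threshold trades coverage against pseudo-label noise.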

* 8 pages, 2 figures
