Singing Voice Synthesis with Vibrato Modeling and Latent Energy Representation

Nov 02, 2022

Yingjie Song, Wei Song, Wei Zhang, Zhengchen Zhang, Dan Zeng, Zhi Liu, Yang Yu

Abstract: This paper proposes an expressive singing voice synthesis system that introduces explicit vibrato modeling and a latent energy representation. Vibrato is an inherent characteristic of human singing and is therefore essential to the naturalness of synthesized singing. We introduce a deep learning-based vibrato model that controls the likeliness, rate, depth and phase of vibrato, where the vibrato likeliness is the probability that vibrato is present. Because existing singing corpora contain no annotated vibrato likeliness labels, we adopt a novel labeling method that produces these labels automatically. In addition, the power spectrogram of the audio contains rich information that can improve the expressiveness of singing, so we propose an autoencoder-based latent energy bottleneck feature for expressive singing voice synthesis. Experimental results on the open dataset NUS48E show that both the vibrato modeling and the latent energy representation significantly improve the expressiveness of the synthesized singing voice. Audio samples are available on the demo website.
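To make the four vibrato controls named in the abstract concrete, the sketch below shows one common way such parameters can shape a pitch contour: a sinusoidal deviation in cents, scaled by depth and gated by likeliness. This is an illustrative assumption, not the paper's model. The function name `apply_vibrato`, its parameters, the cent-based depth unit, and the 5 ms frame period are all hypothetical choices made for the example.

```python
import numpy as np


def apply_vibrato(f0_hz, likeliness, rate_hz, depth_cents,
                  phase_rad=0.0, frame_period_s=0.005):
    """Add a sinusoidal vibrato to a frame-level F0 contour.

    Hypothetical illustration only: the paper predicts these parameters with
    a neural model; here they are simply given as inputs.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    # Likeliness is treated as a per-frame probability in [0, 1] that gates vibrato.
    likeliness = np.clip(
        np.broadcast_to(np.asarray(likeliness, dtype=float), f0_hz.shape), 0.0, 1.0)
    rate = np.broadcast_to(np.asarray(rate_hz, dtype=float), f0_hz.shape)
    depth = np.broadcast_to(np.asarray(depth_cents, dtype=float), f0_hz.shape)

    # Integrate the (possibly time-varying) rate so the oscillation stays
    # continuous; the exclusive cumulative sum makes frame 0 start at phase_rad.
    elapsed_cycles = (np.cumsum(rate) - rate) * frame_period_s
    inst_phase = phase_rad + 2.0 * np.pi * elapsed_cycles

    # Pitch deviation in cents, then converted to a frequency ratio.
    deviation_cents = likeliness * depth * np.sin(inst_phase)
    return f0_hz * 2.0 ** (deviation_cents / 1200.0)


# One second of a flat 440 Hz note with a 5 Hz, 100-cent vibrato.
flat_note = np.full(200, 440.0)
with_vibrato = apply_vibrato(flat_note, likeliness=1.0, rate_hz=5.0, depth_cents=100.0)
without_vibrato = apply_vibrato(flat_note, likeliness=0.0, rate_hz=5.0, depth_cents=100.0)
```

With likeliness set to 0 the contour is returned unchanged, which is the role the abstract assigns to that parameter: it decides whether vibrato is present at all, while rate, depth and phase describe its shape.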
