Yin-Ping Cho

Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN

Sep 21, 2022

Yin-Ping Cho, Yu Tsao, Hsin-Min Wang, Yi-Wen Liu

Abstract: Singing voice synthesis (SVS) is the computer production of a human-like singing voice from given musical scores. To accomplish end-to-end SVS effectively and efficiently, this work adopts the acoustic model-neural vocoder architecture established for high-quality speech and singing voice synthesis. Specifically, this work pursues a higher level of expressiveness in synthesized voices by combining the denoising diffusion probabilistic model (DDPM) and the Wasserstein generative adversarial network (WGAN) to construct the backbone of the acoustic model. On top of the proposed acoustic model, a HiFi-GAN neural vocoder is adopted with integrated fine-tuning to ensure optimal synthesis quality for the resulting end-to-end SVS system. This end-to-end system was evaluated on the multi-singer Mpop600 Mandarin singing voice dataset. In the experiments, the proposed system exhibits improvements over previous landmark counterparts in terms of musical expressiveness and high-frequency acoustic details. Moreover, the adversarial acoustic model converged stably without the need to enforce reconstruction objectives, indicating that the combined DDPM and WGAN architecture converges more stably than alternative GAN-based SVS systems.

A Survey on Recent Deep Learning-driven Singing Voice Synthesis Systems

Oct 06, 2021

Yin-Ping Cho, Fu-Rong Yang, Yung-Chuan Chang, Ching-Ting Cheng, Xiao-Han Wang, Yi-Wen Liu

Abstract: Singing voice synthesis (SVS) is a task that aims to generate audio signals according to musical scores and lyrics. Because the task spans both music and language, producing singing voices indistinguishable from those of human singers has remained an unfulfilled pursuit. Nonetheless, advances in deep learning techniques have brought about a substantial leap in the quality and naturalness of synthesized singing voices. This paper reviews some of the state-of-the-art deep learning-driven SVS systems. We summarize their model architectures and identify the strengths and limitations of each of the introduced systems. We thereby trace the recent trajectory of this field and outline the challenges that remain to be resolved in both commercial applications and academic research.
