Title: Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

Aug 26, 2022

Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi

Abstract: In this paper, we propose a model to perform style transfer of speech to singing voice. In contrast to previous signal-processing-based methods, which require high-quality singing templates or phoneme synchronization, we explore a data-driven approach to the problem of converting natural speech to singing voice. We develop a novel neural network architecture, called SymNet, which models the alignment of the input speech with the target melody while preserving the speaker identity and naturalness. The proposed SymNet model comprises a symmetrical stack of three types of layers: convolutional, transformer, and self-attention layers. The paper also explores novel data augmentation and generative loss annealing methods to facilitate model training. Experiments are performed on the NUS and NHSS datasets, which consist of parallel speech and singing voice data. In these experiments, we show that the proposed SymNet model significantly improves the objective reconstruction quality over previously published methods and baseline architectures. Further, a subjective listening test confirms the improved quality of the audio obtained using the proposed approach, with an absolute improvement of 0.37 in mean opinion score over the baseline system.

* accepted to INTERSPEECH 2022
