Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:DurFlex-EVC: Duration-Flexible Emotional Voice Conversion with Parallel Generation

Jan 16, 2024

Hyoung-Seok Oh, Sang-Hoon Lee, Deok-Hyun Cho, Seong-Whan Lee

Share this with someone who'll enjoy it:

Abstract:Emotional voice conversion (EVC) seeks to modify the emotional tone of a speaker's voice while preserving the original linguistic content and the speaker's unique vocal characteristics. Recent advancements in EVC have involved the simultaneous modeling of pitch and duration, utilizing the potential of sequence-to-sequence (seq2seq) models. To enhance reliability and efficiency in conversion, this study shifts focus towards parallel speech generation. We introduce Duration-Flexible EVC (DurFlex-EVC), which integrates a style autoencoder and unit aligner. Traditional models, while incorporating self-supervised learning (SSL) representations that contain both linguistic and paralinguistic information, have neglected this dual nature, leading to reduced controllability. Addressing this issue, we implement cross-attention to synchronize these representations with various emotions. Additionally, a style autoencoder is developed for the disentanglement and manipulation of style elements. The efficacy of our approach is validated through both subjective and objective evaluations, establishing its superiority over existing models in the field.

* 13 pages, 9 figures, 8 tables

View paper on

Share this with someone who'll enjoy it:

Title:DurFlex-EVC: Duration-Flexible Emotional Voice Conversion with Parallel Generation

Paper and Code