Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

R. J. J. H. van Son

Improving severity preservation of healthy-to-pathological voice conversion with global style tokens

Oct 04, 2023

Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, R. J. J. H. van Son, Tomoki Toda

Abstract:In healthy-to-pathological voice conversion (H2P-VC), healthy speech is converted into pathological while preserving the identity. The paper improves on previous two-stage approach to H2P-VC where (1) speech is created first with the appropriate severity, (2) then the speaker identity of the voice is converted while preserving the severity of the voice. Specifically, we propose improvements to (2) by using phonetic posteriorgrams (PPG) and global style tokens (GST). Furthermore, we present a new dataset that contains parallel recordings of pathological and healthy speakers with the same identity which allows more precise evaluation. Listening tests by expert listeners show that the framework preserves severity of the source sample, while modelling target speaker's voice. We also show that (a) pathology impacts x-vectors but not all speaker information is lost, (b) choosing source speakers based on severity labels alone is insufficient.

* 7 pages, 3 figures, 5 tables. Accepted to IEEE Automatic Speech Recognition and Understanding Workshop 2023

Via

Access Paper or Ask Questions