Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR

Jan 17, 2025

Karl El Hajal, Enno Hermann, Ajinkya Kulkarni, Mathew Magimai. -Doss

Share this with someone who'll enjoy it:

Abstract:Automatic speech recognition (ASR) systems are well known to perform poorly on dysarthric speech. Previous works have addressed this by speaking rate modification to reduce the mismatch with typical speech. Unfortunately, these approaches rely on transcribed speech data to estimate speaking rates and phoneme durations, which might not be available for unseen speakers. Therefore, we combine unsupervised rhythm and voice conversion methods based on self-supervised speech representations to map dysarthric to typical speech. We evaluate the outputs with a large ASR model pre-trained on healthy speech without further fine-tuning and find that the proposed rhythm conversion especially improves performance for speakers of the Torgo corpus with more severe cases of dysarthria. Code and audio samples are available at https://idiap.github.io/RnV .

* Accepted at ICASSP 2025 Satellite Workshop: Workshop on Speech Pathology Analysis and DEtection (SPADE)

View paper on

Share this with someone who'll enjoy it:

Title:Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR

Paper and Code