Improving fairness for spoken language understanding in atypical speech with Text-to-Speech

Nov 16, 2023

Helin Wang, Venkatesh Ravichandran, Milind Rao, Becky Lammers, Myra Sydnor, Nicholas Maragakis, Ankur A. Butala, Jayne Zhang, Lora Clawson, Victoria Chovaz(+1 more)

Figures 1-4 (images not reproduced)

Abstract: Spoken language understanding (SLU) systems often perform poorly on atypical speech, which is typically caused by neurological conditions and motor impairments. Recent Text-to-Speech (TTS) synthesis-based augmentation methods aimed at fairer SLU have struggled to capture the unique vocal characteristics of atypical speakers, largely because too little data is available. To address this issue, we present Aty-TTS, a novel data augmentation method for atypical speakers that finetunes a TTS model. Aty-TTS models speaker and atypical characteristics through knowledge transfer from a voice conversion model. We then use the augmented data to train SLU models adapted to atypical speech. To train these data augmentation models and evaluate the resulting SLU systems, we collected a new atypical speech dataset with intent annotations. Both objective and subjective assessments confirm that Aty-TTS can generate high-quality atypical speech. Furthermore, it serves as an effective data augmentation strategy, contributing to fairer SLU systems that better accommodate individuals with atypical speech patterns.

* Accepted at SyntheticData4ML 2023 (oral presentation)
