Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures

Jul 25, 2024

Nick Rossenbach, Benedikt Hilmes, Ralf Schlüter

Share this with someone who'll enjoy it:

Abstract:In this work we evaluate the utility of synthetic data for training automatic speech recognition (ASR). We use the ASR training data to train a text-to-speech (TTS) system similar to FastSpeech-2. With this TTS we reproduce the original training data, training ASR systems solely on synthetic data. For ASR, we use three different architectures, attention-based encoder-decoder, hybrid deep neural network hidden Markov model and a Gaussian mixture hidden Markov model, showing the different sensitivity of the models to synthetic data generation. In order to extend previous work, we present a number of ablation studies on the effectiveness of synthetic vs. real training data for ASR. In particular we focus on how the gap between training on synthetic and real data changes by varying the speaker embedding or by scaling the model size. For the latter we show that the TTS models generalize well, even when training scores indicate overfitting.

* Accepted at the SynData4GenAI 2024 workshop

View paper on

Share this with someone who'll enjoy it:

Title:On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures

Paper and Code