We propose an on-the-fly data augmentation method for automatic speech recognition (ASR) that uses alignment information to generate effective training samples. Our method, called Aligned Data Augmentation (ADA) for ASR, replaces transcribed tokens and their corresponding speech representations in an aligned manner to generate previously unseen training pairs. The speech representations are sampled from an audio dictionary extracted from the training corpus, injecting speaker variations into the training examples. The transcribed tokens are either predicted by a language model, so that the augmented data pairs remain semantically close to the original data, or sampled at random. Both strategies result in training pairs that improve robustness in ASR training. Our experiments on a sequence-to-sequence architecture show that ADA can be applied on top of SpecAugment, achieving relative WER improvements of 9-23% and 4-15% over SpecAugment alone on the LibriSpeech 100h and LibriSpeech 960h test sets, respectively.
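To make the aligned replacement concrete, the following is a minimal sketch of how one augmentation step could be organized, assuming token-level alignments are available as frame spans. The names `aligned_augment`, `audio_dict`, `lm_predict`, and the `replace_prob` parameter are illustrative assumptions, not the paper's actual implementation.

```python
import random

# Illustrative sketch only: `audio_dict` maps a token to a list of speech
# feature segments for that token collected from the training corpus, and
# `lm_predict` stands in for a language-model next-token predictor.

def aligned_augment(features, tokens, alignments, audio_dict,
                    lm_predict=None, replace_prob=0.2):
    """Replace tokens and their aligned speech segments consistently.

    features:   sequence of per-frame speech features
    tokens:     list of transcript tokens
    alignments: list of (start_frame, end_frame) spans, one per token
    """
    new_features, new_tokens = [], []
    for token, (start, end) in zip(tokens, alignments):
        if random.random() < replace_prob:
            if lm_predict is not None:
                # Semantically plausible substitute conditioned on the prefix.
                candidate = lm_predict(new_tokens)
            else:
                # Random substitute drawn from the audio dictionary keys.
                candidate = random.choice(list(audio_dict))
            if candidate in audio_dict:
                # Splice in a speech segment for the new token, possibly
                # from a different speaker, adding acoustic variation.
                segment = random.choice(audio_dict[candidate])
                new_features.extend(segment)
                new_tokens.append(candidate)
                continue
        # Otherwise keep the original token and its aligned frames.
        new_features.extend(features[start:end])
        new_tokens.append(token)
    return new_features, new_tokens
```

Because both the token sequence and the feature sequence are modified together, the augmented pair stays internally consistent, which is what distinguishes this scheme from augmentations that perturb the audio alone.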