Lead2Gold: Towards exploiting the full potential of noisy transcriptions for speech recognition

Oct 16, 2019

Adrien Dufraux, Emmanuel Vincent, Awni Hannun, Armelle Brun, Matthijs Douze

Abstract: The transcriptions used to train an Automatic Speech Recognition (ASR) system may contain errors. Usually, either a quality control stage discards transcriptions with too many errors, or the noisy transcriptions are used as is. We introduce Lead2Gold, a method to train an ASR system that exploits the full potential of noisy transcriptions. Based on a noise model of transcription errors, Lead2Gold searches for better transcriptions of the training data with a beam search that takes this noise model into account. The beam search is differentiable and does not require a forced alignment step, so the whole system is trained end-to-end. Lead2Gold can be viewed as a new loss function that can be used on top of any sequence-to-sequence deep neural network. We conduct proof-of-concept experiments on noisy transcriptions generated from letter corruptions with different noise levels. We show that Lead2Gold obtains better ASR accuracy than a competitive baseline that does not account for the (artificially introduced) transcription noise.

* 8 pages, 4 tables, Accepted for publication in ASRU 2019
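
The proof-of-concept setup mentioned in the abstract, noisy transcriptions generated from letter corruptions at different noise levels, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `corrupt`, the letter inventory `ALPHABET`, the three edit operations (substitution, deletion, insertion), and the choice to pick among them uniformly.

```python
import random

# Assumed letter inventory (hypothetical; the paper's token set may differ).
ALPHABET = "abcdefghijklmnopqrstuvwxyz' "


def corrupt(transcript, noise_level, rng):
    """Corrupt each letter independently with probability `noise_level`.

    A corrupted letter is substituted, deleted, or followed by an inserted
    letter, chosen uniformly. This is an illustrative noise process only.
    """
    out = []
    for ch in transcript:
        if rng.random() >= noise_level:
            out.append(ch)  # letter kept unchanged
            continue
        op = rng.choice(("substitute", "delete", "insert"))
        if op == "substitute":
            # Replace with a different letter from the inventory.
            out.append(rng.choice([c for c in ALPHABET if c != ch]))
        elif op == "insert":
            # Keep the letter and add a random one after it.
            out.append(ch)
            out.append(rng.choice(ALPHABET))
        # "delete": emit nothing for this letter.
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = "the cat sat on the mat"
    for level in (0.0, 0.1, 0.3):
        print(level, repr(corrupt(clean, level, rng)))
```

Passing a seeded `random.Random` instance keeps the corruption reproducible, which makes it easier to compare systems trained at the same noise level.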
