Title: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

Jun 06, 2024

Sreyan Ghosh, Sonal Kumar, Ashish Seth, Purva Chiniya, Utkarsh Tyagi, Ramani Duraiswami, Dinesh Manocha

Abstract: Visual cues, like lip motion, have been shown to improve the performance of Automatic Speech Recognition (ASR) systems in noisy environments. We propose LipGER (Lip Motion aided Generative Error Correction), a novel framework for leveraging visual cues for noise-robust ASR. Instead of learning the cross-modal correlation between the audio and visual modalities, we make an LLM learn the task of visually-conditioned (generative) ASR error correction. Specifically, we instruct an LLM to predict the transcription from the N-best hypotheses generated using ASR beam search, and we further condition this prediction on lip motion. This approach addresses key challenges in traditional AVSR learning, such as the lack of large-scale paired datasets and difficulties in adapting to new domains. We experiment on four datasets in various settings and show that LipGER improves the Word Error Rate in the range of 1.1%-49.2%. We also release LipHyp, a large-scale dataset of hypothesis-transcription pairs that is additionally equipped with lip motion cues, to promote further research in this space.

* InterSpeech 2024. Code and Data: https://github.com/Sreyan88/LipGER
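
To make the two ideas in the abstract concrete, here is a minimal Python sketch, not taken from the LipGER repository. `word_error_rate` computes the standard word-level edit-distance metric, and `build_correction_prompt` shows one way an N-best list could be turned into a text instruction for an LLM. Both function names, the prompt wording, and the example sentences are assumptions made for illustration; the lip-motion conditioning described in the abstract is not modeled here.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def build_correction_prompt(hypotheses: List[str]) -> str:
    """Format an N-best list as a text instruction (hypothetical wording)."""
    numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(hypotheses, start=1))
    return (
        "Below are the N-best hypotheses from an ASR system. "
        "Write the most likely true transcription.\n"
        f"{numbered}\nTranscription:"
    )


if __name__ == "__main__":
    # Made-up example data, not drawn from LipHyp.
    reference = "turn the lights off"
    n_best = ["turn the light off", "turn the lights of", "turn a lights off"]
    for hyp in n_best:
        print(f"{word_error_rate(reference, hyp):.2f}  {hyp}")
    print(build_correction_prompt(n_best))
```

In this toy example no single hypothesis matches the reference, yet every reference word appears in at least one of them, which is the kind of complementary evidence a correction model working over the N-best list can draw on.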
