Title: Crossmodal ASR Error Correction with Discrete Speech Units

May 26, 2024

Yuanchao Li, Pinzhen Chen, Peter Bell, Catherine Lai

Abstract: ASR remains unsatisfactory in scenarios where the speaking style diverges from that used to train ASR systems, resulting in erroneous transcripts. To address this, ASR Error Correction (AEC), a post-ASR processing approach, is required. In this work, we tackle an understudied issue: the Low-Resource Out-of-Domain (LROOD) problem, by investigating crossmodal AEC on very limited downstream data with 1-best hypothesis transcription. We explore pre-training and fine-tuning strategies and uncover an ASR domain discrepancy phenomenon, shedding light on appropriate training schemes for LROOD data. Moreover, we propose the incorporation of discrete speech units to align with and enhance the word embeddings for improving AEC quality. Results from multiple corpora and several evaluation metrics demonstrate the feasibility and efficacy of our proposed AEC approach on LROOD data, as well as its generalizability and superiority on large-scale data. Finally, a study on speech emotion recognition confirms that our model produces ASR error-robust transcripts suitable for downstream applications.
