Title: M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper

Sep 18, 2024

Jiaming Zhou, Shiwan Zhao, Jiabei He, Hui Wang, Wenjia Zeng, Yong Chen, Haoqin Sun, Aobo Kong, Yong Qin

Abstract: State-of-the-art models like OpenAI's Whisper exhibit strong performance in multilingual automatic speech recognition (ASR), but they still face challenges in accurately recognizing diverse subdialects. In this paper, we propose M2R-Whisper, a novel multi-stage and multi-scale retrieval augmentation approach designed to enhance ASR performance in low-resource settings. Building on the principles of in-context learning (ICL) and retrieval-augmented techniques, our method employs sentence-level ICL in the pre-processing stage to harness contextual information, while integrating token-level k-Nearest Neighbors (kNN) retrieval as a post-processing step to further refine the final output distribution. By synergistically combining sentence-level and token-level retrieval strategies, M2R-Whisper effectively mitigates various types of recognition errors. Experiments conducted on Mandarin and subdialect datasets, including AISHELL-1 and KeSpeech, demonstrate substantial improvements in ASR accuracy, all achieved without any parameter updates.
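
To make the token-level kNN step in the abstract more concrete, here is a minimal sketch of how retrieved neighbors could refine an output distribution without any parameter updates. It is not the authors' implementation: the abstract does not give the formula, so the interpolation below follows the commonly used kNN-LM form, and the names `knn_distribution`, `interpolate`, `lam`, and `temperature`, as well as the toy datastore and vectors, are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical datastore entry: (hidden-state key, token id that followed it).
Entry = Tuple[Sequence[float], int]


def knn_distribution(query: Sequence[float], datastore: List[Entry],
                     vocab_size: int, k: int = 2,
                     temperature: float = 1.0) -> List[float]:
    """Build a token distribution from the k nearest datastore entries."""
    if not datastore:
        raise ValueError("datastore must not be empty")
    # Squared L2 distance from the query to every stored key, nearest first.
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    )[:k]
    # Softmax over negative distances (shifted by the minimum for stability).
    min_d = scored[0][0]
    weights = [math.exp(-(d - min_d) / temperature) for d, _ in scored]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for w, (_, token) in zip(weights, scored):
        probs[token] += w / total  # neighbors with the same token add up
    return probs


def interpolate(p_model: Sequence[float], p_knn: Sequence[float],
                lam: float = 0.5) -> List[float]:
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_model."""
    return [lam * pk + (1.0 - lam) * pm for pm, pk in zip(p_model, p_knn)]


# Toy example with a 3-token vocabulary (all values are made up).
p_model = [0.5, 0.4, 0.1]                      # the model slightly prefers token 0
datastore = [([0.0, 0.0], 1), ([0.1, 0.0], 1), ([5.0, 5.0], 0)]
query = [0.0, 0.1]                             # decoder state for the current step

p_knn = knn_distribution(query, datastore, vocab_size=3, k=2)
mixed = interpolate(p_model, p_knn, lam=0.5)
best_token = max(range(len(mixed)), key=mixed.__getitem__)
```

In this toy case both nearest neighbors vote for token 1, so the mixed distribution prefers token 1 even though the model alone prefers token 0. The weight `lam` controls how strongly the retrieved evidence can override the model; a real system would store decoder hidden states from in-domain audio and use an approximate nearest-neighbor index rather than a linear scan.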
