Title: Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR

Jun 07, 2024

Shaojun Li, Daimeng Wei, Jiaxin Guo, ZongYao Li, Zhanglin Wu, Zhiqiang Rao, Yuanchang Luo, Xianghui He, Hao Yang


Abstract: Despite recent improvements in End-to-End Automatic Speech Recognition (E2E ASR) systems, performance can degrade when vocal characteristics differ between training and test data, particularly when little adaptation data is available for the target speaker. We propose a novel speaker adaptation approach, Speaker-Smoothed kNN, which uses k-Nearest Neighbors (kNN) retrieval to improve model output by finding correctly pronounced tokens in a pre-built datastore during decoding. Moreover, we use x-vectors to dynamically adjust the kNN interpolation parameters, which addresses data sparsity. The approach was validated on the KeSpeech and MagicData corpora under in-domain and all-domain settings. Our method consistently performs comparably to fine-tuning, without the performance degradation that fine-tuning shows when the speaker changes. Furthermore, in the all-domain setting, our method achieves state-of-the-art results, reducing the character error rate (CER) in both single-speaker and multi-speaker test scenarios.
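
The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, assuming the standard kNN interpolation form p = (1 - λ) · p_model + λ · p_kNN. The function names, the `lam_max` parameter, and the rule that scales λ by the cosine similarity between x-vectors are all hypothetical illustrations, not the authors' method.

```python
import numpy as np

def knn_distribution(query, keys, values, vocab_size, k=4, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over tokens."""
    dists = np.sum((keys - query) ** 2, axis=1)   # squared L2 distance to every key
    k = min(k, len(keys))
    nearest = np.argsort(dists)[:k]               # indices of the k closest keys
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())       # numerically stable softmax
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nearest], weights)    # sum weights of repeated tokens
    return p_knn

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def speaker_smoothed_lambda(xvec_test, xvec_store, lam_max=0.5):
    """Assumed rule: trust the datastore more when the speakers are similar."""
    sim = max(0.0, cosine(xvec_test, xvec_store))  # clip negative similarity to 0
    return lam_max * sim

def interpolate(p_model, p_knn, lam):
    """Mix the ASR model's token distribution with the retrieved one."""
    return (1.0 - lam) * p_model + lam * p_knn

# Toy datastore: decoder hidden states (keys) and the tokens they emitted (values).
keys = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
values = np.array([2, 2, 0])
p_model = np.array([0.6, 0.3, 0.1])

p_knn = knn_distribution(np.array([0.1, 0.1]), keys, values, vocab_size=3, k=2)
lam = speaker_smoothed_lambda(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
print(interpolate(p_model, p_knn, lam))
```

In this toy setup, a test speaker whose x-vector matches the datastore speaker gets the full `lam_max`, while an unrelated speaker gets a λ near zero and the output falls back to the unadapted model, which is one plausible way to avoid degradation when the speaker changes.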

* Accepted to Interspeech 2024
