Title: Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR

Sep 13, 2024

Mingyu Cui, Yifan Yang, Jiajun Deng, Jiawen Kang, Shujie Hu, Tianzi Wang, Zhaoqing Li, Shiliang Zhang, Xie Chen, Xunying Liu

Abstract: Self-supervised learning (SSL) based discrete speech representations are highly compact and domain adaptable. In this paper, SSL discrete speech features extracted from WavLM models are used as additional cross-utterance acoustic context features in Zipformer-Transducer ASR systems. The efficacy of replacing Fbank features with discrete token features for modelling either cross-utterance contexts (from preceding and future segments), or the current utterance's internal contexts alone, or both at the same time, is demonstrated thoroughly on the Gigaspeech 1000-hr corpus. The best Zipformer-Transducer system using discrete-token-based cross-utterance context features outperforms the baseline using utterance-internal context only, with statistically significant word error rate (WER) reductions of 0.32% to 0.41% absolute (2.78% to 3.54% relative) on the dev and test data. The lowest published WERs of 11.15% and 11.14% were obtained on the dev and test sets, respectively. Our work is open-source and publicly available at https://github.com/open-creator/icefall/tree/master/egs/gigaspeech/Context_ASR.
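The abstract reports WER reductions in both absolute and relative terms. As a rough illustration of how those two figures relate, the following minimal sketch computes WER from a word-level edit distance and then derives both kinds of reduction. It is not taken from the paper's code: the function names `word_error_rate` and `wer_reduction` are hypothetical, and the example sentences and the 12.00% / 11.64% figures are made-up values chosen for easy arithmetic, not results from the paper.

```python
from typing import Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def wer_reduction(baseline: float, improved: float) -> Tuple[float, float]:
    """Return (absolute, relative) WER reduction, both in percent."""
    absolute = baseline - improved
    relative = 100.0 * absolute / baseline
    return absolute, relative


# Hypothetical example: one substitution and one deletion in six words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# Hypothetical WERs (not from the paper): 12.00% baseline, 11.64% improved.
abs_red, rel_red = wer_reduction(12.00, 11.64)
```

With these made-up numbers the absolute reduction is about 0.36 points and the relative reduction about 3%, which shows why a sub-point absolute gain at a WER near 11% to 12% corresponds to a relative gain of a few percent.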

* Submitted to ICASSP 2025
