Title: An Empirical Study of Language Model Integration for Transducer based Speech Recognition

Mar 31, 2022

Huahuan Zheng, Keyu An, Zhijian Ou, Chen Huang, Ke Ding, Guanglu Wan

Abstract: Utilizing text-only data with an external language model (LM) in end-to-end RNN-Transducer (RNN-T) for speech recognition is challenging. Recently, a class of methods such as density ratio (DR) and internal language model estimation (ILME) has been developed, outperforming the classic shallow fusion (SF) method. The basic idea behind these methods is that the RNN-T posterior should first subtract the implicitly learned internal LM (ILM) prior in order to integrate the external LM. While recent studies suggest that RNN-T only learns some low-order language model information, the DR method uses a well-trained LM as the ILM estimate. We hypothesize that this setting is inappropriate and may deteriorate the performance of the DR method, and propose a low-order density ratio method (LODR) that trains a low-order, weak ILM for DR. Extensive empirical experiments are conducted on both in-domain and cross-domain scenarios on the English LibriSpeech and Tedlium-2 datasets and the Chinese WenetSpeech and AISHELL-1 datasets. LODR consistently outperforms SF in all tasks, while performing generally close to ILME and better than DR in most tests.

* submitted to INTERSPEECH 2022
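
The score combinations named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used log-linear form, in which shallow fusion adds a weighted external LM score and the DR, ILME, and LODR family additionally subtracts a weighted prior estimate; under that reading the methods differ mainly in where the prior comes from (for LODR, a low-order LM). The class and function names, the weights, and all log-probabilities below are hypothetical and chosen only for illustration; they are not taken from the paper, which may also use extra terms such as length normalization.

```python
from dataclasses import dataclass


@dataclass
class HypothesisScores:
    """Sequence-level log-probabilities for one hypothesis (made-up values)."""
    text: str
    log_p_rnnt: float   # log P(y | x) from the transducer
    log_p_ext: float    # log P(y) from the external LM
    log_p_prior: float  # log P(y) from the prior estimate (assumed given)


def shallow_fusion(h: HypothesisScores, ext_weight: float) -> float:
    """SF: transducer score plus a weighted external LM score."""
    return h.log_p_rnnt + ext_weight * h.log_p_ext


def prior_subtracted(h: HypothesisScores, ext_weight: float,
                     prior_weight: float) -> float:
    """DR/ILME/LODR-style score: subtract the weighted prior estimate,
    then add the weighted external LM score."""
    return (h.log_p_rnnt
            - prior_weight * h.log_p_prior
            + ext_weight * h.log_p_ext)


# Two hypothetical hypotheses: the first suits the target domain, the
# second is favoured by the prior learned from the training transcripts.
hyps = [
    HypothesisScores("target-domain phrase", -5.0, -4.0, -9.0),
    HypothesisScores("source-domain phrase", -4.0, -7.0, -3.0),
]

# Arbitrary illustrative weights; in practice they are tuned on held-out data.
best_sf = max(hyps, key=lambda h: shallow_fusion(h, ext_weight=0.3))
best_ps = max(hyps, key=lambda h: prior_subtracted(h, ext_weight=0.3,
                                                   prior_weight=0.2))
print("SF picks:", best_sf.text)
print("Prior-subtracted picks:", best_ps.text)
```

With these made-up numbers the two rules select different hypotheses, which shows how subtracting a prior can change the ranking. It says nothing about which method performs better on real data.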
