Abstract: Contextual biasing enables speech recognizers to transcribe important phrases in the speaker's context, such as contact names, even if they are rare in, or absent from, the training data. Attention-based biasing is a leading approach that allows full end-to-end co-training of the recognizer and biasing system and requires no separate inference-time components. Such biasers typically consist of a context encoder; a context filter, which narrows down the context to apply and improves per-step inference time; and, finally, context application via cross-attention. Though much work has gone into optimizing per-frame performance, the context encoder is at least as important: recognition cannot begin before context encoding ends. Here, we show that the lightweight phrase selection pass can be moved before context encoding, resulting in a speedup of up to 16.1 times and enabling biasing to scale to 20K phrases with a maximum pre-decoding delay under 33 ms. With the addition of phrase- and wordpiece-level cross-entropy losses, our technique also achieves up to a 37.5% relative WER reduction over a baseline without these losses and the lightweight phrase selection pass.
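The reordering this abstract describes can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch rendering, not the paper's implementation: a cheap dot-product selector scores all candidate phrases against a coarse audio summary first, so the expensive context encoder runs only on the surviving top-k phrases before cross-attention applies them. Every name and signature here (PhraseSelector, biased_recognition_step, the context_encoder interface) is an illustrative assumption.

```python
import torch
import torch.nn as nn


class PhraseSelector(nn.Module):
    """Lightweight selection pass: cheap dot-product scores over
    precomputed (low-cost) phrase embeddings, run BEFORE the full
    context encoder."""

    def __init__(self, dim: int):
        super().__init__()
        self.query_proj = nn.Linear(dim, dim)

    def forward(self, audio_summary: torch.Tensor,
                phrase_embeddings: torch.Tensor, k: int) -> torch.Tensor:
        # audio_summary: (dim,); phrase_embeddings: (num_phrases, dim)
        scores = phrase_embeddings @ self.query_proj(audio_summary)
        topk = torch.topk(scores, k=min(k, scores.numel()))
        return topk.indices  # indices of phrases worth fully encoding


def biased_recognition_step(audio_feats, phrase_embs_lite, phrase_tokens,
                            selector, context_encoder, cross_attn, k=64):
    # 1) Lightweight phrase selection over ALL phrases (cheap embeddings only).
    summary = audio_feats.mean(dim=0)              # crude audio summary
    keep = selector(summary, phrase_embs_lite, k)  # narrow e.g. 20K -> k

    # 2) Full (expensive) context encoding only on the surviving phrases,
    #    so pre-decoding delay scales with k rather than the catalog size.
    #    context_encoder is assumed to map (k, max_len) token IDs -> (k, dim).
    context = context_encoder(phrase_tokens[keep])

    # 3) Context application via cross-attention against the audio frames.
    #    cross_attn is an nn.MultiheadAttention with batch_first=False,
    #    so tensors are shaped (seq_len, batch=1, dim).
    out, _ = cross_attn(audio_feats.unsqueeze(1),
                        context.unsqueeze(1),
                        context.unsqueeze(1))
    return out.squeeze(1)
```

Because the full context encoder now sees only k phrases instead of the whole catalog, the delay before decoding can start is bounded by k, which is what lets biasing scale to 20K phrases in the abstract's framing.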
Abstract: Fast contextual adaptation has been shown to be effective in improving Automatic Speech Recognition (ASR) of rare words, and when combined with on-device personalized training, it can yield even better recognition results. However, traditional re-scoring approaches based on an external language model are prone to diverging during personalized training. In this work, we introduce a model-based end-to-end contextual adaptation approach that is decoder-agnostic and amenable to on-device personalization. Our on-device simulation experiments demonstrate that the proposed approach outperforms the traditional re-scoring technique by 12% relative WER and 15.7% entity-mention-specific F1-score in a continuous personalization scenario.
Abstract: While current state-of-the-art Automatic Speech Recognition (ASR) systems achieve high accuracy on typical speech, they suffer from significant performance degradation on disordered speech and other atypical speech patterns. Personalization of ASR models, a commonly applied solution to this problem, is usually performed in a server-based training environment, posing problems around data privacy, delayed model-update times, and the communication cost of copying data and models between mobile devices and server infrastructure. In this paper, we present an approach to on-device ASR personalization with very small amounts of speaker-specific data. We test our approach on a diverse set of 100 speakers with disordered speech and find a median relative word error rate improvement of 71% with only 50 short utterances required per speaker. When tested on a voice-controlled home automation platform, on-device personalized models show a median task success rate of 81%, compared to only 40% for the unadapted models.
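As a rough illustration of the personalization step described above, here is a generic small-data fine-tuning loop in PyTorch. The abstract does not specify the actual recipe (which parameters are tuned, the optimizer, or the schedule), so every choice below is an assumption for illustration, not the authors' method.

```python
import torch


def personalize(model, utterances, loss_fn, epochs: int = 10, lr: float = 1e-5):
    """Fine-tune a pretrained ASR `model` in place on a small list of
    (features, transcript_ids) pairs, e.g. ~50 short utterances per speaker.
    `model` and `loss_fn` (e.g. a CTC or RNN-T loss) are assumed interfaces."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # small lr to limit
    model.train()                                            # forgetting of typical speech
    for _ in range(epochs):
        for feats, targets in utterances:
            optimizer.zero_grad()
            logits = model(feats)            # hypothetical forward signature
            loss = loss_fn(logits, targets)
            loss.backward()
            optimizer.step()
    return model
```

With only ~50 utterances, a loop like this runs comfortably on a mobile device, which is the property the abstract's on-device setting relies on.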