Scott Novotney

CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals

Mar 16, 2022

Scott Novotney, Sreeparna Mukherjee, Zeeshan Ahmed, Andreas Stolcke

Abstract: We propose a framework to modularize the training of neural language models that use diverse forms of sentence-external context (including metadata) by eliminating the need to jointly train sentence-external and within-sentence encoders. Our approach, contextual universal embeddings (CUE), trains LMs on one set of context, such as date and author, and adapts to novel metadata types, such as article title, or previous sentence. The model consists of a pretrained neural sentence LM, a BERT-based context encoder, and a masked transformer decoder that estimates LM probabilities using sentence-internal and sentence-external information. When context or metadata are unavailable, our model learns to combine contextual and sentence-internal information using noisy oracle unigram embeddings as a proxy. Real contextual information can be introduced later and used to adapt a small number of parameters that map contextual data into the decoder's embedding space. We validate the CUE framework on a NYTimes text corpus with multiple metadata types, for which the LM perplexity can be lowered from 36.6 to 27.4 by conditioning on context. Bootstrapping a contextual LM with only a subset of the context/metadata during training retains 85% of the achievable gain. Training the model initially with proxy context retains 67% of the perplexity gain after adapting to real context. Furthermore, we can swap one type of pretrained sentence LM for another without retraining the context encoders, by only adapting the decoder model. Overall, we obtain a modular framework that allows incremental, scalable training of context-enhanced LMs.

* To appear in Findings of ACL 2022

Attention-based Contextual Language Model Adaptation for Speech Recognition

Jun 02, 2021

Richard Diehl Martinez, Scott Novotney, Ivan Bulyko, Ariya Rastrow, Andreas Stolcke, Ankur Gandhe

Abstract: Language modeling (LM) for automatic speech recognition (ASR) does not usually incorporate utterance-level contextual information. For some domains like voice assistants, however, additional context, such as the time at which an utterance was spoken, provides a rich input signal. We introduce an attention mechanism for training neural speech recognition language models on both text and non-linguistic contextual data. When applied to a large de-identified dataset of utterances collected by a popular voice assistant platform, our method reduces perplexity by 7.0% relative over a standard LM that does not incorporate contextual information. When evaluated on utterances extracted from the long tail of the dataset, our method improves perplexity by 9.0% relative over a standard LM and by over 2.8% relative when compared to a state-of-the-art model for contextual LM.

Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion

Nov 30, 2020

Vijay Ravi, Yile Gu, Ankur Gandhe, Ariya Rastrow, Linda Liu, Denis Filimonov, Scott Novotney, Ivan Bulyko

Abstract: End-to-end automatic speech recognition (ASR) systems, such as the recurrent neural network transducer (RNN-T), have become popular, but rare words remain a challenge. In this paper, we propose a simple yet effective method called unigram shallow fusion (USF) to improve recognition of rare words for RNN-T. In USF, we extract rare words from RNN-T training data based on unigram count, and apply a fixed reward when the word is encountered during decoding. We show that this simple method can improve performance on rare words by 3.7% WER relative without degradation on a general test set, and the improvement from USF is additive to any additional language model based rescoring. Then, we show that the same USF does not work on a conventional hybrid system. Finally, we reason that USF works by fixing errors in probability estimates of words due to Viterbi search used during decoding with subword-based RNN-T.
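
The two steps the abstract names (collect rare words by unigram count, then add a fixed reward when one appears) can be illustrated with a rough Python sketch. This is not the authors' implementation: the function names, the `max_count` threshold, the `reward` value, and the toy data are all hypothetical, and for simplicity the reward is applied to complete hypotheses here rather than inside the decoder's beam search as the abstract describes.

```python
from collections import Counter


def build_rare_word_set(transcripts, max_count):
    """Return words whose unigram count in the transcripts is <= max_count.

    `max_count` is a hypothetical threshold; the abstract only says rare
    words are selected "based on unigram count".
    """
    counts = Counter(word for line in transcripts for word in line.split())
    return {word for word, count in counts.items() if count <= max_count}


def usf_score(words, base_log_prob, rare_words, reward):
    """Add a fixed reward to the base score for every rare word present."""
    bonus = reward * sum(1 for word in words if word in rare_words)
    return base_log_prob + bonus


def pick_best(hypotheses, rare_words, reward):
    """Pick the (words, log_prob) hypothesis with the highest rewarded score."""
    return max(
        hypotheses,
        key=lambda hyp: usf_score(hyp[0], hyp[1], rare_words, reward),
    )


# Toy data, invented purely for illustration.
transcripts = [
    "play some music",
    "play some music",
    "play some jazz",
    "call mom",
    "call mom",
    "call anneliese",
]
rare_words = build_rare_word_set(transcripts, max_count=1)

hypotheses = [
    (["call", "anna", "lisa"], -4.0),  # higher base score, no rare word
    (["call", "anneliese"], -4.5),     # lower base score, one rare word
]
best_words, best_log_prob = pick_best(hypotheses, rare_words, reward=1.0)
print(best_words)
```

In this toy setup the reward is large enough to move the hypothesis containing the rare name ahead of the acoustically similar alternative. In practice the threshold and reward would need tuning so that general test-set accuracy is not degraded, which is the property the abstract reports.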
