Michael Riley

AT&T Laboratories

Towards Fast Inference: Exploring and Improving Blockwise Parallel Drafts

Apr 14, 2024

Taehyeon Kim, Ananda Theertha Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton

Abstract: Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation. Blockwise parallel decoding (BPD) was proposed by Stern et al. (2018) as a way to improve inference speed of language models. In this paper, we make two contributions to understanding and improving BPD drafts. We first offer an analysis of the token distributions produced by the BPD prediction heads. Secondly, we use this analysis to inform algorithms to improve BPD inference speed by refining the BPD drafts using small n-gram or neural language models. We empirically show that these refined BPD drafts yield a higher average verified prefix length across tasks.
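The "verified prefix length" mentioned in this abstract can be illustrated with a minimal sketch. It assumes greedy verification, where a drafted token is accepted only if it matches the base model's own greedy choice at that position. The names `verified_prefix_length`, `greedy_next` and `toy_greedy_next` are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence


def verified_prefix_length(
    context: Sequence[int],
    draft: Sequence[int],
    greedy_next: Callable[[List[int]], int],
) -> int:
    """Count how many leading draft tokens match the base model's greedy choice."""
    prefix = list(context)
    accepted = 0
    for token in draft:
        # Stop at the first drafted token the base model would not have produced.
        if greedy_next(prefix) != token:
            break
        prefix.append(token)
        accepted += 1
    return accepted


def toy_greedy_next(prefix: List[int]) -> int:
    # Hypothetical stand-in for a language model: predicts (last token + 1) mod 10.
    return (prefix[-1] + 1) % 10


if __name__ == "__main__":
    # The first two drafted tokens agree with the toy model; the third does not.
    print(verified_prefix_length([1, 2, 3], [4, 5, 7, 8], toy_greedy_next))
```

The loop here is sequential for readability only; in blockwise parallel decoding the verification of all drafted positions is done in a single parallel forward pass.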

Self-supervised Adaptive Weighting for Cooperative Perception in V2V Communications

Dec 16, 2023

Chenguang Liu, Jianjun Chen, Yunfei Chen, Ryan Payton, Michael Riley, Shuang-Hua Yang

Abstract: Perception of the driving environment is critical for collision avoidance and route planning to ensure driving safety. Cooperative perception has been widely studied as an effective approach to addressing the shortcomings of single-vehicle perception. However, the practical limitations of vehicle-to-vehicle (V2V) communications have not been adequately investigated. In particular, current cooperative fusion models rely on supervised models and do not address dynamic performance degradation caused by arbitrary channel impairments. In this paper, a self-supervised adaptive weighting model is proposed for intermediate fusion to mitigate the adverse effects of channel distortion. The performance of cooperative perception is investigated in different system settings. Rician fading and imperfect channel state information (CSI) are also considered. Numerical results demonstrate that the proposed adaptive weighting algorithm significantly outperforms the benchmarks without weighting. Visualization examples validate that the proposed weighting algorithm can flexibly adapt to various channel conditions. Moreover, the adaptive weighting algorithm demonstrates good generalization to untrained channels and test datasets from different domains.

* accepted by IEEE Transactions on Intelligent Vehicles

Cooperative Perception with Learning-Based V2V communications

Nov 17, 2023

Chenguang Liu, Yunfei Chen, Jianjun Chen, Ryan Payton, Michael Riley, Shuang-Hua Yang

Abstract: Cooperative perception has been widely used in autonomous driving to alleviate the inherent limitation of single automated vehicle perception. To enable cooperation, vehicle-to-vehicle (V2V) communication plays an indispensable role. This work analyzes the performance of cooperative perception accounting for communications channel impairments. Different fusion methods and channel impairments are evaluated. A new late fusion scheme is proposed to leverage the robustness of intermediate features. In order to compress the data size incurred by cooperation, a convolutional neural network-based autoencoder is adopted. Numerical results demonstrate that intermediate fusion is more robust to channel impairments than early fusion and late fusion when the SNR is greater than 0 dB. Also, the proposed fusion scheme outperforms the conventional late fusion using detection outputs, and the autoencoder provides a good compromise between detection accuracy and bandwidth usage.

* in IEEE Wireless Communications Letters, vol. 12, no. 11, pp. 1831-1835, Nov. 2023

Large-scale Language Model Rescoring on Long-form Data

Jun 13, 2023

Tongzhou Chen, Cyril Allauzen, Yinghui Huang, Daniel Park, David Rybach, W. Ronny Huang, Rodrigo Cabrera, Kartik Audhkhasi, Bhuvana Ramabhadran, Pedro J. Moreno (+1 more)

Abstract: In this work, we study the impact of Large-scale Language Models (LLMs) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR. We demonstrate up to 8% relative reduction in Word Error Rate (WER) on US English (en-us) and code-switched Indian English (en-in) long-form ASR test sets and a reduction of up to 30% relative on Salient Term Error Rate (STER) over a strong first-pass baseline that uses a maximum-entropy based language model. Improved lattice processing that results in a lattice with a proper (non-tree) digraph topology and carrying context from the 1-best hypothesis of the previous segment(s) results in significant wins in rescoring with LLMs. We also find that the gains in performance from combining LLMs trained on vast quantities of available data (such as C4) with conventional neural LMs are additive, and that the combination significantly outperforms a strong first-pass baseline with a maximum entropy LM.

* ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
* 5 pages, accepted in ICASSP 2023
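As a rough illustration of what second-pass rescoring means, the sketch below re-ranks a short n-best list by adding a weighted language model score to the first-pass score. This is a simplification made for illustration: the paper rescores lattices rather than n-best lists, and the function name `rescore_nbest`, the `lm_weight` parameter and the example scores are all hypothetical.

```python
from typing import List, Tuple

# Each hypothesis: (text, first-pass log-probability, language-model log-probability).
Hypothesis = Tuple[str, float, float]


def rescore_nbest(hypotheses: List[Hypothesis], lm_weight: float) -> str:
    """Return the text whose combined score is highest.

    Combined score = first-pass score + lm_weight * LM score (log-linear mix).
    """
    best = max(hypotheses, key=lambda h: h[1] + lm_weight * h[2])
    return best[0]


if __name__ == "__main__":
    # Made-up scores: the first pass slightly prefers the first hypothesis,
    # while the language model strongly prefers the second.
    nbest = [
        ("recognize speech", -1.5, -2.0),
        ("wreck a nice beach", -1.0, -5.0),
    ]
    print(rescore_nbest(nbest, lm_weight=0.0))
    print(rescore_nbest(nbest, lm_weight=0.5))
```

With `lm_weight=0.0` the first-pass ranking is kept; a positive weight lets the language model overturn it.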

LAST: Scalable Lattice-Based Speech Modelling in JAX

Apr 25, 2023

Ke Wu, Ehsan Variani, Tom Bagby, Michael Riley

Abstract: We introduce LAST, a LAttice-based Speech Transducer library in JAX. With an emphasis on flexibility, ease-of-use, and scalability, LAST implements differentiable weighted finite state automaton (WFSA) algorithms needed for training and inference that scale to a large WFSA such as a recognition lattice over the entire utterance. Despite these WFSA algorithms being well-known in the literature, new challenges arise from performance characteristics of modern architectures, and from nuances in automatic differentiation. We describe a suite of generally applicable techniques employed in LAST to address these challenges, and demonstrate their effectiveness with benchmarks on TPUv3 and V100 GPU.

Alignment Entropy Regularization

Dec 22, 2022

Ehsan Variani, Ke Wu, David Rybach, Cyril Allauzen, Michael Riley

Abstract: Existing training criteria in automatic speech recognition (ASR) permit the model to freely explore more than one time alignment between the feature and label sequences. In this paper, we use entropy to measure a model's uncertainty, i.e., how it chooses to distribute the probability mass over the set of allowed alignments. Furthermore, we evaluate the effect of entropy regularization in encouraging the model to distribute the probability mass only on a smaller subset of allowed alignments. Experiments show that entropy regularization enables a much simpler decoding method without sacrificing word error rate, and provides better time alignment quality.
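The entropy over alignments described in this abstract can be sketched for a toy case in which the allowed alignments are few enough to enumerate. The function name `alignment_entropy` and the idea of passing one unnormalized log-score per alignment are assumptions made for illustration; a practical system would compute the same quantity over a lattice with dynamic programming rather than by enumeration.

```python
import math
from typing import Sequence


def alignment_entropy(log_scores: Sequence[float]) -> float:
    """Entropy (in nats) of the distribution obtained by normalizing path scores.

    `log_scores` holds one unnormalized log-score per allowed alignment.
    """
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_score = max(log_scores)
    log_z = max_score + math.log(sum(math.exp(s - max_score) for s in log_scores))
    entropy = 0.0
    for s in log_scores:
        log_p = s - log_z  # log-probability of this alignment
        entropy -= math.exp(log_p) * log_p
    return entropy


if __name__ == "__main__":
    # Mass spread evenly over four alignments: maximal uncertainty.
    print(alignment_entropy([0.0, 0.0, 0.0, 0.0]))
    # Mass concentrated on one alignment: entropy close to zero.
    print(alignment_entropy([0.0, -20.0, -20.0, -20.0]))
```

A regularizer that penalizes this quantity pushes the model toward the second situation, where nearly all probability mass sits on a small subset of alignments.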

Global Normalization for Streaming Speech Recognition in a Modular Framework

May 26, 2022

Ehsan Variani, Ke Wu, Michael Riley, David Rybach, Matt Shannon, Cyril Allauzen

Abstract: We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition. Our solution admits a tractable exact computation of the denominator for the sequence-level normalization. Through theoretical and empirical results, we demonstrate that by switching to a globally normalized model, the word error rate gap between streaming and non-streaming speech-recognition models can be greatly reduced (by more than 50% on the LibriSpeech dataset). This model is developed in a modular framework which encompasses all the common neural speech recognition models. The modularity of this framework enables controlled comparison of modelling choices and creation of new models.

Learning discrete distributions: user vs item-level privacy

Jul 28, 2020

Yuhan Liu, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, Michael Riley

Abstract: Much of the literature on differential privacy focuses on item-level privacy, where loosely speaking, the goal is to provide privacy per item or training example. However, recently many practical applications such as federated learning require preserving privacy for all items of a single user, which is much harder to achieve. Therefore understanding the theoretical limit of user-level privacy becomes crucial. We study the fundamental problem of learning discrete distributions over $k$ symbols with user-level differential privacy. If each user has $m$ samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(m\alpha^2) + k/(\epsilon\alpha))$ to achieve an $\ell_1$ distance of $\alpha$ between the true and estimated distributions, with the privacy-induced penalty $k/(\epsilon\alpha)$ independent of the number of samples per user $m$. Moreover, we show that any mechanism that only operates on the final aggregate should require a user complexity of the same order. We then propose a mechanism such that the number of users scales as $\tilde{\mathcal{O}}(k/(m\alpha^2) + k/(\sqrt{m}\epsilon\alpha))$ and further show that it is nearly-optimal under certain regimes. Thus the privacy penalty is $\mathcal{O}(\sqrt{m})$ times smaller compared to the standard mechanisms. We also propose general techniques for obtaining lower bounds on restricted differentially private estimators and a lower bound on the total variation between binomial distributions, both of which might be of independent interest.

* 36 pages
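The "straightforward application of the Laplace mechanism" that this abstract uses as a baseline can be sketched as follows. This is an illustrative sketch, not the mechanism proposed in the paper: it assumes every user contributes exactly $m$ samples, and the names `laplace_noise` and `private_histogram` are hypothetical.

```python
import random
from typing import List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(
    user_samples: Sequence[Sequence[int]],
    k: int,
    epsilon: float,
    rng: random.Random,
) -> List[float]:
    """Estimate a distribution over k symbols with user-level privacy (baseline)."""
    m = len(user_samples[0])  # assumption: every user holds exactly m samples
    counts = [0.0] * k
    for samples in user_samples:
        for symbol in samples:
            counts[symbol] += 1.0
    # Replacing one user's m samples changes the count vector by at most 2m
    # in L1 norm, so the noise scale is 2m / epsilon.
    scale = 2.0 * m / epsilon
    noisy = [max(c + laplace_noise(scale, rng), 0.0) for c in counts]
    total = sum(noisy)
    if total == 0.0:
        return [1.0 / k] * k  # fall back to uniform if everything was clipped
    return [v / total for v in noisy]


if __name__ == "__main__":
    users = [[0, 1, 1], [2, 2, 2], [0, 0, 1]]
    print(private_histogram(users, k=3, epsilon=1.0, rng=random.Random(0)))
```

Because the noise scale grows linearly with $m$ while the total number of samples also grows linearly with $m$, the privacy-induced error of this baseline does not shrink as users contribute more samples, which matches the abstract's remark that the penalty is independent of $m$.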

Hybrid Autoregressive Transducer (hat)

Mar 12, 2020

Ehsan Variani, David Rybach, Cyril Allauzen, Michael Riley

Abstract: This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems. The HAT model provides a way to measure the quality of the internal language model that can be used to decide whether inference with an external language model is beneficial or not. This article also presents a finite context version of the HAT model that addresses the exposure bias problem and significantly simplifies the overall training and inference. We evaluate our proposed model on a large-scale voice search task. Our experiments show significant improvements in WER compared to the state-of-the-art approaches.

Federated Learning of N-gram Language Models

Oct 08, 2019

Mingqing Chen, Ananda Theertha Suresh, Rajiv Mathews, Adeline Wong, Cyril Allauzen, Françoise Beaufays, Michael Riley

Abstract: We propose algorithms to train production-quality n-gram language models using federated learning. Federated learning is a distributed computation platform that can be used to train global models for portable devices such as smartphones. Federated learning is especially relevant for applications handling privacy-sensitive data, such as virtual keyboards, because training is performed without the users' data ever leaving their devices. While the principles of federated learning are fairly generic, its methodology assumes that the underlying models are neural networks. However, virtual keyboards are typically powered by n-gram language models for latency reasons. We propose to train a recurrent neural network language model using the decentralized FederatedAveraging algorithm and to approximate this federated model server-side with an n-gram model that can be deployed to devices for fast inference. Our technical contributions include ways of handling large vocabularies, algorithms to correct capitalization errors in user data, and efficient finite state transducer algorithms to convert word language models to word-piece language models and vice versa. The n-gram language models trained with federated learning are compared to n-grams trained with traditional server-based algorithms using A/B tests on tens of millions of users of a virtual keyboard. Results are presented for two languages, American English and Brazilian Portuguese. This work demonstrates that high-quality n-gram language models can be trained directly on client mobile devices without sensitive training data ever leaving the devices.

* 10 pages
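The server-side aggregation step of the FederatedAveraging algorithm named in this abstract can be sketched as a weighted mean of client parameters, with each client weighted by its number of local training examples. The sketch covers only this aggregation step: local training on devices and the subsequent n-gram approximation are omitted, and the function name `federated_average` and the flat list-of-floats parameter layout are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Each client update: (model parameters as a flat list, number of local examples).
ClientUpdate = Tuple[Sequence[float], int]


def federated_average(client_updates: Sequence[ClientUpdate]) -> List[float]:
    """Average client parameters, weighting each client by its example count."""
    total_examples = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for weights, n in client_updates:
        for i, w in enumerate(weights):
            averaged[i] += w * n / total_examples
    return averaged


if __name__ == "__main__":
    # Two hypothetical clients; the second holds three times as much data.
    updates = [([1.0, 2.0], 1), ([3.0, 4.0], 3)]
    print(federated_average(updates))
```

Only the parameter vectors and example counts reach the server in this scheme; the raw training data stays on each client.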
