Title: Sub-8-bit quantization for on-device speech recognition: a regularization-free approach

Oct 17, 2022

Kai Zhen, Martin Radfar, Hieu Duy Nguyen, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris

Abstract: For on-device automatic speech recognition (ASR), quantization-aware training (QAT) is widely used to balance model predictive performance against efficiency. A major drawback of existing QAT methods is that the quantization centroids must be predetermined and fixed. To overcome this limitation, we introduce a regularization-free, "soft-to-hard" compression mechanism with self-adjustable centroids in a μ-law constrained space, resulting in a simpler yet more versatile quantization scheme called General Quantizer (GQ). We apply GQ to ASR tasks using Recurrent Neural Network Transducer (RNN-T) and Conformer architectures on both LibriSpeech and de-identified far-field datasets. Without accuracy degradation, GQ can compress both RNN-T and Conformer to sub-8-bit precision and, for some RNN-T layers, to 1-bit, for fast and accurate inference. Benchmarking on physical devices shows a 30.73% memory footprint saving and a 31.75% reduction in user-perceived latency compared to 8-bit QAT.
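
To make the two ingredients named in the abstract more concrete, here is a minimal Python sketch of the standard μ-law companding curve and of a generic softmax-based "soft-to-hard" assignment to a set of centroids. It is not the authors' GQ implementation: the function names, the default `mu=255`, the temperature parameter `alpha`, and the four example centroids are all assumptions chosen for illustration, and the centroids are fixed here rather than learned.

```python
import math

def mu_law_compress(x, mu=255.0):
    """Standard mu-law companding: maps x in [-1, 1] to [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def soft_quantize(w, centroids, alpha):
    """Softmax-weighted average of centroids (hypothetical helper).

    alpha is an assumed temperature: alpha = 0 gives the plain mean of the
    centroids, and a large alpha approaches the nearest-centroid value.
    """
    logits = [-alpha * (w - c) ** 2 for c in centroids]
    top = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return sum(wt * c for wt, c in zip(weights, centroids)) / total

def hard_quantize(w, centroids):
    """Nearest-centroid assignment, the limit of soft_quantize as alpha grows."""
    return min(centroids, key=lambda c: (w - c) ** 2)

# Illustrative 2-bit setup: four centroids placed in the companded domain.
centroids = [-0.75, -0.25, 0.25, 0.75]
z = mu_law_compress(0.1)                     # weight mapped into mu-law space
z_soft = soft_quantize(z, centroids, 5.0)    # differentiable, used while training
z_hard = hard_quantize(z, centroids)         # discrete value, used at inference
w_hat = mu_law_expand(z_hard)                # mapped back to the weight domain
```

Raising `alpha` gradually during training is one common way to anneal from the soft assignment to the hard one; whether GQ uses this exact schedule is not stated in the abstract.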

* Accepted for publication at IEEE SLT'22
