
Title: RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models

May 24, 2023

David Qiu, David Rim, Shaojin Ding, Oleg Rybakov, Yanzhang He

Figures 1 to 4 for RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models (images not reproduced here)


Abstract: With the rapid increase in the size of neural networks, model compression has become an important area of research. Quantization is an effective technique for decreasing the model size, memory access, and compute load of large models. Despite recent advances in quantization aware training (QAT) techniques, most papers present evaluations focused on computer vision tasks, which have different training dynamics from sequence tasks. In this paper, we first benchmark the impact of popular techniques such as the straight-through estimator, pseudo-quantization noise, learnable scale parameters, and clipping on 4-bit seq2seq models across a suite of speech recognition datasets ranging from 1,000 hours to 1 million hours, as well as one machine translation dataset to illustrate their applicability outside of speech. Through these experiments, we report that noise-based QAT suffers when there is insufficient regularization signal flowing back to the quantization scale. We propose low-complexity changes to the QAT process that improve model accuracy, outperforming popular learnable scale and clipping methods. The improved accuracy opens up the possibility of exploiting two other benefits of noise-based QAT: (1) training a single model that performs well in mixed precision mode and (2) improved generalization on long-form speech recognition.
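For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of the difference between hard 4-bit rounding with clipping and pseudo-quantization noise. It is not the authors' method or code: the function names, the max-abs scale rule, and the Gaussian test weights are all assumptions chosen for illustration, and the norm decay technique proposed in the paper is not shown.

```python
import random

def quantize_4bit(weights, scale):
    """Hard quantization: round to a signed 4-bit grid (integers -8..7), then rescale."""
    out = []
    for w in weights:
        q = round(w / scale)
        q = max(-8, min(7, q))  # clip to the representable 4-bit range
        out.append(q * scale)
    return out

def pseudo_quant_noise(weights, scale, rng):
    """Training-time surrogate: add uniform noise in [-scale/2, scale/2] instead of rounding."""
    return [w + rng.uniform(-0.5, 0.5) * scale for w in weights]

def max_abs_scale(weights):
    """Hypothetical scale rule (an assumption): map the largest magnitude to grid level 7."""
    return max(abs(w) for w in weights) / 7.0

rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for a weight tensor
scale = max_abs_scale(weights)

hard = quantize_4bit(weights, scale)             # what inference would see
noisy = pseudo_quant_noise(weights, scale, rng)  # what noise-based QAT would train on

max_hard_err = max(abs(h - w) for h, w in zip(hard, weights))
max_noisy_err = max(abs(n - w) for n, w in zip(noisy, weights))
levels = {round(h / scale) for h in hard}
```

In this sketch both perturbations are bounded by half the scale, so a larger scale means a larger perturbation of every weight. That is one informal way to see why the signal reaching the quantization scale matters in noise-based QAT, although the paper's own analysis should be consulted for the actual argument.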
