Title: EfQAT: An Efficient Framework for Quantization-Aware Training

Nov 17, 2024

Saleh Ashkboos, Bram Verhoef, Torsten Hoefler, Evangelos Eleftheriou, Martino Dazzi

Abstract: Quantization-aware training (QAT) schemes have been shown to achieve near-full-precision accuracy. They accomplish this by training a quantized model for multiple epochs, which is computationally expensive, mainly because of the full-precision backward pass. Post-training quantization (PTQ) schemes, by contrast, involve no training and are therefore computationally cheap, but they usually result in a significant accuracy drop. We address these challenges by proposing EfQAT, which generalizes both schemes by optimizing only a subset of the parameters of a quantized model. EfQAT starts by applying a PTQ scheme to a pre-trained model and then updates only the most critical network parameters while freezing the rest, which accelerates the backward pass. We demonstrate the effectiveness of EfQAT on various CNNs and Transformer-based models using different GPUs. Specifically, we show that EfQAT is significantly more accurate than PTQ with little extra compute. Furthermore, EfQAT can accelerate the QAT backward pass by 1.44-1.64x while retaining most of the accuracy.

* 12 pages, 5 figures
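
The idea described in the abstract (quantize first, then update only a subset of parameters) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the symmetric per-row 8-bit fake quantization, the choice of mean absolute weight as the "criticality" score, and the plain SGD step. The sketch also only masks the update; in an actual training setup the speedup would come from not computing gradients for the frozen rows at all.

```python
def quantize_symmetric(values, num_bits=8):
    """Fake-quantize a list of floats using one symmetric scale (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the integer grid, clamp, then map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def select_rows_to_update(weights, fraction):
    """Return indices of the rows with the largest mean |w|.

    The scoring rule is a hypothetical stand-in for "most critical parameters".
    """
    k = max(1, int(len(weights) * fraction))
    scores = [sum(abs(w) for w in row) / len(row) for row in weights]
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def partial_update(weights, grads, trainable_rows, lr):
    """Apply an SGD step to the selected rows only; all other rows stay frozen."""
    return [
        [w - lr * g for w, g in zip(row, grad_row)] if i in trainable_rows else list(row)
        for i, (row, grad_row) in enumerate(zip(weights, grads))
    ]


# Toy weight matrix: one row per output channel (made-up numbers).
weights = [[0.9, -0.8], [0.05, 0.02], [0.4, -0.5], [0.01, -0.03]]

# Step 1: PTQ-style starting point (per-row fake quantization).
quantized = [quantize_symmetric(row) for row in weights]

# Step 2: choose which rows remain trainable (here, half of them).
rows = select_rows_to_update(quantized, fraction=0.5)

# Step 3: one update step with placeholder gradients.
grads = [[0.1, 0.1] for _ in weights]
updated = partial_update(quantized, grads, rows, lr=0.1)
```

Setting `fraction` close to zero leaves the model essentially at its PTQ state, while `fraction=1.0` updates every row as ordinary QAT would, which mirrors the abstract's claim that the approach generalizes both schemes.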