Title: Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

Feb 14, 2024

Junhan Kim, Kyungphil Park, Chungman Lee, Ho-young Kim, Joonyoung Kim, Yongkweon Jeon

Abstract: With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile devices and TVs. Existing PTQ schemes, however, consume considerable time and resources, which could be a bottleneck in real situations where frequent model updates and multiple hyper-parameter tunings are required. As a cost-effective alternative, one-shot PTQ schemes have been proposed. Still, the performance is somewhat limited because they cannot consider the inter-layer dependency within the attention module, which is a very important feature of Transformers. In this paper, we thus propose a novel PTQ algorithm that balances accuracy and efficiency. The key idea of the proposed algorithm called aespa is to perform quantization layer-wise for efficiency while considering cross-layer dependency to preserve the attention score. Through extensive experiments on various language models and complexity analysis, we demonstrate that aespa is accurate and efficient in quantizing Transformer models.
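As a rough illustration of why cross-layer dependency matters, the sketch below quantizes hypothetical query and key projection weights with plain round-to-nearest (RTN) and compares two kinds of error: the output error of each layer measured on its own, and the error in the attention score Q K^T, which depends on both weights jointly. This is not aespa. The RTN quantizer, the function names (`quantize_rtn`, `compare_objectives`), the shapes, and the random data are all assumptions chosen for illustration.

```python
import random
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def sq_error(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of (a - b)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def quantize_rtn(w: Matrix, n_bits: int = 4) -> Matrix:
    """Symmetric per-tensor round-to-nearest quantization (a simple baseline, assumed here)."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = max(abs(v) for row in w for v in row)
    if max_abs == 0.0:
        return [row[:] for row in w]
    scale = max_abs / qmax
    # Map each weight to the nearest integer level, clamp, then dequantize back to floats.
    return [[max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in row] for row in w]


def compare_objectives(x: Matrix, w_q: Matrix, w_k: Matrix, n_bits: int = 4) -> Dict[str, float]:
    """Contrast per-layer output errors with the error in the attention score Q K^T."""
    wq_hat, wk_hat = quantize_rtn(w_q, n_bits), quantize_rtn(w_k, n_bits)
    q, k = matmul(x, w_q), matmul(x, w_k)
    q_hat, k_hat = matmul(x, wq_hat), matmul(x, wk_hat)
    score = matmul(q, transpose(k))  # full-precision attention logits (no 1/sqrt(d) scaling)
    score_hat = matmul(q_hat, transpose(k_hat))  # logits with both projections quantized
    return {
        "query_layer_err": sq_error(q, q_hat),  # what a purely layer-wise objective sees for W_q
        "key_layer_err": sq_error(k, k_hat),  # ... and, independently, for W_k
        "score_err": sq_error(score, score_hat),  # error in the attention score itself
    }


if __name__ == "__main__":
    random.seed(0)
    n_tokens, d_model, d_head = 8, 16, 4
    x = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_tokens)]
    w_q = [[random.gauss(0, 0.2) for _ in range(d_head)] for _ in range(d_model)]
    w_k = [[random.gauss(0, 0.2) for _ in range(d_head)] for _ in range(d_model)]
    for bits in (8, 4, 3):
        print(bits, compare_objectives(x, w_q, w_k, bits))
```

Minimizing `query_layer_err` and `key_layer_err` separately does not directly target `score_err`, because the score error also contains a cross term involving both quantized weights; this gap is the kind of inter-layer dependency the abstract says one-shot layer-wise schemes cannot capture.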

* 17 pages, under review
