Large Language Models (LLMs) exhibit impressive performance across various tasks, but deploying them for inference poses challenges. Their high resource demands often necessitate complex, costly multi-GPU pipelines or the use of smaller, less capable models. While quantization offers a promising solution by storing weights at lower precision, existing methods frequently suffer significant performance drops at lower precision levels. Moreover, they typically provide only a limited set of solutions at specific bit widths, many of which require extensive manual tuning. To address these challenges, we propose a new method called SKIM: Scaled K-means clustering wIth Mixed precision. Our approach introduces two novel techniques: (1) a greedy algorithm that finds an approximately optimal bit allocation across weight channels, and (2) a trainable scaling vector for non-differentiable K-means clustering. These techniques substantially improve performance and can be adapted to any given bit width. Notably, in terms of model perplexity, our method narrows the gap between 3-bit quantized LLaMA models and their full-precision counterparts by 16.3% on average.
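
To make the idea of channel-wise scaled K-means weight quantization concrete, the following is a minimal NumPy sketch of the general scheme the abstract alludes to; it is an illustration under simplifying assumptions, not the SKIM algorithm itself, and the names `kmeans_1d`, `quantize_channel`, the per-channel scale choice, and the fixed 3-bit setting are all hypothetical.

```python
# Minimal sketch of per-channel scaled K-means weight quantization.
# Illustrative only: the scaling rule, initialization, and bit width
# are assumptions for this example, not the authors' SKIM method.
import numpy as np

def kmeans_1d(x, k, iters=20):
    """Plain Lloyd's K-means on a 1-D array; returns (centroids, assignments)."""
    # Initialize centroids at evenly spaced quantiles of the data.
    centroids = np.quantile(x, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        assign = np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)
        # Recompute each centroid as the mean of its assigned values.
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = x[assign == j].mean()
    return centroids, assign

def quantize_channel(w_row, bits):
    """Quantize one weight channel: rescale, cluster, and reconstruct."""
    scale = np.abs(w_row).max() + 1e-12           # per-channel scale (illustrative choice)
    centroids, assign = kmeans_1d(w_row / scale, 2 ** bits)
    return scale * centroids[assign]              # dequantized approximation

# Toy usage: quantize each output channel of a random weight matrix to 3 bits.
W = np.random.randn(8, 64)
W_hat = np.stack([quantize_channel(row, bits=3) for row in W])
print("mean squared error:", np.mean((W - W_hat) ** 2))
```

In this sketch every channel receives the same bit width; the mixed-precision component described above would instead assign different bit widths per channel via the greedy allocation, and the scaling vector would be learned rather than fixed.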