Runsheng Bai

SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization

Dec 05, 2024

Runsheng Bai, Qiang Liu, Bo Liu

Abstract: Large Language Models (LLMs) exhibit impressive performance across various tasks, but deploying them for inference poses challenges. Their high resource demands often necessitate complex, costly multi-GPU pipelines or the use of smaller, less capable models. Quantization offers a promising solution by storing the model at lower precision, but existing methods frequently suffer significant performance drops at low bit widths. They also typically provide only a limited set of solutions at specific bit levels, many of which require extensive manual tuning. To address these challenges, we propose a new method called SKIM: Scaled K-means clustering wIth Mixed precision. Our approach introduces two novel techniques: (1) a greedy algorithm that finds an approximately optimal bit allocation across weight channels, and (2) a trainable scaling vector for non-differentiable K-means clustering. These techniques substantially improve performance and can be adapted to any given bit width. Notably, in terms of model perplexity, our method narrows the gap between 3-bit quantized LLaMA models and their full-precision counterparts by 16.3% on average.
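The idea of greedy bit allocation across channels can be illustrated with a small sketch. This is not the SKIM algorithm from the paper, whose details the abstract does not give. It is a generic greedy scheme under two assumptions made here for illustration: a toy error model (`toy_error`, hypothetical) in which each extra bit cuts a channel's squared error by a factor of four, and a total budget set by a target average bit width, which may be fractional.

```python
import heapq


def toy_error(variance: float, bits: int) -> float:
    """Assumed error model (not from the paper): error shrinks 4x per bit."""
    return variance * 4.0 ** (-bits)


def greedy_bit_allocation(variances, avg_bits, min_bits=1, max_bits=8):
    """Hand out bits one at a time to the channel that benefits most.

    variances: per-channel weight variance (stand-in for channel sensitivity)
    avg_bits:  target average bit width across channels, may be fractional
    """
    n = len(variances)
    bits = [min_bits] * n
    budget = round(avg_bits * n) - min_bits * n
    if budget < 0:
        raise ValueError("avg_bits is below min_bits")

    # Max-heap (via negated gains) of the error reduction from one more bit.
    heap = []
    if min_bits < max_bits:
        for i, v in enumerate(variances):
            gain = toy_error(v, min_bits) - toy_error(v, min_bits + 1)
            heap.append((-gain, i))
        heapq.heapify(heap)

    while budget > 0 and heap:
        _, i = heapq.heappop(heap)
        bits[i] += 1
        budget -= 1
        if bits[i] < max_bits:
            v = variances[i]
            gain = toy_error(v, bits[i]) - toy_error(v, bits[i] + 1)
            heapq.heappush(heap, (-gain, i))
    return bits


# Four channels with very different variances, 2.5 bits per channel on average.
channel_variances = [16.0, 1.0, 0.25, 4.0]
allocation = greedy_bit_allocation(channel_variances, avg_bits=2.5)
print(allocation, "total bits:", sum(allocation))
```

Under this toy model, high-variance channels receive more bits than low-variance ones, and the total always matches the requested budget, which is how a fractional average such as 2.5 bits can be met with integer per-channel widths.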

depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers

Mar 14, 2024

Kaichao You, Runsheng Bai, Meng Cao, Jianmin Wang, Ion Stoica, Mingsheng Long

Abstract: PyTorch 2.x introduces a compiler designed to accelerate deep learning programs. However, for machine learning researchers, using the PyTorch compiler to its full potential can be challenging. The compiler operates at the Python bytecode level, making it appear as an opaque box. To address this, we introduce depyf, a tool designed to demystify the inner workings of the PyTorch compiler. depyf decompiles bytecode generated by PyTorch back into equivalent source code, and establishes connections between in-memory code objects and their on-disk source code counterparts. This feature enables users to step through the source code line by line using debuggers, thus enhancing their understanding of the underlying processes. Notably, depyf is non-intrusive and user-friendly, primarily relying on two convenient context managers for its core functionality. The project is openly available (https://github.com/thuml/depyf) and is recognized as a PyTorch ecosystem project (https://pytorch.org/ecosystem/).

* 16 pages, 2 figures
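The link between an in-memory code object and an on-disk source file can be illustrated with the Python standard library alone. The sketch below is not depyf's implementation or API, and it starts from source text rather than decompiled bytecode. It only shows the general mechanism debuggers rely on: a code object records a filename (`co_filename`) and a first line number, so compiling against a real file path lets tools map execution back to lines on disk. The helper `load_with_source_file` and the file name pattern are hypothetical.

```python
import dis
import linecache
import os
import tempfile

SOURCE = "def add_one(x):\n    y = x + 1\n    return y\n"


def load_with_source_file(source: str, func_name: str, directory: str):
    """Write source to disk, then compile it against that path.

    Returns the function object and the path of the dumped source file.
    """
    path = os.path.join(directory, f"{func_name}_dumped.py")
    with open(path, "w", encoding="utf-8") as f:
        f.write(source)
    # Passing the real path as the filename ties the code object to the file.
    code = compile(source, path, "exec")
    namespace = {"__name__": "dumped_module"}
    exec(code, namespace)
    return namespace[func_name], path


dump_dir = tempfile.mkdtemp()
add_one, dumped_path = load_with_source_file(SOURCE, "add_one", dump_dir)

# The in-memory code object points at the on-disk file.
print("code object file:", add_one.__code__.co_filename)

# Tools can fetch the matching source line from disk by line number.
first_line = linecache.getline(
    add_one.__code__.co_filename, add_one.__code__.co_firstlineno
)
print("first source line:", first_line.rstrip())

# The bytecode that the interpreter actually runs for this function.
for instruction in dis.get_instructions(add_one):
    print(instruction.opname, instruction.argrepr)
```

Because the function's code object names a file that really exists, a debugger such as `pdb` can display and step through its lines. Per the abstract, depyf provides this kind of mapping for code that PyTorch generates as bytecode.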
