Title: QTIP: Quantization with Trellises and Incoherence Processing

Jun 17, 2024

Albert Tseng, Qingyao Sun, David Hou, Christopher De Sa


Abstract: Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing weights to low-precision datatypes. Since LLM inference is usually memory-bound, PTQ methods can improve inference throughput. Recent state-of-the-art PTQ approaches have converged on using vector quantization (VQ) to quantize multiple weights at once, which improves information utilization through better shaping. However, VQ requires a codebook with size exponential in the dimension. This limits current VQ-based PTQ works to low VQ dimensions ($\le 8$) that in turn limit quantization quality. Here, we introduce QTIP, which instead uses trellis coded quantization (TCQ) to achieve ultra-high-dimensional quantization. TCQ uses a stateful decoder that separates the codebook size from the bitrate and effective dimension. QTIP introduces a spectrum of lookup-only to computed lookup-free trellis codes designed for a hardware-efficient "bitshift" trellis structure; these codes achieve state-of-the-art results in both quantization quality and inference speed.
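
As a rough illustration of the codebook-size argument in the abstract, the sketch below contrasts the number of entries an unstructured VQ codebook needs with a toy stateful decoder whose table size depends only on the state width. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the zero initial state, and the sliding-window update are all hypothetical choices made for illustration.

```python
def vq_codebook_entries(bits_per_weight, dim):
    """Entries in an unstructured VQ codebook: one per (bits_per_weight * dim)-bit index."""
    return 2 ** (bits_per_weight * dim)


def bitshift_decode(bits, state_bits, bits_per_weight, codebook):
    """Toy sliding-window decoder (an assumption, not QTIP's actual kernel).

    Each step shifts `bits_per_weight` new bits into a `state_bits`-wide state
    and emits codebook[state]. The table has 2**state_bits entries no matter
    how many weights are decoded.
    """
    if len(codebook) != 2 ** state_bits:
        raise ValueError("codebook must have 2**state_bits entries")
    if len(bits) % bits_per_weight != 0:
        raise ValueError("bit count must be a multiple of bits_per_weight")

    mask = (1 << state_bits) - 1
    state = 0  # assumed all-zero starting state
    decoded = []
    for start in range(0, len(bits), bits_per_weight):
        for bit in bits[start:start + bits_per_weight]:
            # Shift in one bit and drop the oldest bit beyond state_bits.
            state = ((state << 1) | bit) & mask
        decoded.append(codebook[state])
    return decoded


# Unstructured VQ at 2 bits per weight: the table grows exponentially with dimension.
print(vq_codebook_entries(2, 8))   # 65536
print(vq_codebook_entries(2, 16))  # 4294967296

# Toy decoder: 4-bit state, 2 bits per weight, identity codebook for readability.
print(bitshift_decode([1, 0, 1, 1, 0, 1], 4, 2, list(range(16))))  # [2, 11, 13]
```

In this toy setup each decoded value depends on the last `state_bits` bits of the stream rather than on an independent index, which is what lets the table stay at 16 entries while the sequence length grows.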
