
Title: STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

Aug 03, 2024

Peijie Dong, Lujun Li, Dayou Du, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo (+1 more)



Abstract: In this paper, we present STBLLM, the first structural binarization framework for compressing Large Language Models (LLMs) to less than 1-bit precision. LLMs have achieved remarkable performance, but their heavy memory requirements have hindered widespread adoption, particularly on resource-constrained devices. Binarization, which quantizes weights to a mere 1 bit, marks a milestone in computational efficiency. However, we observe that some weights in binarized LLMs can be randomly flipped without significant performance degradation, indicating the potential for further compression. To exploit this, STBLLM employs N:M sparsity to perform structural binarization of the weights. First, we introduce a new Standardized Importance (SI) metric that considers weight magnitude and input feature norm to better evaluate weight significance. Then, we propose a layer-wise approach in which different layers of the LLM can be sparsified with varying N:M ratios, balancing compression and accuracy. Finally, we use residual approximation with double binarization to preserve information for salient weights. In addition, for less important weights, we use a fine-grained grouping strategy that applies different quantization schemes to sparse, intermediate, and dense regions. We conduct extensive experiments on various language models, including LLaMA-1/2/3, the OPT family, and Mistral, to evaluate the effectiveness of STBLLM. The results demonstrate that our approach outperforms other binarization-based LLM compression methods while significantly reducing memory requirements.
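
To make the idea of combining N:M sparsity with binarization more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function name `nm_binarize`, the importance score (weight magnitude times input feature norm, used here as a stand-in for the paper's Standardized Importance metric, whose exact formula the abstract does not give), and the per-row scale are all assumptions made for illustration. The sketch also omits the layer-wise N:M allocation, residual approximation, and grouping steps the abstract mentions.

```python
import numpy as np


def nm_binarize(weights, feature_norm, n=2, m=4):
    """Keep the n highest-scoring weights in every group of m, then binarize them.

    ASSUMPTION: the score |w| * feature_norm is an illustrative stand-in,
    not the paper's Standardized Importance (SI) formula.
    """
    rows, cols = weights.shape
    if cols % m != 0:
        raise ValueError("number of columns must be divisible by m")

    # Importance score per weight; feature_norm (length cols) broadcasts over rows.
    score = np.abs(weights) * feature_norm

    # Split each row into consecutive groups of m and mark the top-n entries.
    groups = score.reshape(rows, cols // m, m)
    top = np.argsort(groups, axis=-1)[..., -n:]
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=-1)
    mask = mask.reshape(rows, cols)

    # ASSUMPTION: one scale per row, the mean magnitude of the kept weights.
    kept_abs = np.where(mask, np.abs(weights), 0.0)
    alpha = kept_abs.sum(axis=1, keepdims=True) / mask.sum(axis=1, keepdims=True)

    # Kept weights become +/- alpha; pruned weights become exactly zero.
    binarized = np.where(mask, alpha * np.sign(weights), 0.0)
    return binarized, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 8))
    x_norm = np.linalg.norm(rng.normal(size=(16, 8)), axis=0)  # toy input feature norms
    w_bin, keep = nm_binarize(w, x_norm, n=2, m=4)
    print(keep.astype(int))
    print(np.round(w_bin, 3))
```

With a 2:4 pattern only half of the positions carry a sign bit, which gives some intuition for how the average cost per weight can drop below 1 bit, before accounting for the overhead of storing indices and scales.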
