Title: Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization

May 17, 2024

Yixin Ji, Yang Xiang, Juntao Li, Wei Chen, Zhongyi Liu, Kehai Chen, Min Zhang

Abstract: In recent years, large language models (LLMs) have driven advances in natural language processing, but their growing scale has increased the computational burden, forcing a trade-off between efficiency and performance. Low-rank compression is a promising technique that removes non-essential parameters by decomposing each weight matrix into the product of two low-rank matrices, yet its application to LLMs has not been extensively studied. The key to low-rank compression lies in two components: the low-rank factorization itself and the allocation of low-rank dimensions. To address the challenges of low-rank compression in LLMs, we conduct empirical research on the low-rank characteristics of large models and propose a low-rank compression method suited to LLMs. The approach combines precise estimation of feature distributions through pooled covariance matrices with a Bayesian optimization strategy for allocating low-rank dimensions. Experiments on the LLaMA-2 models demonstrate that our method outperforms existing strong structured pruning and low-rank compression techniques in maintaining model performance at the same compression ratio.
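To make the idea of decomposing a weight matrix into two low-rank factors concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes one plausible reading of "feature-based" compression, namely projecting a layer's outputs onto the top principal directions of a pooled covariance matrix estimated from several calibration batches. The function names (`pooled_covariance`, `feature_based_low_rank`), the toy dimensions, and the random calibration data are all hypothetical, and the Bayesian optimization step that allocates a rank to each layer is not shown.

```python
import numpy as np


def pooled_covariance(batches):
    """Pooled covariance of several (n_i, d) feature batches.

    Each batch is centered on its own mean; the scatter matrices are
    summed and divided by the total degrees of freedom.
    """
    d = batches[0].shape[1]
    scatter = np.zeros((d, d))
    dof = 0
    for y in batches:
        centered = y - y.mean(axis=0, keepdims=True)
        scatter += centered.T @ centered
        dof += y.shape[0] - 1
    return scatter / dof


def feature_based_low_rank(weight, input_batches, rank):
    """Factor weight (d_out, d_in) into a (d_out, rank) and b (rank, d_in).

    Assumption: the layer computes y = x @ weight.T, and we keep the
    `rank` leading eigenvectors of the pooled output covariance.
    Bias and mean handling are omitted for brevity.
    """
    output_batches = [x @ weight.T for x in input_batches]
    cov = pooled_covariance(output_batches)
    # eigh returns eigenvalues in ascending order, so take the last columns.
    _, eigvecs = np.linalg.eigh(cov)
    basis = eigvecs[:, -rank:]      # (d_out, rank), orthonormal columns
    a = basis
    b = basis.T @ weight            # (rank, d_in)
    return a, b


# Toy example with hypothetical sizes and random calibration data.
rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 48, 8
weight = rng.standard_normal((d_out, d_in))
input_batches = [rng.standard_normal((256, d_in)) for _ in range(4)]

a, b = feature_based_low_rank(weight, input_batches, rank)
params_before = weight.size          # d_out * d_in
params_after = a.size + b.size       # rank * (d_out + d_in)
```

In this sketch the compressed layer would compute `x @ b.T @ a.T` in place of `x @ weight.T`, so the parameter count drops from `d_out * d_in` to `rank * (d_out + d_in)`. Choosing `rank` per layer is the allocation problem that the abstract says is handled by Bayesian optimization.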

* Accepted to Findings of ACL 2024
