
Title: Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models

Nov 08, 2023

Rocktim Jyoti Das, Liqun Ma, Zhiqiang Shen



Abstract: Large Language Models (LLMs) with a billion or more parameters are prime targets for network pruning, which aims to remove a portion of the network weights without compromising performance. Prior approaches such as magnitude pruning, SparseGPT, and Wanda either concentrated solely on weights or combined weights with activations to induce sparsity. However, they overlooked the informative gradients derived from pretrained large language models. In this paper, we present a novel sparsity-centric pruning method for pretrained LLMs, termed Gradient-based Language Model Pruner (GBLM-Pruner). GBLM-Pruner leverages the first-order term of the Taylor expansion and operates in a training-free manner, using properly normalized gradients from a few calibration samples to determine the pruning importance score. It substantially outperforms competitive counterparts like SparseGPT and Wanda on multiple benchmarks. Intriguingly, after incorporating gradients, the unstructured pruning method tends to reveal some structural patterns post-pruning, which mirror the geometric interdependence inherent in the LLMs' parameter structure. Additionally, GBLM-Pruner requires no subsequent retraining or weight updates, which keeps it as simple as comparable methods. Extensive evaluations on LLaMA-1 and LLaMA-2 across various language benchmarks and perplexity show that GBLM-Pruner surpasses magnitude pruning, Wanda (weights+activations), and SparseGPT (weights+activations+weight update) by significant margins. Our code and models are available at https://github.com/RocktimJyotiDas/GBLM-Pruner.

* Technical report. Code and models at https://github.com/RocktimJyotiDas/GBLM-Pruner
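
To make the idea of a gradient-aware pruning score concrete, here is a minimal NumPy sketch. It is not the paper's exact scoring rule, which the abstract does not spell out. It assumes a score of |weight| times the L2 norm of that weight's gradient across calibration samples, and it assumes weights are ranked within each output row. The function names `gradient_pruning_scores` and `prune_by_score` are hypothetical and are not taken from the GBLM-Pruner repository.

```python
import numpy as np

def gradient_pruning_scores(weight, sample_grads):
    """Score each weight by |w| times its gradient norm (illustrative assumption).

    weight:       array of shape (out_features, in_features)
    sample_grads: array of shape (n_samples, out_features, in_features),
                  one gradient of the loss w.r.t. `weight` per calibration sample
    """
    # Aggregate per-sample gradients into one L2 norm per weight entry.
    grad_norm = np.sqrt((sample_grads ** 2).sum(axis=0))
    return np.abs(weight) * grad_norm

def prune_by_score(weight, scores, sparsity):
    """Zero the lowest-scoring fraction of weights in each output row.

    No retraining or weight update follows; surviving weights are unchanged.
    """
    n_prune = int(weight.shape[1] * sparsity)
    pruned = weight.copy()
    if n_prune == 0:
        return pruned
    # Column indices of the n_prune smallest scores in every row.
    idx = np.argsort(scores, axis=1)[:, :n_prune]
    np.put_along_axis(pruned, idx, 0.0, axis=1)
    return pruned

# Toy example: the smallest weight (1.0) has the largest gradient.
weight = np.array([[1.0, 2.0, 3.0, 4.0]])
sample_grads = np.array([[[4.0, 1.0, 1.0, 1.0]]])  # a single calibration sample

scores = gradient_pruning_scores(weight, sample_grads)
pruned = prune_by_score(weight, scores, sparsity=0.5)
```

In this toy case, pure magnitude pruning at 50% sparsity would drop the weights 1.0 and 2.0. With the gradient-weighted score, the weight 1.0 survives because its gradient is large, and 2.0 and 3.0 are removed instead.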
