Title: Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Oct 15, 2024

Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou

Abstract: Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants. However, their growing capabilities come at the cost of extremely large model sizes, making deployment on edge devices challenging due to memory and computational constraints. This paper introduces a novel approach to LLM weight pruning that directly optimizes for approximating the attention matrix, a core component of transformer architectures. Unlike existing methods that focus on linear approximations, our approach accounts for the non-linear nature of the Softmax attention mechanism. We provide theoretical guarantees for the convergence of our Gradient Descent-based optimization method to a near-optimal pruning mask solution. Our preliminary empirical results demonstrate the effectiveness of this approach in maintaining model performance while significantly reducing computational costs. This work establishes a new theoretical foundation for pruning algorithm design in LLMs, potentially paving the way for more efficient LLM inference on resource-constrained devices.
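The abstract names the ingredients (a pruning mask, a Softmax attention matrix to approximate, and Gradient Descent) but does not state the objective. As a rough illustration only, the sketch below assumes one plausible form: a relaxed mask M in [0, 1] applied entrywise to a combined query-key weight matrix W, a squared Frobenius loss between the masked and dense attention matrices, and an L2 penalty on M that pushes entries toward zero. The function names, the penalty, the toy sizes, and the final top-k thresholding are all assumptions made for this example and may differ from the paper's formulation.

```python
import numpy as np

def softmax_rows(S):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def loss_and_grad(M, X, W, A_ref, lam):
    """Assumed objective (not taken from the paper):
    0.5 * ||softmax(X (M*W) X^T) - A_ref||_F^2 + 0.5 * lam * ||M||_F^2."""
    A = softmax_rows(X @ (M * W) @ X.T)
    G = A - A_ref
    loss = 0.5 * np.sum(G ** 2) + 0.5 * lam * np.sum(M ** 2)
    # Backpropagate through the row-wise softmax, then through S = X (M*W) X^T.
    dS = A * (G - np.sum(G * A, axis=1, keepdims=True))
    grad = W * (X.T @ dS @ X) + lam * M
    return loss, grad

def prune_mask(X, W, keep, lam=0.1, lr=0.01, steps=1000):
    """Projected gradient descent on a relaxed mask in [0, 1],
    followed by keeping the `keep` largest entries (keep must be >= 1)."""
    A_ref = softmax_rows(X @ W @ X.T)  # attention matrix of the dense weights
    M = np.ones_like(W)                # start from the unpruned model
    for _ in range(steps):
        _, grad = loss_and_grad(M, X, W, A_ref, lam)
        M = np.clip(M - lr * grad, 0.0, 1.0)  # gradient step, then project
    mask = np.zeros(W.size)
    mask[np.argsort(M.ravel())[-keep:]] = 1.0
    return M, mask.reshape(W.shape)

rng = np.random.default_rng(0)
X = 0.5 * rng.standard_normal((8, 4))  # toy input: 8 tokens, dimension 4
W = rng.standard_normal((4, 4))        # stands in for the query-key weights
M_relaxed, mask = prune_mask(X, W, keep=8)
print(mask)  # binary mask with 8 of the 16 weights kept
```

The point of the sketch is that the loss compares attention matrices after the Softmax, so the gradient with respect to the mask passes through the Softmax Jacobian. A linear approximation would instead compare the pre-Softmax scores directly.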
