Title: Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

Aug 28, 2024

Haowen Hou, Fei Ma, Binwen Bai, Xinxin Zhu, Fei Yu



Abstract: Large Language Models (LLMs) have garnered widespread attention due to their remarkable performance across various tasks. To mitigate hallucinations, LLMs are often paired with a retrieval-augmented pipeline that supplies rich external knowledge and context. However, the context returned by the retriever is often inaccurate and coarse-grained. Supplying irrelevant context to an LLM can result in poorer responses, increased inference latency, and higher costs. This paper introduces a method called Instruction-Aware Contextual Compression, which filters out less informative content, thereby accelerating and enhancing the use of LLMs. The experimental results demonstrate that Instruction-Aware Contextual Compression notably reduces memory consumption and generation latency while maintaining performance comparable to that achieved with the full context. Specifically, we achieved a 50% reduction in context-related costs, resulting in a 5% reduction in inference memory usage and a 2.2-fold increase in inference speed, with only a minor drop of 0.047 in Rouge-1. These findings suggest that our method strikes an effective balance between efficiency and performance.
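To make the idea of filtering retrieved context against the instruction concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not describe how informativeness is scored, so this example substitutes a simple word-overlap score as a stand-in. The function names `tokenize` and `compress_context` and the `keep_ratio` parameter are hypothetical, chosen only for illustration.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def compress_context(instruction, sentences, keep_ratio=0.5):
    """Keep the sentences most related to the instruction.

    ASSUMPTION: relevance is approximated by word overlap with the
    instruction. This is a stand-in for illustration, not the scoring
    used in the paper.
    """
    if not sentences:
        return []
    query = tokenize(instruction)
    scored = []
    for idx, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        # Fraction of the sentence's distinct words that also occur in the instruction.
        score = len(query & tokens) / len(tokens) if tokens else 0.0
        scored.append((score, idx))
    # Number of sentences to keep, rounded up, and at least one.
    n_keep = max(1, math.ceil(len(sentences) * keep_ratio))
    # Rank by descending score; ties go to the earlier sentence.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    kept = sorted(idx for _, idx in ranked[:n_keep])
    # Return the survivors in their original order.
    return [sentences[i] for i in kept]


instruction = "What is the capital of France?"
context = [
    "Paris is the capital of France.",
    "The weather was pleasant last summer.",
    "France borders Spain and Italy.",
    "Bananas are rich in potassium.",
]
compressed = compress_context(instruction, context, keep_ratio=0.5)
print(compressed)
```

With `keep_ratio=0.5`, half of the sentences are dropped before the prompt is built, which loosely mirrors the 50% context reduction reported in the abstract. A practical system would likely replace the overlap score with a learned relevance model.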

* 20 pages
