Title: PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

Jun 04, 2024

Zefan Cai, Yichi Zhang, Bofei Gao, Tianyu Liu, Keming Lu, Wayne Xiong, Yue Dong, Baobao Chang, Junjie Hu, Wen Xiao

Figures 1 to 4 for PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

Abstract: In this study, we investigate whether attention-based information flow inside large language models (LLMs) is aggregated through noticeable patterns for long-context processing. Our observations reveal that LLMs aggregate information through Pyramidal Information Funneling, where attention scatters widely in lower layers, progressively consolidates within specific contexts, and ultimately focuses on critical tokens (a.k.a. massive activation or attention sink) in higher layers. Motivated by these insights, we developed PyramidKV, a novel and effective KV cache compression method. This approach dynamically adjusts the KV cache size across different layers, allocating more cache to lower layers and less to higher ones, diverging from traditional methods that maintain a uniform KV cache size. Our experimental evaluations on the LongBench benchmark show that PyramidKV matches the performance of models with a full KV cache while retaining only 12% of the KV cache, thus significantly reducing memory usage. In scenarios emphasizing memory efficiency, where only 0.7% of the KV cache is maintained, PyramidKV surpasses other KV cache compression techniques, achieving up to a 20.5 absolute accuracy improvement on TREC.
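
The layer-wise allocation described in the abstract (more cache in lower layers, less in higher ones) can be illustrated with a small sketch. The abstract does not give the actual schedule, so this is an assumption: the function name `pyramid_budgets`, the linear decrease, and the `ratio` parameter are hypothetical choices for illustration, not the authors' method.

```python
from typing import List


def pyramid_budgets(num_layers: int, avg_budget: int, ratio: float = 4.0) -> List[int]:
    """Return a per-layer KV cache budget (in tokens) that shrinks with depth.

    Assumption: budgets decrease linearly from the lowest layer to the
    highest, the lowest layer gets `ratio` times the budget of the highest,
    and the mean over layers stays close to `avg_budget`.
    """
    if num_layers < 1 or avg_budget < 1 or ratio < 1.0:
        raise ValueError("num_layers and avg_budget must be >= 1, ratio >= 1.0")
    if num_layers == 1:
        return [avg_budget]

    # Solve (bottom + top) / 2 == avg_budget with bottom == ratio * top.
    top = 2.0 * avg_budget / (1.0 + ratio)
    bottom = ratio * top
    step = (bottom - top) / (num_layers - 1)

    # Layer 0 is the lowest layer; every layer keeps at least one token.
    return [max(1, round(bottom - i * step)) for i in range(num_layers)]


if __name__ == "__main__":
    budgets = pyramid_budgets(num_layers=32, avg_budget=128)
    print(budgets[0], budgets[-1], sum(budgets))
```

A uniform scheme would give every layer `avg_budget` tokens; here the total stays roughly the same but is shifted toward the lower layers, where the abstract reports that attention is spread most widely.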
