Yuxun Miao

Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity

Dec 03, 2024