Zeyu Xing

KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference

Feb 06, 2025
Via arXiv