Picture for Aomufei Yuan

Aomufei Yuan

KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference

Add code
Apr 14, 2025
Viaarxiv icon

FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference

Add code
Feb 19, 2025
Viaarxiv icon

Prediction Is All MoE Needs: Expert Load Distribution Goes from Fluctuating to Stabilizing

Add code
Apr 25, 2024
Viaarxiv icon