Aomufei Yuan

FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference

Feb 19, 2025
Via arXiv

Prediction Is All MoE Needs: Expert Load Distribution Goes from Fluctuating to Stabilizing

Apr 25, 2024
Via arXiv