Ferran Agullo

Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference

Mar 11, 2025
Via arXiv