Alina Shutova

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Apr 09, 2025
Via arXiv

Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models

Jan 31, 2025
Via arXiv