Srikant Bharadwaj

TURBOATTENTION: Efficient Attention Approximation For High Throughputs LLMs

Dec 11, 2024
Via arXiv

Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

May 17, 2024
Via arXiv