Rya Sanovar

Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

May 17, 2024
Via arXiv