Ganesh Bikshandi

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Jul 11, 2024
Via arXiv

A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library

Dec 19, 2023
Via arXiv