Ganesh Bikshandi

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Jul 11, 2024
Via arXiv

A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library

Dec 19, 2023
Via arXiv