Title: Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM

Oct 10, 2024

Haiyue Ma, Jian Liu, Ronny Krashinsky

Abstract: When enabled, the dropout operator can dramatically degrade the performance of Flash-Attention, which in turn increases the end-to-end training time of large language models (LLMs). The main contributor to this degradation is the random number generation (RNG) phase that is traditionally fused into the Flash-Attention kernel. Because RNG and Attention share the same hardware bottlenecks, RNG latency can hardly be hidden within the Attention kernel. We propose overlapping RNG with the preceding GEMM layers in the network to hide RNG runtime and improve end-to-end performance. RNG and GEMM have distinct resource requirements and hardware bottlenecks, so they can run in parallel without compromising each other's performance. Our fine-grained performance model, cross-validated by silicon results, shows a 1.14x speedup on one transformer block (including multi-head attention and feed-forward layers) for Llama2, and up to a 1.23x speedup when varying workload sizes, on GH100 GPUs with FP8 precision. Further, we extend our theoretical model to different RNG implementations and hardware architectures, and discuss the widely applicable benefits of overlapping RNG with GEMM layers.
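
To make the overlap idea concrete, here is a minimal toy timing model in Python. It is not the paper's fine-grained performance model: the function names, the `t_mask_load` parameter, and the assumption of perfect overlap with no interference between RNG and GEMM are all illustrative assumptions, and any timings passed in are hypothetical rather than measured.

```python
def block_time_fused(t_gemm: float, t_attn: float, t_rng: float) -> float:
    """Baseline (assumed): RNG is fused into the attention kernel and its
    latency is fully exposed, so it adds to the attention time."""
    return t_gemm + t_attn + t_rng


def block_time_overlapped(
    t_gemm: float, t_attn: float, t_rng: float, t_mask_load: float = 0.0
) -> float:
    """Overlapped (assumed): RNG runs concurrently with the preceding GEMM.

    The GEMM/RNG phase takes as long as the slower of the two. Attention
    then only reads the precomputed dropout mask; t_mask_load is a
    hypothetical extra cost for that read.
    """
    return max(t_gemm, t_rng) + t_attn + t_mask_load


def speedup(
    t_gemm: float, t_attn: float, t_rng: float, t_mask_load: float = 0.0
) -> float:
    """Ratio of fused time to overlapped time under the toy model."""
    fused = block_time_fused(t_gemm, t_attn, t_rng)
    overlapped = block_time_overlapped(t_gemm, t_attn, t_rng, t_mask_load)
    return fused / overlapped


if __name__ == "__main__":
    # Hypothetical timings in arbitrary units, chosen only for illustration.
    print(f"toy speedup: {speedup(t_gemm=10.0, t_attn=5.0, t_rng=2.0):.3f}")
```

Under this simplification, RNG is fully hidden whenever it takes no longer than the GEMM it overlaps with, and the benefit shrinks as the mask-loading cost grows or as RNG outlasts the GEMM.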
