Title: DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training

Feb 11, 2025

Xin Tan, Yuetao Chen, Yimin Jiang, Xing Chen, Kun Yan, Nan Duan, Yibo Zhu, Daxin Jiang, Hong Xu

Abstract: Diffusion Transformers (DiTs) have shown remarkable performance in modeling and generating high-quality videos. However, the quadratic computational complexity of the 3D full attention mechanism presents significant challenges in scaling video DiT training, especially for high-definition and lengthy videos, where attention can account for up to 95% of the end-to-end time and necessitate specialized communication paradigms to handle large input sizes. This paper introduces DSV, a novel framework designed to accelerate and scale the training of video DiTs by leveraging the inherent dynamic attention sparsity that appears throughout the training process. DSV employs a two-stage training algorithm that exploits sparsity patterns by focusing computation on critical elements, supported by efficient, tailored kernels. To accommodate the new sparsity dimension, we develop a hybrid sparsity-aware context parallelism that effectively scales to large inputs by addressing the heterogeneity of sparsity across attention heads and blocks, resulting in optimized sparse computation and communication. Extensive evaluations demonstrate that DSV achieves up to a 3.02x gain in training throughput with nearly no quality degradation.
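
To make the idea of attention sparsity concrete, the following is a minimal NumPy sketch that compares full attention with a variant that keeps only the highest-scoring keys per query. It is not DSV's two-stage algorithm or its kernels, which the abstract does not specify; the function names (`full_attention`, `topk_sparse_attention`) and the `keep` parameter are hypothetical, chosen for illustration. The sketch also still computes the full score matrix, so it only shows what "focusing on critical elements" could mean, not how to obtain a speedup.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf contribute zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(q, k, v):
    # Dense attention: every query scores every key, so the cost grows
    # with (number of queries) x (number of keys).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def topk_sparse_attention(q, k, v, keep):
    # Illustrative assumption: keep only the `keep` largest scores per query
    # and mask the rest before the softmax.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    idx = np.argpartition(scores, -keep, axis=-1)[:, -keep:]
    masked = np.full_like(scores, -np.inf)
    kept = np.take_along_axis(scores, idx, axis=-1)
    np.put_along_axis(masked, idx, kept, axis=-1)
    return softmax(masked) @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((8, 16))
    k = rng.standard_normal((32, 16))
    v = rng.standard_normal((32, 16))
    dense = full_attention(q, k, v)
    for keep in (4, 16, 32):
        sparse = topk_sparse_attention(q, k, v, keep)
        print(keep, float(np.abs(dense - sparse).max()))
```

When `keep` equals the number of keys, the sparse variant reproduces dense attention exactly. With smaller `keep`, the approximation error depends on how concentrated each query's attention weights are, which is the property the abstract refers to as sparsity.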
