Title: xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Nov 04, 2024

Jiarui Fang, Jinzhe Pan, Xibo Sun, Aoyu Li, Jiannan Wang

Abstract: Diffusion models are pivotal for generating high-quality images and videos. Inspired by the success of OpenAI's Sora, the backbone of diffusion models is evolving from U-Net to Transformer architectures, known as Diffusion Transformers (DiTs). However, generating high-quality content requires longer sequence lengths, which quadratically increases the computation required for the attention mechanism and raises DiT inference latency. Parallel inference is essential for real-time DiT deployments, but relying on a single parallel method is impractical because each scales poorly to large GPU counts. This paper introduces xDiT, a comprehensive parallel inference engine for DiTs. After thoroughly investigating existing DiT parallel approaches, xDiT chooses Sequence Parallel (SP) and PipeFusion, a novel patch-level pipeline parallel method, as intra-image parallel strategies, alongside CFG parallel for inter-image parallelism. xDiT can flexibly combine these parallel approaches in a hybrid manner, offering a robust and scalable solution. Experimental results on two 8xL40 (PCIe) GPU nodes interconnected by Ethernet and on an 8xA100 (NVLink) node showcase xDiT's scalability across five state-of-the-art DiTs. Notably, we are the first to demonstrate DiT scalability on Ethernet-connected GPU clusters. xDiT is available at https://github.com/xdit-project/xDiT.
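To make the idea of combining parallel approaches concrete, the sketch below enumerates ways a fixed GPU count could be split across the three strategies named in the abstract. It is an illustration only and does not use xDiT's API: the function name `hybrid_layouts` is hypothetical, and it rests on two assumptions not stated in the abstract, namely that the degrees of the three strategies multiply to the total GPU count, and that CFG parallel uses degree 1 or 2 (one group each for the conditional and unconditional passes of classifier-free guidance).

```python
def hybrid_layouts(num_gpus, cfg_degrees=(1, 2)):
    """Enumerate (cfg, sp, pipefusion) degree triples whose product is num_gpus.

    Hypothetical helper for illustration; not part of xDiT.
    Assumes the three parallel degrees multiply to the device count.
    """
    layouts = []
    for cfg in cfg_degrees:
        if num_gpus % cfg != 0:
            continue  # this CFG degree cannot evenly divide the devices
        remaining = num_gpus // cfg
        # Split the remaining devices between sequence parallel and PipeFusion.
        for sp in range(1, remaining + 1):
            if remaining % sp == 0:
                layouts.append((cfg, sp, remaining // sp))
    return layouts


if __name__ == "__main__":
    # 16 GPUs, matching the two 8-GPU nodes mentioned in the abstract.
    for cfg, sp, pipe in hybrid_layouts(16):
        print(f"cfg={cfg} sp={sp} pipefusion={pipe}")
```

Which of these layouts performs best depends on the interconnect and the model, which is the kind of trade-off a hybrid engine has to explore; the sketch only lists the candidates.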