Title: VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Aug 30, 2024

Juncan Deng, Shuaiting Li, Zeyu Wang, Hong Gu, Kedong Xu, Kejie Huang

Abstract: Diffusion Transformer models (DiTs) have shifted the network architecture from traditional UNets to transformers, demonstrating exceptional capabilities in image generation. Although DiTs have been widely applied to high-definition video generation tasks, their large parameter size hinders inference on edge devices. Vector quantization (VQ) can decompose model weights into a codebook and assignments, allowing extreme weight quantization and significantly reducing memory usage. In this paper, we propose VQ4DiT, a fast post-training vector quantization method for DiTs. We found that traditional VQ methods calibrate only the codebook without calibrating the assignments. This leads to weight sub-vectors being incorrectly assigned to the same assignment, which provides inconsistent gradients to the codebook and yields a suboptimal result. To address this challenge, VQ4DiT calculates a candidate assignment set for each weight sub-vector based on Euclidean distance and reconstructs the sub-vector as a weighted average of the candidates. Then, using a zero-data, block-wise calibration method, the optimal assignment from the set is efficiently selected while the codebook is calibrated. VQ4DiT quantizes a DiT XL/2 model on a single NVIDIA A100 GPU in 20 minutes to 5 hours, depending on the quantization settings. Experiments show that VQ4DiT establishes a new state of the art in the trade-off between model size and performance, quantizing weights to 2-bit precision while retaining acceptable image generation quality.
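The codebook-and-assignment decomposition and the candidate-set reconstruction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The function names, the toy codebook, and the fixed mixing ratios are all assumptions made for illustration, and the abstract does not specify how the ratios are learned or how the final assignment is chosen from the set. The `bits_per_weight` helper counts only the index cost of a generic VQ scheme and ignores codebook storage.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split_subvectors(weights, d):
    """Split a flat weight list into consecutive sub-vectors of length d."""
    if len(weights) % d != 0:
        raise ValueError("weight count must be a multiple of d")
    return [weights[i:i + d] for i in range(0, len(weights), d)]

def candidate_assignments(sub, codebook, n):
    """Indices of the n codewords closest to `sub` (hypothetical helper)."""
    order = sorted(range(len(codebook)),
                   key=lambda j: sq_dist(sub, codebook[j]))
    return order[:n]

def reconstruct(codebook, candidates, ratios):
    """Weighted average of the candidate codewords.

    `ratios` are illustrative fixed weights; in a calibration setting
    they would be adjusted rather than fixed (assumption).
    """
    total = sum(ratios)
    dim = len(codebook[0])
    return [
        sum(r * codebook[j][t] for j, r in zip(candidates, ratios)) / total
        for t in range(dim)
    ]

def bits_per_weight(k, d):
    """Index cost only: log2(k) bits shared by d weights."""
    return math.log2(k) / d

if __name__ == "__main__":
    # Toy codebook with k = 4 codewords of dimension d = 2.
    codebook = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    weights = [0.9, 1.1, 0.1, -0.1]
    for sub in split_subvectors(weights, 2):
        cands = candidate_assignments(sub, codebook, 2)
        approx = reconstruct(codebook, cands, [0.5, 0.5])
        print(sub, cands, approx)
    # A 256-entry codebook over 4-dimensional sub-vectors costs
    # 8 index bits per 4 weights.
    print(bits_per_weight(256, 4))
```

Under these assumptions, a standard VQ method would keep only the single nearest codeword per sub-vector, whereas the sketch keeps a small set of nearby codewords so that a later calibration step has alternatives to choose from.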

* 11 pages, 6 figures