Title: HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization

May 31, 2024

Wenxuan Liu, Sai Qian Zhang

Abstract: Diffusion Transformers (DiTs) have recently gained substantial attention in both industry and academia for their superior visual generation capabilities, outperforming traditional diffusion models that use U-Net. However, the enhanced performance of DiTs comes with high parameter counts and implementation costs, which seriously restrict their use on resource-limited devices such as mobile phones. To address these challenges, we introduce Hybrid Floating-point Quantization for DiT (HQ-DiT), an efficient post-training quantization method that applies 4-bit floating-point (FP) precision to both weights and activations for DiT inference. Compared to fixed-point quantization (e.g., INT8), FP quantization, complemented by our proposed clipping range selection mechanism, naturally aligns with the data distribution within DiT, resulting in minimal quantization error. Furthermore, HQ-DiT implements a universal identity mathematical transform to mitigate the serious quantization error caused by outliers. The experimental results demonstrate that DiT can achieve extremely low-precision quantization (i.e., 4 bits) with negligible impact on performance. Our approach marks the first instance where both weights and activations in DiTs are quantized to just 4 bits, with only a 0.12 increase in sFID on ImageNet.
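To make the idea of 4-bit floating-point quantization with a chosen clipping range more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The E2M1 layout (1 sign, 2 exponent, 1 mantissa bit), the MSE-based grid search, and the names `FP4_E2M1_GRID`, `quantize_fp4`, and `select_clip_max` are all assumptions made for illustration; the abstract does not specify the FP4 format or the selection criterion.

```python
# Non-negative magnitudes representable by an E2M1 4-bit float.
# Assumption: the abstract only says "4-bit floating-point", not which layout.
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_fp4(values, clip_max):
    """Fake-quantize floats to the E2M1 grid, scaled so clip_max maps to 6.0."""
    if clip_max <= 0:
        return [0.0 for _ in values]
    scale = clip_max / FP4_E2M1_GRID[-1]
    out = []
    for v in values:
        # Clip the magnitude, then express it in grid units.
        mag = min(abs(v), clip_max) / scale
        # Round to the nearest grid point (ties go to the smaller magnitude).
        nearest = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))
        out.append((-nearest if v < 0 else nearest) * scale)
    return out


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def select_clip_max(values, num_candidates=50):
    """Grid-search a clipping range that minimizes quantization MSE.

    Hypothetical criterion, used here only to illustrate the idea of
    choosing a clipping range rather than always using the absolute max.
    """
    abs_max = max(abs(v) for v in values)
    candidates = [abs_max * (i + 1) / num_candidates
                  for i in range(num_candidates)]
    return min(candidates, key=lambda c: mse(values, quantize_fp4(values, c)))
```

Because the grid points are denser near zero than near the maximum, small values keep more relative precision than they would on a uniform integer grid, which is one intuition for why FP formats can suit bell-shaped weight and activation distributions. Calling `select_clip_max(data)` on a list containing a single large outlier will typically return a range below the absolute maximum, trading clipping error on the outlier for finer resolution elsewhere.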