Title: Pixel-Space Post-Training of Latent Diffusion Models

Sep 26, 2024

Christina Zhang, Simran Motwani, Matthew Yu, Ji Hou, Felix Juefei-Xu, Sam Tsai, Peter Vajda, Zijian He, Jialiang Wang

Figures 1-4 for Pixel-Space Post-Training of Latent Diffusion Models

Abstract: Latent diffusion models (LDMs) have made significant advances in image generation in recent years. One major advantage of LDMs is their ability to operate in a compressed latent space, allowing for more efficient training and deployment. Despite these advantages, challenges remain: LDMs often generate high-frequency details and complex compositions imperfectly. We hypothesize that one reason for these flaws is that all pre- and post-training of LDMs is done in latent space, which typically has $8 \times 8$ lower spatial resolution than the output images. To address this issue, we propose adding pixel-space supervision in the post-training process to better preserve high-frequency details. Experimentally, we show that adding a pixel-space objective significantly improves both supervised quality fine-tuning and preference-based post-training on state-of-the-art DiT and U-Net diffusion models, in both visual quality and visual flaw metrics, while maintaining the same text alignment quality.
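As a rough illustration of the idea in the abstract, the toy sketch below adds a pixel-space term to a latent-space loss. It is not the authors' implementation. It assumes that "$8 \times 8$ lower" means a factor of 8 per side, it uses nearest-neighbour upsampling as a stand-in for a learned decoder, and the names `latent_shape`, `decode`, `combined_loss`, and `pixel_weight` are hypothetical.

```python
# Toy sketch only: a latent-space loss with an added pixel-space term.
# Assumptions (not from the paper): factor 8 per side, a nearest-neighbour
# "decoder", plain MSE for both terms, and every name defined here.

FACTOR = 8  # assumed per-side downsampling factor of the autoencoder


def latent_shape(height, width, factor=FACTOR):
    """Spatial size of the latent grid for an image of the given size."""
    return height // factor, width // factor


def mse(a, b):
    """Mean squared error between two equally sized 2D grids (lists of rows)."""
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def decode(latent, factor=FACTOR):
    """Stand-in decoder: repeat each latent value over a factor x factor block.

    A real LDM would use a learned autoencoder decoder here.
    """
    return [
        [value for value in row for _ in range(factor)]
        for row in latent
        for _ in range(factor)
    ]


def combined_loss(pred_latent, target_latent, target_image, pixel_weight=1.0):
    """Latent-space MSE plus a weighted MSE between the decoded prediction
    and the target image in pixel space."""
    latent_term = mse(pred_latent, target_latent)
    pixel_term = mse(decode(pred_latent), target_image)
    return latent_term + pixel_weight * pixel_term


if __name__ == "__main__":
    # A 1024 x 1024 image has 64 times more spatial positions than its latent.
    print(latent_shape(1024, 1024))

    # One latent value versus an 8 x 8 checkerboard of 0s and 1s.
    checkerboard = [[float((r + c) % 2) for c in range(8)] for r in range(8)]
    print(combined_loss([[0.5]], [[0.5]], checkerboard))
```

In this toy setup the latent term is zero whenever the predicted and target latents match, while the pixel term still registers detail in the target image that the stand-in decoder cannot reproduce. That is only a loose analogy for the motivation given in the abstract.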
