Abstract: Recent advancements in text-to-image diffusion models have demonstrated remarkable success, yet they often struggle to fully capture the user's intent. Existing approaches using textual inputs combined with bounding boxes or region masks fall short in providing precise spatial guidance, often leading to misaligned or unintended object orientation. To address these limitations, we propose Scribble-Guided Diffusion (ScribbleDiff), a training-free approach that utilizes simple user-provided scribbles as visual prompts to guide image generation. However, incorporating scribbles into diffusion models presents challenges due to their sparse and thin nature, making it difficult to ensure accurate orientation alignment. To overcome these challenges, we introduce moment alignment and scribble propagation, which allow for more effective and flexible alignment between generated images and scribble inputs. Experimental results on the PASCAL-Scribble dataset demonstrate significant improvements in spatial control and consistency, showcasing the effectiveness of scribble-based guidance in diffusion models. Our code is available at https://github.com/kaist-cvml-lab/scribble-diffusion.
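The abstract names moment alignment without detailing it. The sketch below illustrates one plausible reading of the idea, assuming the method compares the first-order moment (centroid) and second-order moments (covariance, hence orientation) of an object token's cross-attention map against those of the corresponding user scribble; the function names and the exact loss form are illustrative assumptions, not the paper's implementation.

```python
import torch

def image_moments(weight_map: torch.Tensor):
    """Centroid (first moment) and covariance (second moment) of a non-negative 2-D map."""
    h, w = weight_map.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=weight_map.dtype, device=weight_map.device),
        torch.arange(w, dtype=weight_map.dtype, device=weight_map.device),
        indexing="ij",
    )
    mass = weight_map.sum().clamp_min(1e-8)
    cy = (weight_map * ys).sum() / mass
    cx = (weight_map * xs).sum() / mass
    dy, dx = ys - cy, xs - cx
    cov = torch.stack([
        torch.stack([(weight_map * dx * dx).sum(), (weight_map * dx * dy).sum()]),
        torch.stack([(weight_map * dy * dx).sum(), (weight_map * dy * dy).sum()]),
    ]) / mass
    return torch.stack([cx, cy]), cov

def moment_alignment_loss(attn_map: torch.Tensor, scribble_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical loss: penalize mismatch in position and orientation between an
    object's cross-attention map and its user scribble (both HxW, non-negative)."""
    c_attn, cov_attn = image_moments(attn_map)
    c_scr, cov_scr = image_moments(scribble_mask)
    return torch.norm(c_attn - c_scr) + torch.norm(cov_attn - cov_scr)
```

Because a raw scribble is thin and sparse, its covariance captures orientation but covers little area; this is presumably where the paper's scribble propagation step would thicken or spread the mask before such a comparison.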
Abstract: The rise of billion-parameter diffusion models such as Stable Diffusion XL, Imagen, and DALL-E 3 has markedly advanced the field of generative AI. However, their large scale poses challenges for fine-tuning and deployment due to high resource demands and slow inference. This paper ventures into the relatively unexplored yet promising problem of fine-tuning quantized diffusion models. We establish a strong baseline by combining three methods: PEQA for fine-tuning quantization parameters, Q-Diffusion for post-training quantization, and DreamBooth for personalization. Our analysis reveals a notable trade-off between subject fidelity and prompt fidelity in this baseline. To address it, we introduce two strategies, inspired by the distinct roles that different timesteps play in diffusion models: S1, which optimizes a single set of fine-tuning parameters only at selected timestep intervals, and S2, which maintains multiple sets of fine-tuning parameters, each specialized for a different timestep interval. Our approach not only enhances personalization but also upholds prompt fidelity and image quality, significantly outperforming the baseline both qualitatively and quantitatively. The code will be made publicly available.
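To make the S1/S2 distinction concrete, here is a minimal sketch of the S2 idea, assuming LoRA-style low-rank adapters placed on top of a frozen (e.g., quantized) linear layer, with one adapter pair selected per timestep interval. The class and parameter names (TimestepIntervalAdapters, boundaries, rank) are hypothetical, and the paper's actual parameterization (e.g., PEQA-style quantization scales) may differ.

```python
import torch
import torch.nn as nn

class TimestepIntervalAdapters(nn.Module):
    """Sketch of strategy S2: one low-rank fine-tuning parameter set per timestep interval,
    applied on top of a frozen (e.g., quantized) base linear layer. Names are illustrative."""

    def __init__(self, base_layer: nn.Linear, boundaries=(250, 500, 750), rank=4):
        super().__init__()
        self.base = base_layer
        for p in self.base.parameters():
            p.requires_grad_(False)              # base weights stay frozen/quantized
        self.boundaries = list(boundaries)       # splits t in [0, 1000) into intervals
        n_sets = len(self.boundaries) + 1
        d_in, d_out = base_layer.in_features, base_layer.out_features
        self.down = nn.ModuleList([nn.Linear(d_in, rank, bias=False) for _ in range(n_sets)])
        self.up = nn.ModuleList([nn.Linear(rank, d_out, bias=False) for _ in range(n_sets)])
        for up in self.up:
            nn.init.zeros_(up.weight)            # adapters start as a no-op

    def _interval(self, t: int) -> int:
        return sum(t >= b for b in self.boundaries)

    def forward(self, x: torch.Tensor, t: int) -> torch.Tensor:
        i = self._interval(t)                    # pick the parameter set for this timestep
        return self.base(x) + self.up[i](self.down[i](x))

# Usage: adapter = TimestepIntervalAdapters(nn.Linear(320, 320)); y = adapter(torch.randn(1, 320), t=600)
# S1 would instead keep a single adapter and optimize/apply it only when t falls in chosen intervals.
```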