Title: Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models

Dec 10, 2023

Zhipeng Bao, Yijun Li, Krishna Kumar Singh, Yu-Xiong Wang, Martial Hebert

Abstract: Despite recent significant strides achieved by diffusion-based Text-to-Image (T2I) models, current systems are still less capable of ensuring decent compositional generation aligned with text prompts, particularly for multi-object generation. This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps. While previous research efforts have tackled these issues individually, we assert that a holistic approach is paramount. Thus, we propose two novel objectives, the Separate loss and the Enhance loss, that reduce object mask overlaps and maximize attention scores, respectively. Our method diverges from conventional test-time-adaptation techniques, focusing on finetuning critical parameters, which enhances scalability and generalizability. Comprehensive evaluations demonstrate the superior performance of our model in terms of image realism, text-image alignment, and adaptability, notably outperforming prominent baselines. Ultimately, this research paves the way for T2I diffusion models with enhanced compositional capacities and broader applicability. The project webpage is available at https://zpbao.github.io/projects/SepEn/.
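To make the two objectives named in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: the abstract only says that the Separate loss reduces object mask overlaps and the Enhance loss maximizes attention scores, so the formulas below (overlap as a normalized sum of element-wise minima, enhancement as one minus the peak activation), the function names, and the weights `w_sep` and `w_enh` are all assumptions chosen for illustration. Each attention map is represented as a flat list of non-negative activations for one object token.

```python
# Illustrative sketch only: these formulas are assumptions,
# not the losses defined in the paper.

def separate_loss(attn_a, attn_b):
    """Overlap between two attention maps (hypothetical definition).

    Sum of element-wise minima, normalized by the smaller map's total mass.
    Returns 0.0 for disjoint maps and 1.0 when one map covers the other.
    """
    overlap = sum(min(a, b) for a, b in zip(attn_a, attn_b))
    denom = min(sum(attn_a), sum(attn_b))
    return overlap / denom if denom > 0 else 0.0


def enhance_loss(attn):
    """One minus the peak activation (hypothetical definition).

    Small when the object token attends strongly to at least one location.
    """
    return 1.0 - max(attn)


def total_loss(maps, w_sep=1.0, w_enh=1.0):
    """Weighted sum of pairwise overlap terms and per-object enhance terms."""
    sep = sum(
        separate_loss(maps[i], maps[j])
        for i in range(len(maps))
        for j in range(i + 1, len(maps))
    )
    enh = sum(enhance_loss(m) for m in maps)
    return w_sep * sep + w_enh * enh


if __name__ == "__main__":
    # Two objects attending to different regions: no overlap penalty.
    disjoint = [[0.9, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.8]]
    # Two objects attending weakly to the same region: both terms are large.
    collapsed = [[0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
    print(total_loss(disjoint), total_loss(collapsed))
```

In this toy setup the disjoint, strongly activated maps receive a lower total than the collapsed ones, which is the qualitative behavior the abstract attributes to combining the two objectives. In an actual finetuning setting the maps would come from the model's cross-attention layers and the losses would need to be differentiable tensor operations.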
