Title: Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

Sep 27, 2023

Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey (+16 more)



Abstract: Training text-to-image models with web-scale image-text pairs enables the generation of a wide range of visual concepts from text. However, these pre-trained models often struggle to generate highly aesthetic images. This creates the need for aesthetic alignment after pre-training. In this paper, we propose quality-tuning to effectively guide a pre-trained model to exclusively generate highly visually appealing images, while maintaining generality across visual concepts. Our key insight is that supervised fine-tuning with a surprisingly small set of extremely visually appealing images can significantly improve generation quality. We pre-train a latent diffusion model on $1.1$ billion image-text pairs and fine-tune it with only a few thousand carefully selected high-quality images. The resulting model, Emu, achieves a win rate of $82.9\%$ compared with its pre-trained-only counterpart. Compared to the state-of-the-art SDXLv1.0, Emu is preferred $68.4\%$ and $71.3\%$ of the time on visual appeal on the standard PartiPrompts benchmark and on our Open User Input benchmark, respectively; the latter is based on real-world usage of text-to-image models. In addition, we show that quality-tuning is a generic approach that is also effective for other architectures, including pixel diffusion and masked generative transformer models.
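The win rates and preference percentages quoted in the abstract come from pairwise comparisons. As a rough illustration only, the sketch below shows one way such a figure could be computed from a list of votes. The function name `win_rate`, the vote labels, and the choice to count ties in the denominator are all assumptions made for this example; the abstract does not specify the evaluation protocol, and the votes shown are toy data, not results from the paper.

```python
from collections import Counter
from typing import Iterable


def win_rate(votes: Iterable[str], model: str = "A") -> float:
    """Return the fraction of pairwise comparisons in which `model` was preferred.

    Each vote is a label such as "A", "B", or "tie" (hypothetical labels).
    Assumption: ties stay in the denominator, so they lower the win rate
    of both models. A different protocol might drop ties instead.
    """
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute a win rate from zero votes")
    return counts[model] / total


# Toy votes, invented for illustration (not data from the paper).
toy_votes = ["A", "A", "B", "tie"]
rate_a = win_rate(toy_votes, model="A")
rate_b = win_rate(toy_votes, model="B")
print(f"A preferred: {rate_a:.1%}, B preferred: {rate_b:.1%}")
```

Under this convention the two models' win rates need not sum to one, since the remainder is the tie rate.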
