Title: Controllable Text-to-Image Generation with GPT-4

May 29, 2023

Tianjun Zhang, Yi Zhang, Vibhav Vineet, Neel Joshi, Xin Wang

Abstract: Current text-to-image generation models often struggle to follow textual instructions, especially those requiring spatial reasoning. Large Language Models (LLMs) such as GPT-4, on the other hand, have shown remarkable precision in generating code snippets that sketch out text inputs graphically, e.g., via TikZ. In this work, we introduce Control-GPT, which guides diffusion-based text-to-image pipelines with programmatic sketches generated by GPT-4, enhancing their ability to follow instructions. Control-GPT works by querying GPT-4 to write TikZ code; the generated sketches are then used as references alongside the text instructions for diffusion models (e.g., ControlNet) to generate photo-realistic images. One major challenge in training our pipeline is the lack of a dataset containing aligned text, images, and sketches. We address this issue by converting instance masks in existing datasets into polygons that mimic the sketches used at test time. As a result, Control-GPT greatly boosts the controllability of image generation. It establishes a new state of the art in spatial arrangement and object positioning, and it enhances users' control of object positions, sizes, etc., nearly doubling the accuracy of prior models. As a first attempt, our work shows the potential of employing LLMs to enhance performance on computer vision tasks.
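The abstract does not say exactly how instance masks are turned into polygons. As a rough illustration only, the sketch below takes the convex hull of a binary mask's foreground pixels (Andrew's monotone chain, a swapped-in simplification that discards concavities and holes) and then emits TikZ `\draw` and `\node` commands, so that a training-time sketch looks like the kind of TikZ code GPT-4 writes at test time. The function names `mask_to_polygon` and `polygon_to_tikz` and the centroid-label layout are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) = (column, row) in image coordinates


def mask_to_polygon(mask: List[List[int]]) -> List[Point]:
    """Hypothetical helper: convex hull of the nonzero pixels of a binary mask.

    Uses Andrew's monotone chain; collinear points are dropped.
    """
    points = sorted({(x, y) for y, row in enumerate(mask)
                     for x, v in enumerate(row) if v})
    if len(points) <= 2:
        return points

    def cross(o: Point, a: Point, b: Point) -> int:
        # z-component of (a - o) x (b - o); <= 0 means no strict left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in points:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def polygon_to_tikz(polygon: List[Point], label: str, height: int) -> str:
    """Hypothetical helper: render a polygon plus a text label as TikZ code."""
    if not polygon:
        raise ValueError("empty polygon")
    # Image rows grow downward while TikZ's y axis points up, so flip y.
    flipped = [(x, height - 1 - y) for x, y in polygon]
    coords = " -- ".join(f"({x},{y})" for x, y in flipped)
    cx = sum(x for x, _ in flipped) / len(flipped)
    cy = sum(y for _, y in flipped) / len(flipped)
    return f"\\draw {coords} -- cycle;\n\\node at ({cx:.1f},{cy:.1f}) {{{label}}};"


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(polygon_to_tikz(mask_to_polygon(mask), "dog", height=len(mask)))
    # \draw (1,2) -- (2,2) -- (2,1) -- (1,1) -- cycle;
    # \node at (1.5,1.5) {dog};
```

A convex hull keeps the example short and dependency-free; a closer match to real instance masks would trace the mask contour (or reuse polygon annotations where a dataset already provides them), at the cost of more code.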
