Title: Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

Oct 12, 2023

Zeqiang Lai, Xizhou Zhu, Jifeng Dai, Yu Qiao, Wenhai Wang



Abstract: The revolution in AI-generated content has been rapidly accelerated by booming text-to-image (T2I) diffusion models. Within just two years of development, state-of-the-art models have reached unprecedented quality, diversity, and creativity. However, a prevalent limitation persists: it remains difficult to communicate effectively with popular T2I models, such as Stable Diffusion, using natural language descriptions. As a result, obtaining an engaging image typically requires expertise in prompt engineering with complex word compositions, magic tags, and annotations. Inspired by the recently released DALLE3, a T2I model built directly into ChatGPT that talks human language, we revisit existing T2I systems that endeavor to align with human intent and introduce a new task, interactive text to image (iT2I), in which people interact with an LLM in natural language for interleaved high-quality image generation, editing, and refinement, as well as question answering, with stronger image-text correspondence. To address the iT2I problem, we present a simple approach that augments LLMs for iT2I with prompting techniques and off-the-shelf T2I models. We evaluate our approach in a variety of commonly used scenarios under different LLMs, e.g., ChatGPT, LLAMA, Baichuan, and InternLM. We demonstrate that our approach can be a convenient and low-cost way to add iT2I ability to any existing LLM and any text-to-image model without any training, while causing little degradation of the LLMs' inherent capabilities in, e.g., question answering and code generation. We hope this work draws broader attention and provides inspiration for improving user experience in human-machine interaction alongside the image quality of next-generation T2I systems.

* Technical report. Project page at https://minidalle3.github.io/
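
The abstract describes augmenting an LLM with prompting and an off-the-shelf T2I model, without training. One way such a pipeline might be wired up is sketched below. This is only an illustration under stated assumptions, not the authors' implementation: the `<image>...</image>` tag convention, the wording of `SYSTEM_PROMPT`, and the names `split_response`, `respond`, `fake_llm`, and `fake_t2i` are all hypothetical, and the two stubs stand in for a real chat model and a real image generator.

```python
import re
from typing import Callable, List, Tuple

# Assumed convention: the LLM wraps each image description in <image> tags.
IMAGE_TAG = re.compile(r"<image>(.*?)</image>", re.DOTALL)

# Hypothetical instruction given to the LLM; the wording is illustrative only.
SYSTEM_PROMPT = (
    "When the user asks for a picture, reply normally and wrap a detailed "
    "description of the picture in <image>...</image> tags."
)

Segment = Tuple[str, str]  # ("text", content) or ("image", content)


def split_response(text: str) -> List[Segment]:
    """Split raw LLM output into ordered text and image-description segments."""
    segments: List[Segment] = []
    pos = 0
    for match in IMAGE_TAG.finditer(text):
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("text", before))
        segments.append(("image", match.group(1).strip()))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


def respond(
    user_message: str,
    llm: Callable[[str, str], str],
    t2i: Callable[[str], str],
) -> List[Segment]:
    """Query the LLM, then replace each image description with a generated image."""
    raw = llm(SYSTEM_PROMPT, user_message)
    return [
        (kind, t2i(content)) if kind == "image" else (kind, content)
        for kind, content in split_response(raw)
    ]


def fake_llm(system_prompt: str, user_message: str) -> str:
    """Stub standing in for any chat LLM; returns a canned tagged reply."""
    return (
        "Here is your picture: "
        "<image>a corgi wearing a red scarf, watercolor</image> "
        "Want any changes?"
    )


def fake_t2i(prompt: str) -> str:
    """Stub standing in for a T2I model; returns a placeholder image reference."""
    return f"generated://{prompt}"


if __name__ == "__main__":
    for kind, content in respond("Draw me a corgi", fake_llm, fake_t2i):
        print(kind, "->", content)
```

Because the LLM and the T2I model are passed in as plain callables, either can be swapped for another without touching the routing logic, which mirrors the abstract's claim that the approach applies to any existing LLM and any text-to-image model.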
