Title: X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models

Dec 02, 2024

Zeyi Sun, Ziyang Chu, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

Abstract: In-context generation is a key component of large language models' (LLMs) open-task generalization capability. By leveraging a few examples as context, LLMs can perform both in-domain and out-of-domain tasks. Recent advancements in auto-regressive vision-language models (VLMs) built upon LLMs have showcased impressive performance in text-to-image generation. However, the potential of in-context learning for general image generation tasks remains largely unexplored. To address this, we introduce X-Prompt, a purely auto-regressive large vision-language model designed to deliver competitive performance across a wide range of both seen and unseen image generation tasks, all within a unified in-context learning framework. X-Prompt incorporates a specialized design that efficiently compresses valuable features from in-context examples, supporting longer in-context token sequences and improving its ability to generalize to unseen tasks. A unified training task for both text and image prediction enables X-Prompt to handle general image generation with enhanced task awareness from in-context examples. Extensive experiments validate the model's performance across diverse seen image generation tasks and its capacity to generalize to previously unseen tasks.

* code: https://github.com/SunzeY/X-Prompt
