Title: ARTIST: Improving the Generation of Text-rich Images by Disentanglement

Jun 17, 2024

Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang

Abstract: Diffusion models have demonstrated exceptional capabilities in generating a broad spectrum of visual content, yet their proficiency in rendering text is still limited: they often generate inaccurate characters or words that fail to blend well with the underlying image. To address these shortcomings, we introduce a new framework named ARTIST. This framework incorporates a dedicated textual diffusion model that focuses specifically on learning text structures. Initially, we pretrain this textual model to capture the intricacies of text representation. Subsequently, we finetune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model. This disentangled architecture design and the training strategy significantly enhance the text rendering ability of the diffusion models for text-rich image generation. Additionally, we leverage the capabilities of pretrained large language models to better interpret user intentions, contributing to improved generation quality. Empirical results on the MARIO-Eval benchmark underscore the effectiveness of the proposed method, showing an improvement of up to 15% in various metrics.
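The two-stage, disentangled training described in the abstract can be illustrated with a deliberately tiny sketch. It assumes the standard DDPM noise-prediction objective, replaces both diffusion networks with linear least-squares noise predictors, and treats the frozen textual model's estimate of the clean text layer as the "textual structure information" handed to the visual model. Every function, shape, schedule, and data array below is hypothetical; the abstract does not specify the authors' implementation, and this is not it.

```python
import numpy as np

# Toy sketch only: linear least-squares "denoisers" stand in for the real
# networks, and every name, shape, and schedule here is a hypothetical choice.
T = 100
betas = np.linspace(1e-4, 0.02, T)       # linear DDPM noise schedule (assumed)
alpha_bar = np.cumprod(1.0 - betas)      # cumulative product \bar{alpha}_t


def noise(x0, t, rng):
    """DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t][:, None]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps


def design(*blocks, t):
    """Stack input blocks, a noise-level column, and a bias column."""
    a = alpha_bar[t][:, None]
    return np.hstack(list(blocks) + [np.sqrt(a), np.ones_like(a)])


def fit(X, eps):
    """Minimize the epsilon-prediction MSE for a linear model in closed form."""
    W, *_ = np.linalg.lstsq(X, eps, rcond=None)
    return W


def mse(X, W, eps):
    return float(np.mean((X @ W - eps) ** 2))


rng = np.random.default_rng(0)
n, d = 2000, 16
masks = (rng.random((n, d)) < 0.3).astype(float)    # toy "text layer"
images = masks + 0.3 * rng.standard_normal((n, d))  # toy "text-rich image"
t = rng.integers(0, T, size=n)

# Stage 1: pretrain the textual denoiser on text layers alone.
m_t, m_eps = noise(masks, t, rng)
W_text = fit(design(m_t, t=t), m_eps)


def text_features(m_t, t):
    """Frozen textual model's estimate of the clean text layer (its x0 prediction)."""
    a = alpha_bar[t][:, None]
    eps_hat = design(m_t, t=t) @ W_text
    return (m_t - np.sqrt(1.0 - a) * eps_hat) / np.sqrt(a)


# Stage 2: keep W_text fixed and train the visual denoiser, with and without
# the textual features as extra conditioning inputs.
x_t, x_eps = noise(images, t, rng)
m_t2, _ = noise(masks, t, rng)
X_plain = design(x_t, t=t)
X_cond = design(x_t, text_features(m_t2, t), t=t)
mse_plain = mse(X_plain, fit(X_plain, x_eps), x_eps)
mse_cond = mse(X_cond, fit(X_cond, x_eps), x_eps)
print(f"visual-only MSE: {mse_plain:.4f}  with text features: {mse_cond:.4f}")
```

Because `X_cond` contains every column of `X_plain`, the conditioned least-squares fit can never have a higher training error than the visual-only one; how much lower it goes depends on how informative the frozen textual estimate is. Keeping `W_text` fixed in stage 2 mirrors the "pretrain, then let the visual model assimilate" ordering in the abstract, under the stated toy assumptions.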