Title: Compositional Text-to-Image Generation with Dense Blob Representations

May 14, 2024

Weili Nie, Sifei Liu, Morteza Mardani, Chao Liu, Benjamin Eckart, Arash Vahdat

Abstract: Existing text-to-image models struggle to follow complex text prompts, raising the need for extra grounding inputs for better controllability. In this work, we propose to decompose a scene into visual primitives, denoted as dense blob representations, that contain fine-grained details of the scene while being modular, human-interpretable, and easy to construct. Based on blob representations, we develop a blob-grounded text-to-image diffusion model, termed BlobGEN, for compositional generation. In particular, we introduce a new masked cross-attention module to disentangle the fusion between blob representations and visual features. To leverage the compositionality of large language models (LLMs), we introduce a new in-context learning approach to generate blob representations from text prompts. Our extensive experiments show that BlobGEN achieves superior zero-shot generation quality and better layout-guided controllability on MS-COCO. When augmented by LLMs, our method exhibits superior numerical and spatial correctness on compositional image generation benchmarks. Project page: https://blobgen-2d.github.io.
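To make the idea of masked cross-attention between blobs and visual features concrete, here is a minimal NumPy sketch under stated assumptions. It is not the BlobGEN implementation. It assumes each blob has already been encoded into a single embedding vector and rasterized into a binary spatial mask over the flattened feature map. It also omits the learned query/key/value projections and the multi-head structure. The function name `masked_cross_attention` is hypothetical. Each visual token attends only to the blobs whose mask covers it, so one blob's description cannot leak into another region. Tokens outside every blob receive a zero update.

```python
import numpy as np


def masked_cross_attention(visual, blobs, masks):
    """Illustrative masked cross-attention (hypothetical helper, not BlobGEN's code).

    visual: (N, d) visual feature tokens (N = H * W), used as queries.
    blobs:  (K, d) one embedding per blob, used directly as keys and values
            (learned projections are omitted in this sketch).
    masks:  (K, N) boolean; masks[k, n] is True if token n lies inside blob k.
    Returns (N, d): each token mixes only the blobs that cover it;
    tokens covered by no blob get a zero vector.
    """
    d = visual.shape[1]
    scores = visual @ blobs.T / np.sqrt(d)          # (N, K) scaled dot products
    allowed = masks.T                               # (N, K) token-to-blob permission
    scores = np.where(allowed, scores, -np.inf)     # block attention outside each blob

    # Tokens with no covering blob would give softmax over all -inf (NaN),
    # so substitute zeros for those rows and zero out their weights below.
    has_any = allowed.any(axis=1, keepdims=True)    # (N, 1)
    safe = np.where(has_any, scores, 0.0)
    safe = safe - safe.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(safe) * allowed                # masked entries become exactly 0
    denom = np.where(has_any, weights.sum(axis=1, keepdims=True), 1.0)
    weights = weights / denom
    return weights @ blobs                          # (N, d) per-token blob mixture


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visual = rng.standard_normal((4, 8))   # 4 tokens, feature dim 8
    blobs = rng.standard_normal((2, 8))    # 2 blobs
    masks = np.array([[True, True, False, False],    # blob 0 covers tokens 0, 1
                      [False, True, True, False]])   # blob 1 covers tokens 1, 2
    out = masked_cross_attention(visual, blobs, masks)
    print(out.shape)
```

Under these assumptions, a token covered by exactly one blob simply receives that blob's embedding, a token in an overlap region receives a softmax-weighted mix of the overlapping blobs, and a token outside all blobs is left unchanged by this branch (zero update).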

* ICML 2024
