Title: AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization

May 28, 2024

Junjie Shentu, Matthew Watson, Noura Al Moubayed

Figures 1 to 4 for AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization

Abstract: With text-to-image (T2I) diffusion models achieving unprecedented performance, T2I customization further lets users tailor a diffusion model to new concepts absent from the pre-training dataset, a task termed subject-driven generation. Extracting several new concepts from a single image lets the model learn multiple concepts at once. It also reduces the difficulty of preparing training data, which makes disentangling multiple concepts a new challenge. However, existing disentanglement methods commonly require pre-determined masks or retain background elements. To address this, we propose AttenCraft, an attention-guided method for multiple-concept disentanglement. In particular, our method leverages self-attention and cross-attention maps to create accurate masks for each concept within a single initialization step, removing the need for masks prepared by humans or other models. The created masks are then applied to guide the cross-attention activation of each target concept during training, achieving concept disentanglement. Additionally, we introduce Uniform sampling and Reweighted sampling schemes to alleviate the non-synchronicity of feature acquisition across different concepts and to improve generation quality. Our method outperforms baseline models in image alignment and performs comparably in text alignment. Finally, we showcase the applicability of AttenCraft to more complicated settings, such as an input image containing three concepts. The project is available at https://github.com/junjie-shentu/AttenCraft.
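The abstract says that per-concept masks are built from self-attention and cross-attention maps in one initialization step. It also says those masks then guide each concept's cross-attention during training. The exact procedure is not described here, so what follows is only a minimal NumPy sketch of that general idea under our own assumptions:

- Attention maps are already averaged over heads and layers and flattened to N = H×W spatial positions.
- Self-attention is used to propagate each concept token's cross-attention to related positions.
- A hypothetical `threshold` assigns weak positions to the background.
- A hypothetical "outside-mask" penalty stands in for the training-time guidance.

The function names `make_concept_masks` and `outside_mask_penalty` are illustrative and are not taken from the AttenCraft repository.

```python
import numpy as np


def make_concept_masks(cross_attn, self_attn, threshold=0.5):
    """Hypothetical mask construction from attention maps (not the paper's exact method).

    cross_attn: (K, N) cross-attention of K concept tokens over N = H*W positions.
    self_attn:  (N, N) self-attention matrix whose rows sum to 1.
    Returns a boolean array of shape (K, N); each position belongs to at most one concept.
    """
    # Propagate each concept's cross-attention through self-attention:
    # refined[k, i] = sum_j self_attn[i, j] * cross_attn[k, j]
    refined = cross_attn @ self_attn.T
    # Min-max normalise each concept map to [0, 1]
    lo = refined.min(axis=1, keepdims=True)
    hi = refined.max(axis=1, keepdims=True)
    norm = (refined - lo) / np.maximum(hi - lo, 1e-8)
    # Assign each position to its strongest concept; weak positions stay background
    winner = norm.argmax(axis=0)
    strong = norm.max(axis=0) >= threshold
    masks = np.zeros_like(norm, dtype=bool)
    masks[winner[strong], np.flatnonzero(strong)] = True
    return masks


def outside_mask_penalty(cross_attn, masks):
    """Hypothetical guidance loss: mean fraction of each concept's cross-attention
    mass that falls outside its own mask (0 means perfectly contained)."""
    total = cross_attn.sum(axis=1)
    outside = (cross_attn * ~masks).sum(axis=1)
    return float(np.mean(outside / np.maximum(total, 1e-8)))


if __name__ == "__main__":
    # Two concepts over five positions; the last position is weak for both concepts.
    cross = np.array([[0.9, 0.8, 0.1, 0.0, 0.2],
                      [0.0, 0.1, 0.7, 0.9, 0.2]])
    masks = make_concept_masks(cross, np.eye(5))  # identity = no propagation
    print(masks.astype(int))
    print(outside_mask_penalty(cross, masks))
```

Here the identity self-attention keeps the example easy to follow. With a real self-attention matrix, the propagation step spreads each concept's activation to positions that attend to it, which tends to fill in whole object regions. During training, minimizing a penalty of this kind would push each concept token to attend only inside its own mask.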
