
Henglei Lv

Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization

Jan 30, 2024

Henglei Lv, Jiayu Xiao, Liang Li, Qingming Huang

Abstract: Diffusion-based text-to-image personalization has achieved great success in generating user-specified subjects in various contexts. Even so, existing finetuning-based methods still suffer from model overfitting, which greatly harms generative diversity, especially when few subject images are given. To this end, we propose Pick-and-Draw, a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods. Our approach consists of two components: appearance picking guidance and layout drawing guidance. For the former, we construct an appearance palette with visual features from the reference image, from which we pick local patterns to generate the specified subject with a consistent identity. For layout drawing, we outline the subject's contour by referring to a generative template from the vanilla diffusion model, and inherit its strong image prior to synthesize diverse contexts according to different text conditions. The proposed approach can be applied to any personalized diffusion model and requires as few as a single reference image. Qualitative and quantitative experiments show that Pick-and-Draw consistently improves identity consistency and generative diversity, pushing the trade-off between subject fidelity and image-text fidelity to a new Pareto frontier.


R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation

Oct 26, 2023

Jiayu Xiao, Liang Li, Henglei Lv, Shuhui Wang, Qingming Huang


Abstract: Recent text-to-image (T2I) diffusion models have achieved remarkable progress in generating high-quality images from text prompts. However, these models fail to convey the spatial composition specified by a layout instruction. In this work, we probe into zero-shot grounded T2I generation with diffusion models, that is, generating images that correspond to the input layout information without training auxiliary modules or finetuning diffusion models. We propose a Region and Boundary (R&B) aware cross-attention guidance approach that gradually modulates the attention maps of the diffusion model during the generative process, and helps the model synthesize images that (1) have high fidelity, (2) are highly compatible with the textual input, and (3) follow the layout instructions accurately. Specifically, we leverage discrete sampling to bridge the gap between consecutive attention maps and discrete layout constraints, and design a region-aware loss to refine the generative layout during the diffusion process. We further propose a boundary-aware loss to strengthen object discriminability within the corresponding regions. Experimental results show that our method outperforms existing state-of-the-art zero-shot grounded T2I generation methods by a large margin, both qualitatively and quantitatively, on several benchmarks.

* Preprint. Under review. Project page: https://sagileo.github.io/Region-and-Boundary
