Abstract: Despite the burst of innovative methods for controlling the diffusion process, effectively controlling image style in text-to-image generation remains challenging. Many adapter-based methods impose image representation conditions on the denoising process to achieve image control. However, these conditions are not aligned with the word embedding space, leading to interference between the image and text control conditions and potential loss of semantic information from the text prompt. Addressing this issue involves two key challenges: first, how to inject the style representation without compromising the effectiveness of the text representation as a control signal; second, how to obtain an accurate style representation from a single reference image. To tackle these challenges, we introduce StyleTokenizer, a zero-shot, style-controlled image generation method that aligns the style representation with the text representation using a style tokenizer. This alignment effectively minimizes the impact on the effectiveness of the text prompt. Furthermore, we collect a well-labeled style dataset named Style30k to train a style feature extractor capable of accurately representing style while excluding other content information. Experimental results demonstrate that our method fully grasps the style characteristics of the reference image, generating appealing images that are consistent with both the target image style and the text prompt. The code and dataset are available at https://github.com/alipay/style-tokenizer.
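To make the core idea concrete, the following is a minimal, hypothetical sketch of how a style tokenizer could map a style feature from a reference image into a few pseudo-tokens that live in the text embedding space, so they can be concatenated with the prompt embeddings without disturbing the text semantics. The class name, dimensions, and token count are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class StyleTokenizerSketch(nn.Module):
    """Hypothetical sketch: project a style feature into pseudo word embeddings."""
    def __init__(self, style_dim=1024, text_dim=768, num_style_tokens=4):
        super().__init__()
        # Project the style feature into `num_style_tokens` embeddings with the
        # same width as the text encoder's word embeddings.
        self.proj = nn.Linear(style_dim, num_style_tokens * text_dim)
        self.num_style_tokens = num_style_tokens
        self.text_dim = text_dim

    def forward(self, style_feat):            # (B, style_dim)
        tokens = self.proj(style_feat)        # (B, num_style_tokens * text_dim)
        return tokens.view(-1, self.num_style_tokens, self.text_dim)

# Usage: append the style tokens to the prompt embeddings so both share one
# conditioning sequence fed to the diffusion model's cross-attention.
style_feat = torch.randn(1, 1024)             # from a pretrained style extractor (assumed)
prompt_emb = torch.randn(1, 77, 768)          # from the text encoder (assumed shape)
style_tokens = StyleTokenizerSketch()(style_feat)
cond = torch.cat([prompt_emb, style_tokens], dim=1)   # (1, 81, 768)
```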
Abstract: Automatic neural architecture search techniques are becoming increasingly important in machine learning. In particular, weight-sharing methods have shown remarkable potential for finding good network architectures with few computational resources. However, existing weight-sharing methods suffer from limitations in their search strategies: they either uniformly train all network paths to convergence, which introduces conflicts between branches and wastes a large amount of computation on unpromising candidates, or selectively train branches with different frequencies, which leads to unfair evaluation and comparison among paths. To address these issues, we propose a novel neural architecture search method with a balanced training strategy to ensure fair comparisons and a selective drop mechanism to reduce conflicts among candidate paths. Experimental results show that the proposed method achieves a leading performance of 79.0% on ImageNet under mobile settings, outperforming other state-of-the-art methods in both accuracy and efficiency.
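As a rough illustration of the two ideas named above, the sketch below shows a toy weight-sharing layer where every surviving candidate operation is trained with the same frequency (balanced training) and clearly inferior candidates are removed after equal training budgets (selective drop). The sampling rule, scoring, and drop schedule here are simplifying assumptions, not the paper's actual method.

```python
import torch
import torch.nn as nn

class MixedLayer(nn.Module):
    """Toy weight-sharing layer with balanced sampling and selective drop (illustrative)."""
    def __init__(self, dim=16):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Linear(dim, dim),                              # candidate op 0
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),    # candidate op 1
            nn.Linear(dim, dim, bias=False),                  # candidate op 2
        ])
        self.alive = list(range(len(self.candidates)))
        self.train_counts = [0] * len(self.candidates)
        self.scores = [0.0] * len(self.candidates)

    def sample_balanced(self):
        # Balanced training: pick the surviving candidate with the fewest
        # training steps, so all paths receive an equal share of updates.
        return min(self.alive, key=lambda i: self.train_counts[i])

    def drop_worst(self):
        # Selective drop: remove the surviving candidate with the worst
        # running score (here, the highest recent loss).
        if len(self.alive) > 1:
            worst = max(self.alive, key=lambda i: self.scores[i])
            self.alive.remove(worst)

layer = MixedLayer()
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(32, 16), torch.randn(32, 16)
for step in range(30):
    idx = layer.sample_balanced()
    loss = nn.functional.mse_loss(layer.candidates[idx](x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    layer.train_counts[idx] += 1
    layer.scores[idx] = loss.item()
    if step > 0 and step % 15 == 0:
        layer.drop_worst()
```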