Title: Diffusion-based Visual Anagram as Multi-task Learning

Dec 03, 2024

Zhiyuan Xu, Yinhe Chen, Huan-ang Gao, Weiyan Zhao, Guiyu Zhang, Hao Zhao

Abstract: Visual anagrams are images that change appearance upon transformation, such as flipping or rotation. With the advent of diffusion models, such optical illusions can be generated by averaging noise across multiple views during the reverse denoising process. However, we observe two critical failure modes in this approach: (i) concept segregation, where concepts in different views are generated independently, so the result cannot be considered a true anagram, and (ii) concept domination, where certain concepts overpower others. In this work, we cast the visual anagram generation problem in a multi-task learning setting, where different viewpoint prompts are analogous to different tasks, and derive denoising trajectories that align well across tasks simultaneously. At the core of our framework are two newly introduced techniques: (i) an anti-segregation optimization strategy that promotes overlap in cross-attention maps between different concepts, and (ii) a noise vector balancing method that adaptively adjusts the influence of different tasks. Additionally, we observe that directly averaging noise predictions yields suboptimal performance because statistical properties may not be preserved, prompting us to derive a noise variance rectification method. Extensive qualitative and quantitative experiments demonstrate our method's superior ability to generate visual anagrams spanning diverse concepts.
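
The remark about averaged noise predictions losing their statistical properties can be illustrated with a small NumPy sketch. This is not the paper's rectification method, which the abstract does not spell out. It assumes two views (identity and vertical flip), uses independent unit-variance Gaussian arrays as stand-ins for the per-view noise predictions, and applies a naive rescaling by the empirical standard deviation. The helper names `average_noise_across_views` and `rescale_to_unit_variance` are hypothetical.

```python
import numpy as np


def average_noise_across_views(noise_preds, inverse_views):
    """Map each view's noise estimate back to the canonical frame, then average."""
    aligned = [inv(eps) for eps, inv in zip(noise_preds, inverse_views)]
    return np.mean(aligned, axis=0)


def rescale_to_unit_variance(eps, floor=1e-8):
    """Naive correction (an assumption, not the paper's derivation):
    divide by the empirical standard deviation."""
    return eps / max(float(eps.std()), floor)


rng = np.random.default_rng(0)
shape = (64, 64)

# Stand-ins for per-view noise predictions (assumed independent, unit variance).
eps_identity = rng.standard_normal(shape)
eps_flipped = rng.standard_normal(shape)  # expressed in the flipped frame


def identity(x):
    return x


def unflip(x):
    return np.flipud(x)  # a vertical flip is its own inverse


combined = average_noise_across_views(
    [eps_identity, eps_flipped], [identity, unflip]
)
rectified = rescale_to_unit_variance(combined)

print("variance after averaging:", combined.var())
print("variance after rescaling:", rectified.var())
```

Under the independence assumption, averaging two unit-variance estimates roughly halves the variance, so the combined estimate no longer matches the noise level the sampler expects. Predictions from a real diffusion model share the same noisy input and are correlated across views, so the actual shrinkage differs from this idealized case.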

* WACV 2025. Code is publicly available at https://github.com/Pixtella/Anagram-MTL
