Title: Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective

Oct 14, 2024

Xiangru Zhu, Penglei Sun, Yaoxian Song, Yanghua Xiao, Zhixu Li, Chengyu Wang, Jun Huang, Bei Yang, Xiaoxiao Xu

Abstract: Accurate interpretation and visualization of human instructions are crucial for text-to-image (T2I) synthesis. However, current models struggle to capture semantic variations caused by word order changes, and existing evaluations, which rely on indirect metrics such as text-image similarity, fail to assess these challenges reliably. The focus on frequent word combinations often obscures poor performance on complex or uncommon linguistic patterns. To address these deficiencies, we propose a novel metric called SemVarEffect and a benchmark named SemVarBench, designed to evaluate the causality between semantic variations in inputs and outputs in T2I synthesis. Semantic variations are achieved through two types of linguistic permutations, while avoiding easily predictable literal variations. Experiments reveal that CogView-3-Plus and Ideogram 2 perform best, achieving a score of 0.2/1. Semantic variations in object relations are less well understood than those in attributes, scoring 0.07/1 compared to 0.17-0.19/1. We find that cross-modal alignment in the UNet or Transformer plays a crucial role in handling semantic variations, a factor previously overlooked because of a focus on textual encoders. Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.

* Our benchmark and code are available at https://github.com/zhuxiangru/SemVarBench
