Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

Jun 17, 2024

Fanqing Meng, Wenqi Shao, Lixin Luo, Yahong Wang, Yiran Chen, Quanfeng Lu, Yue Yang, Tianshuo Yang, Kaipeng Zhang, Yu Qiao(+1 more)

Figure 1 for PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

Figure 2 for PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

Figure 3 for PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

Figure 4 for PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

Share this with someone who'll enjoy it:

Abstract:Text-to-image (T2I) models have made substantial progress in generating images from textual prompts. However, they frequently fail to produce images consistent with physical commonsense, a vital capability for applications in world simulation and everyday tasks. Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal knowledge, particularly physical commonsense. To address this issue, we introduce PhyBench, a comprehensive T2I evaluation dataset comprising 700 prompts across 4 primary categories: mechanics, optics, thermodynamics, and material properties, encompassing 31 distinct physical scenarios. We assess 6 prominent T2I models, including proprietary models DALLE3 and Gemini, and demonstrate that incorporating physical principles into prompts enhances the models' ability to generate physically accurate images. Our findings reveal that: (1) even advanced models frequently err in various physical scenarios, except for optics; (2) GPT-4o, with item-specific scoring instructions, effectively evaluates the models' understanding of physical commonsense, closely aligning with human assessments; and (3) current T2I models are primarily focused on text-to-image translation, lacking profound reasoning regarding physical commonsense. We advocate for increased attention to the inherent knowledge within T2I models, beyond their utility as mere image generation tools. The code and data are available at https://github.com/OpenGVLab/PhyBench.

View paper on

Share this with someone who'll enjoy it:

Title:PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

Paper and Code