PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models (2406.11802v3)

Published 17 Jun 2024 in cs.CV

Abstract: Text-to-image (T2I) models have made substantial progress in generating images from textual prompts. However, they frequently fail to produce images consistent with physical commonsense, a vital capability for applications in world simulation and everyday tasks. Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal knowledge, particularly physical commonsense. To address this issue, we introduce PhyBench, a comprehensive T2I evaluation dataset comprising 700 prompts across 4 primary categories: mechanics, optics, thermodynamics, and material properties, encompassing 31 distinct physical scenarios. We assess 6 prominent T2I models, including proprietary models DALLE3 and Gemini, and demonstrate that incorporating physical principles into prompts enhances the models' ability to generate physically accurate images. Our findings reveal that: (1) even advanced models frequently err in various physical scenarios, except for optics; (2) GPT-4o, with item-specific scoring instructions, effectively evaluates the models' understanding of physical commonsense, closely aligning with human assessments; and (3) current T2I models are primarily focused on text-to-image translation, lacking profound reasoning regarding physical commonsense. We advocate for increased attention to the inherent knowledge within T2I models, beyond their utility as mere image generation tools. The data will be available soon.

Authors (11)
  1. Fanqing Meng (14 papers)
  2. Wenqi Shao (89 papers)
  3. Lixin Luo (1 paper)
  4. Yahong Wang (1 paper)
  5. Yiran Chen (176 papers)
  6. Quanfeng Lu (10 papers)
  7. Yue Yang (146 papers)
  8. Tianshuo Yang (8 papers)
  9. Kaipeng Zhang (73 papers)
  10. Yu Qiao (563 papers)
  11. Ping Luo (340 papers)
Citations (3)