Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation (2310.18235v4)

Published 27 Oct 2023 in cs.CV, cs.AI, cs.CL, and cs.LG

Abstract: Evaluating text-to-image models is notoriously difficult. A strong recent approach for assessing text-image faithfulness is based on QG/A (question generation and answering), which uses pre-trained foundational models to automatically generate a set of questions and answers from the prompt; output images are then scored by whether the answers extracted by a visual question answering (VQA) model are consistent with the prompt-based answers. This kind of evaluation is naturally dependent on the quality of the underlying QG and VQA models. We identify and address several reliability challenges in existing QG/A work: (a) QG questions should respect the prompt (avoiding hallucinations, duplications, and omissions) and (b) VQA answers should be consistent (not asserting that there is no motorcycle in an image while also claiming the motorcycle is blue). We address these issues with Davidsonian Scene Graph (DSG), an empirically grounded evaluation framework inspired by formal semantics, which is adaptable to any QG/A framework. DSG produces atomic and unique questions organized in dependency graphs, which (i) ensure appropriate semantic coverage and (ii) sidestep inconsistent answers. With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above. Finally, we present DSG-1k, an open-sourced evaluation benchmark that includes 1,060 prompts, covering a wide range of fine-grained semantic categories with a balanced distribution. We release the DSG-1k prompts and the corresponding DSG questions.
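
To illustrate how a dependency graph can sidestep inconsistent answers such as the motorcycle example, here is a minimal sketch. It is not the authors' released implementation. The function name dsg_score, the question IDs, and the hard-coded yes/no answers (stand-ins for real VQA output) are all hypothetical, and treating a child question as failed whenever one of its parents was answered "no" is an assumption of this sketch.

```python
from graphlib import TopologicalSorter


def dsg_score(parents: dict[int, list[int]], vqa_yes: dict[int, bool]) -> float:
    """Average per-question score with dependency-aware invalidation (sketch).

    parents: question id -> ids of its parent questions ([] for roots).
    vqa_yes: question id -> raw VQA answer (True means "yes").
    """
    final: dict[int, bool] = {}
    # static_order() yields every question only after all of its parents,
    # so a failure propagates down through grandchildren as well.
    for qid in TopologicalSorter(parents).static_order():
        parents_ok = all(final[p] for p in parents.get(qid, []))
        # Assumption: if any parent failed, the child counts as a failure
        # regardless of what the VQA model answered for it.
        final[qid] = parents_ok and vqa_yes[qid]
    return sum(final.values()) / len(final)


# Hypothetical questions for a prompt like "a man riding a blue motorcycle":
# 1 "Is there a motorcycle?"   2 "Is the motorcycle blue?"
# 3 "Is there a man?"          4 "Is the man riding the motorcycle?"
parents = {1: [], 2: [1], 3: [], 4: [1, 3]}
# Stand-in VQA output: says "no motorcycle" yet "the motorcycle is blue".
vqa_yes = {1: False, 2: True, 3: True, 4: False}

naive = sum(vqa_yes.values()) / len(vqa_yes)  # ignores dependencies
print(naive, dsg_score(parents, vqa_yes))
```

With these stand-in answers the naive average is 0.5, while the dependency-aware score is 0.25, because the "blue" answer is discarded once its parent question ("Is there a motorcycle?") has been answered "no".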

Authors (9)
  1. Jaemin Cho (36 papers)
  2. Yushi Hu (23 papers)
  3. Roopal Garg (7 papers)
  4. Peter Anderson (30 papers)
  5. Ranjay Krishna (116 papers)
  6. Jason Baldridge (45 papers)
  7. Mohit Bansal (304 papers)
  8. Jordi Pont-Tuset (38 papers)
  9. Su Wang (66 papers)
Citations (51)