THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models (2405.05256v2)

Published 8 May 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Mitigating hallucinations in large vision-language models (LVLMs) remains an open problem. Recent benchmarks do not address hallucinations in open-ended free-form responses, which we term "Type I hallucinations". Instead, they focus on hallucinations in responses to very specific question formats -- typically a multiple-choice response regarding a particular object or attribute -- which we term "Type II hallucinations". Additionally, such benchmarks often require external API calls to models which are subject to change. In practice, we observe that a reduction in Type II hallucinations does not lead to a reduction in Type I hallucinations but rather that the two forms of hallucinations are often anti-correlated. To address this, we propose THRONE, a novel object-based automatic framework for quantitatively evaluating Type I hallucinations in LVLM free-form outputs. We use public language models (LMs) to identify hallucinations in LVLM responses and compute informative metrics. By evaluating a large selection of recent LVLMs using public datasets, we show that an improvement in existing metrics does not lead to a reduction in Type I hallucinations, and that established benchmarks for measuring Type I hallucinations are incomplete. Finally, we provide a simple and effective data augmentation method to reduce Type I and Type II hallucinations as a strong baseline. Code is now available at https://github.com/amazon-science/THRONE.


Tweets

https://twitter.com/Prasad_Kothari/status/1789690974928507042

https://twitter.com/javaeeeee1/status/1789679048525807930

https://twitter.com/ai_arxiv/status/1788398002517709089

https://twitter.com/gm8xx8/status/1788444668113400236

https://twitter.com/javaeeeee1/status/1788532314231730518
