Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
96 tokens/sec
Gemini 2.5 Pro Premium
48 tokens/sec
GPT-5 Medium
15 tokens/sec
GPT-5 High Premium
23 tokens/sec
GPT-4o
104 tokens/sec
DeepSeek R1 via Azure Premium
77 tokens/sec
GPT OSS 120B via Groq Premium
466 tokens/sec
Kimi K2 via Groq Premium
201 tokens/sec
2000 character limit reached

VORD: Visual Ordinal Calibration for Mitigating Object Hallucinations in Large Vision-Language Models (2412.15739v1)

Published 20 Dec 2024 in cs.CV

Abstract: Large Vision-LLMs (LVLMs) have made remarkable developments along with the recent surge of LLMs. Despite their advancements, LVLMs have a tendency to generate plausible yet inaccurate or inconsistent information based on the provided source content. This phenomenon, also known as ``hallucinations" can have serious downstream implications during the deployment of LVLMs. To address this, we present VORD a simple and effective method that alleviates hallucinations by calibrating token predictions based on ordinal relationships between modified image pairs. VORD is presented in two forms: 1.) a minimalist training-free variant which eliminates implausible tokens from modified image pairs, and 2.) a trainable objective function that penalizes unlikely tokens. Our experiments demonstrate that VORD delivers better calibration and effectively mitigates object hallucinations on a wide-range of LVLM benchmarks.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Authors (2)