
Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites (2312.01701v1)

Published 4 Dec 2023 in cs.CV and cs.CL

Abstract: Large language models (LLMs) have shown remarkable performance in NLP tasks. To comprehend and execute diverse human instructions over image data, instruction-tuned large vision-language models (LVLMs) have been introduced. However, LVLMs may suffer from different types of object hallucinations, yet they are currently evaluated only for coarse-grained object hallucinations (i.e., generated objects that do not exist in the input image). Fine-grained object attributes and behaviors that are absent from the image may still be generated but go unmeasured by current evaluation methods. In this paper, we therefore focus on reducing fine-grained hallucinations of LVLMs. We propose ReCaption, a framework that consists of two components: rewriting captions using ChatGPT and fine-tuning instruction-tuned LVLMs on the rewritten captions. We also propose a fine-grained, probing-based evaluation method named Fine-Grained Object Hallucination Evaluation (FGHE). Our experimental results demonstrate that ReCaption effectively reduces fine-grained object hallucination across different LVLMs and improves their text generation quality. The code can be found at https://github.com/Anonymousanoy/FOHE.
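The abstract describes ReCaption as a two-stage framework: first rewriting existing image captions with ChatGPT, then fine-tuning the LVLM on the rewritten captions. Below is a minimal sketch of the first stage only; the prompt wording, the `rewrite_caption` helper, the model choice, and the temperature setting are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Sketch of a ChatGPT-based caption-rewriting step (assumptions: prompt text,
# model name, and helper function are illustrative, not the paper's exact setup).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

REWRITE_PROMPT = (
    "Rewrite the following image caption so that every object, attribute, and "
    "behavior it mentions is stated explicitly and unambiguously. Do not add "
    "details that are not already implied by the caption.\n\nCaption: {caption}"
)

def rewrite_caption(caption: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the chat model to rewrite a single caption; return the rewritten text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": REWRITE_PROMPT.format(caption=caption)}],
        temperature=0.2,
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(rewrite_caption("A man riding a horse on the beach."))
```

The rewritten captions produced this way would then serve as the fine-tuning targets for the instruction-tuned LVLM in the second stage.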

Authors (5)
  1. Lei Wang (975 papers)
  2. Jiabang He (6 papers)
  3. Shenshen Li (2 papers)
  4. Ning Liu (199 papers)
  5. Ee-Peng Lim (57 papers)
Citations (28)