Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate (2407.20505v1)

Published 30 Jul 2024 in cs.CV

Abstract: Multimodal large language models (MLLMs) often generate outputs that are inconsistent with the visual content, a problem known as hallucination. Previous methods focus on determining whether a generated output is hallucinated, without identifying which image region leads to the hallucination or interpreting why such hallucinations occur. In this paper, we argue that hallucination in MLLMs stems in part from a lack of slow thinking and divergent thinking in these models. To address this, we adopt a self-reflection scheme to promote slow thinking. Furthermore, we treat hallucination elimination as a complex reasoning task and propose a multi-agent debate approach to encourage divergent thinking. Consequently, our approach can not only mitigate hallucinations but also interpret why they occur and detail the specifics of each hallucination. In addition, we propose distinguishing creativity from hallucination in the context of MLLMs, and illustrate how to evaluate the creativity of MLLMs. Extensive experiments on various benchmarks demonstrate that our approach delivers generalized hallucination-mitigating performance across several MLLMs.
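To make the two ideas concrete, here is a minimal sketch of how self-reflection (slow thinking) and multi-agent debate (divergent thinking) could be wired together. The `query_mllm` helper, the prompts, the two-agent setup, the round count, and the judge step are all illustrative assumptions, not the authors' exact protocol.

```python
def query_mllm(image: str, prompt: str) -> str:
    """Placeholder for a real multimodal-model call; assumed, not from the paper."""
    raise NotImplementedError("wire this to your MLLM client of choice")


def self_reflect(image: str, draft: str) -> str:
    # Slow thinking: the model critiques its own draft against the image,
    # flagging claims the image does not support, then rewrites the draft.
    critique = query_mllm(
        image,
        f"Answer under review:\n{draft}\n"
        "List any statements not grounded in the image, naming the image "
        "region (e.g. 'top-left') that contradicts each one.",
    )
    return query_mllm(
        image,
        f"Original answer:\n{draft}\nCritique:\n{critique}\n"
        "Rewrite the answer, removing or correcting the flagged statements.",
    )


def multi_agent_debate(image: str, question: str, rounds: int = 2) -> str:
    # Divergent thinking: two agents answer independently, then each revises
    # after seeing the other's latest answer; a judge merges the outcome and
    # explains which claims were hallucinated and why.
    answers = [query_mllm(image, question) for _ in range(2)]
    for _ in range(rounds):
        answers = [
            query_mllm(
                image,
                f"Question: {question}\nYour answer: {answers[i]}\n"
                f"Another agent answered: {answers[1 - i]}\n"
                "Debate: defend, refute, or revise, grounding every claim "
                "in the visible image content.",
            )
            for i in range(2)
        ]
    return query_mllm(
        image,
        f"Question: {question}\nAgent A: {answers[0]}\nAgent B: {answers[1]}\n"
        "As judge, give the final answer and explain which claims were "
        "hallucinated and which image region disproves each.",
    )


# Usage sketch: refine a draft via self-reflection, then debate the question.
# draft = query_mllm("photo.jpg", "Describe the image.")
# refined = self_reflect("photo.jpg", draft)
# final = multi_agent_debate("photo.jpg", "Describe the image.", rounds=2)
```

In this sketch the debate output doubles as the interpretation: the judge is prompted to localize each hallucinated claim to an image region, matching the paper's goal of explaining hallucinations rather than only detecting them.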

Authors (4)
  1. Zheng Lin (104 papers)
  2. Zhenxing Niu (21 papers)
  3. Zhibin Wang (53 papers)
  4. Yinghui Xu (48 papers)