
Fine-grained Hallucination Detection and Editing for Language Models (2401.06855v4)

Published 12 Jan 2024 in cs.CL

Abstract: Large language models (LMs) are prone to generate factual errors, which are often called hallucinations. In this paper, we introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms, each requiring varying degrees of careful assessments to verify factuality. We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench, that includes about one thousand fine-grained human judgments on three LM outputs across various domains. Our analysis reveals that ChatGPT and Llama2-Chat (70B, 7B) exhibit diverse types of hallucinations in the majority of their outputs in information-seeking scenarios. We train FAVA, a retrieval-augmented LM by carefully creating synthetic data to detect and correct fine-grained hallucinations. On our benchmark, our automatic and human evaluations show that FAVA significantly outperforms ChatGPT and GPT-4 on fine-grained hallucination detection, and edits suggested by FAVA improve the factuality of LM-generated text.

Introduction

Large language models (LMs) have become adept at generating fluent and coherent language. Despite this progress, they exhibit a critical drawback: they often produce text containing factually incorrect information, known as "hallucinations." The research community has responded by developing detection and correction mechanisms, but these tend to be coarse-grained, reducing the problem to a binary judgment of factual or non-factual. This work addresses those limitations by detecting and correcting hallucinations at a much finer granularity.

Taxonomy of Hallucinations

For a nuanced understanding of hallucinations, the paper proposes a novel taxonomy that categorizes factual errors in LM generations into six distinct types. The taxonomy targets scenarios where responses must be grounded in world knowledge; it covers commonly recognized entity-level errors while also highlighting underexplored areas such as unverifiable statements and invented concepts. The six categories are: entity errors, relation errors, statements that directly contradict known facts, fabrications about non-existent entities or concepts, subjective claims or personal biases presented as facts, and statements that cannot be verified against world knowledge. A minimal sketch of how these types might be represented is given below.
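To make the taxonomy concrete, the sketch below represents the six error types as an enum and attaches a type and a suggested edit to a character span in a model response. The names (ErrorType, SpanAnnotation, suggested_edit) and the span-based representation are illustrative assumptions, not the paper's data format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorType(Enum):
    """Six fine-grained hallucination types from the proposed taxonomy."""
    ENTITY = "entity"                # wrong entity in an otherwise correct statement
    RELATION = "relation"            # wrong relation between otherwise correct entities
    CONTRADICTORY = "contradictory"  # whole statement contradicts known facts
    INVENTED = "invented"            # fabricated, non-existent entity or concept
    SUBJECTIVE = "subjective"        # opinion or bias presented as fact
    UNVERIFIABLE = "unverifiable"    # cannot be checked against world knowledge


@dataclass
class SpanAnnotation:
    """A single fine-grained hallucination judgment on an LM response."""
    start: int                             # character offset where the erroneous span begins
    end: int                               # character offset where it ends (exclusive)
    error_type: ErrorType
    suggested_edit: Optional[str] = None   # replacement text, or None to delete the span


# Example: marking an entity-level error and its suggested correction.
response = "The Eiffel Tower was completed in 1899 in Paris."
annotation = SpanAnnotation(start=34, end=38,
                            error_type=ErrorType.ENTITY,
                            suggested_edit="1889")
```

Under this representation, replacing response[annotation.start:annotation.end] with annotation.suggested_edit yields the corrected sentence.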

Fine-grained Hallucination Detection

The researchers designed a task to accompany their taxonomy: detecting the specific hallucination type for each factual error in an LM's response. To support this task, they constructed a fine-grained hallucination detection benchmark featuring human-annotated responses from widely used models across different domains. Their analysis reveals that both ChatGPT and Llama2-Chat exhibit hallucinations in the majority of their information-seeking outputs, underscoring the need for more precise detection methods. A sketch of how such detection might be scored per error type follows.
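As an illustration of how fine-grained detection could be scored, the sketch below computes per-type precision and recall from gold and predicted (sentence index, error type) labels. Sentence-level matching and these particular metrics are assumptions for illustration; the paper's exact evaluation protocol may differ.

```python
from typing import Iterable, Tuple

# A label pairs a sentence index with an error type name, e.g. (0, "entity").
Label = Tuple[int, str]


def per_type_precision_recall(gold: Iterable[Label], pred: Iterable[Label]):
    """Compute per-error-type precision and recall for fine-grained detection."""
    gold_set, pred_set = set(gold), set(pred)
    types = {t for _, t in gold_set | pred_set}
    scores = {}
    for t in types:
        g = {x for x in gold_set if x[1] == t}
        p = {x for x in pred_set if x[1] == t}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        scores[t] = {"precision": precision, "recall": recall}
    return scores


gold = [(0, "entity"), (2, "unverifiable")]
pred = [(0, "entity"), (1, "relation")]
print(per_type_precision_recall(gold, pred))
```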

The Fava Model

To address the challenge, the researchers introduce FAVA, a retrieval-augmented LM. Unlike prior systems, FAVA is trained on synthetic data carefully constructed to reflect the fine-grained taxonomy, and it not only detects hallucinations but also suggests corrections at the span level. Automatic and human evaluations show FAVA to be significantly more effective at detecting and editing hallucinations than existing systems such as ChatGPT, though the researchers acknowledge there is still considerable room for improvement. A high-level sketch of a retrieval-augmented detect-and-edit loop is given after this paragraph.
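The sketch below shows one way a retrieval-augmented detect-and-edit loop could be wired together: retrieve evidence, ask an editor model to tag errors and propose replacements, then post-process the tagged output. The retrieve and editor_model callables, the prompt wording, and the <mark>/<delete> tag convention are hypothetical placeholders, not FAVA's actual interface or output format.

```python
import re

# Hypothetical components for illustration: `retrieve(query, k)` returns a list
# of evidence passages and `editor_model(prompt)` is any instruction-tuned LM
# callable; neither reflects FAVA's actual API or training setup.

def detect_and_edit(passage: str, retrieve, editor_model) -> str:
    """Sketch of a retrieval-augmented detect-and-edit loop.

    1. Retrieve evidence passages relevant to the input.
    2. Ask the editor model to tag factual errors by type, mark replacements,
       and flag spans to delete.
    3. Post-process the tagged output into a corrected passage.
    """
    evidence = "\n".join(retrieve(passage, k=5))
    prompt = (
        "Using the evidence, tag factual errors in the passage with their type "
        "(entity, relation, contradictory, invented, subjective, unverifiable). "
        "Wrap text to remove in <delete>...</delete> and suggested replacements "
        "in <mark>...</mark>.\n\n"
        f"Evidence:\n{evidence}\n\nPassage:\n{passage}\n\nEdited passage:"
    )
    tagged = editor_model(prompt)

    # Drop deleted spans, keep marked replacements, then strip remaining tags.
    edited = re.sub(r"<delete>.*?</delete>", "", tagged, flags=re.DOTALL)
    edited = re.sub(r"<mark>(.*?)</mark>", r"\1", edited, flags=re.DOTALL)
    edited = re.sub(r"</?\w+>", "", edited)
    return edited.strip()
```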

FAVA represents a strategic advance in the effort to improve the reliability and factuality of LM outputs. The supporting materials, including code, data, and a demonstration, have been made publicly available.

As the tools and methodologies for refining these systems continue to evolve, fine-grained hallucination detection not only strengthens current applications but also opens new avenues for deployments in domains where factual accuracy is paramount.

Authors (7)
  1. Abhika Mishra
  2. Akari Asai
  3. Vidhisha Balachandran
  4. Yizhong Wang
  5. Graham Neubig
  6. Yulia Tsvetkov
  7. Hannaneh Hajishirzi