This survey provides a comprehensive overview of the phenomenon of hallucination in LLMs, covering its definition, causes, detection methods, benchmarks, mitigation strategies, and open challenges (Huang et al., 2023). Hallucinations are defined as generated content that is nonsensical or unfaithful to provided source content, user instructions, or real-world facts.
Taxonomy of LLM Hallucinations
The paper proposes a refined taxonomy specifically for LLMs, moving beyond previous intrinsic/extrinsic classifications. It divides hallucinations into two main categories (a small labeling sketch follows the list):
- Factuality Hallucination: Discrepancies between generated content and verifiable real-world facts.
  - Factual Inconsistency: The output contains facts that contradict verifiable real-world information (e.g., stating the wrong person landed on the moon).
  - Factual Fabrication: The output contains facts that are unverifiable against established real-world knowledge (e.g., creating a plausible but non-existent history for unicorns).
- Faithfulness Hallucination: Divergence of generated content from user instructions or provided context, or a lack of internal consistency.
  - Instruction Inconsistency: The output deviates from the user's explicit directive (e.g., answering a question instead of translating it).
  - Context Inconsistency: The output contradicts information provided in the user's input context (e.g., misstating a detail from a provided text summary).
  - Logical Inconsistency: The output exhibits internal contradictions, often in reasoning tasks (e.g., correct reasoning steps leading to an incorrect final answer).
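To make the taxonomy easier to apply in practice (e.g., when annotating model outputs for a hallucination evaluation set), here is a minimal, purely illustrative Python encoding of the two-level scheme; the class and label names are my own, not the survey's.

```python
from enum import Enum

class HallucinationType(Enum):
    """Illustrative two-level encoding of the survey's taxonomy,
    e.g. for annotating model outputs in an evaluation set."""
    FACTUAL_INCONSISTENCY     = ("factuality", "contradicts verifiable facts")
    FACTUAL_FABRICATION       = ("factuality", "unverifiable against world knowledge")
    INSTRUCTION_INCONSISTENCY = ("faithfulness", "deviates from the user's directive")
    CONTEXT_INCONSISTENCY     = ("faithfulness", "contradicts the provided context")
    LOGICAL_INCONSISTENCY     = ("faithfulness", "internally contradictory reasoning")

    @property
    def category(self) -> str:
        return self.value[0]

    @property
    def description(self) -> str:
        return self.value[1]

# Example annotation: the model answered a question it was asked to translate.
label = HallucinationType.INSTRUCTION_INCONSISTENCY
print(f"{label.name}: {label.category} hallucination ({label.description})")
```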
Causes of Hallucinations
Hallucinations can arise from various stages of an LLM's lifecycle:
- Data-related Causes:
  - Flawed Data Source:
    - Misinformation and Biases: Training data contains factual errors (leading to "imitative falsehoods"), excessive repetitions (duplication bias), or societal biases (gender, nationality) that the model learns and reproduces [DBLP:conf/acl/LinHE22, lee2021deduplicating].
    - Knowledge Boundary: The training data lacks specific domain knowledge (e.g., medicine, law) or up-to-date information, causing the model to fabricate or provide outdated facts [singhal2023towards, katz2023gpt, onoe2022entity].
  - Inferior Data Utilization:
    - Knowledge Shortcut: Models rely on spurious correlations such as co-occurrence statistics rather than genuine factual understanding [li2022pre, kang2023impact].
    - Knowledge Recall Failures: Difficulty retrieving less frequent (long-tail) knowledge or information requiring complex, multi-step reasoning, even when it is present in the model's parameters [DBLP:conf/acl/MallenAZDKH23, zheng2023does].
- Training-related Causes:
  - Pre-training:
    - Architecture Flaw: Limitations of the unidirectional Transformer architecture (inadequate context capture) or attention-mechanism glitches [li2023batgpt, liu2023exposing].
    - Suboptimal Training Objective (Exposure Bias): The discrepancy between training (conditioning on ground truth) and inference (conditioning on the model's own outputs) leads to error accumulation [wang2020exposure, zhang2023language].
  - Alignment (SFT & RLHF):
    - Capability Misalignment: Alignment data demands knowledge beyond the model's pre-trained capabilities, forcing it to extrapolate [schulman2023youtube].
    - Belief Misalignment (Sycophancy): Models prioritize responses favored by human evaluators or preference models over truthful ones, even when their internal representations indicate the correct fact [cotra2021why, perez2022discovering, wei2023simple, sharma2023towards].
- Inference-related Causes:
  - Defective Decoding Strategy:
    - Inherent Sampling Randomness: Stochastic sampling methods (such as temperature sampling), needed for diversity, increase the chance of selecting low-probability, potentially incorrect tokens (see the sketch after this list) [stahlberg2019nmt, holtzman2019curious].
  - Imperfect Decoding Representation:
    - Insufficient Context Attention: Models over-attend to recently generated text and lose track of the original input context or instructions, especially in long generations [miao2021prevent, DBLP:journals/corr/abs-2307-03172].
    - Softmax Bottleneck: The softmax output layer limits the model's ability to accurately represent complex, multi-peaked probability distributions over the vocabulary [yang2017breaking, chang2022softmax].
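The "Inherent Sampling Randomness" point above is easy to see numerically. Below is a minimal, self-contained sketch (toy, hypothetical logits; not code from the survey) showing how raising the sampling temperature flattens the next-token distribution and increases the probability of drawing a low-scoring, potentially incorrect token.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index after temperature-scaling the logits.

    Higher temperatures flatten the softmax distribution, shifting
    probability mass toward low-scoring (potentially incorrect) tokens.
    """
    rng = rng or np.random.default_rng(0)
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                      # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs), probs

# Toy 3-token vocabulary: index 0 is the factually correct continuation.
logits = np.array([5.0, 2.0, 1.0])              # hypothetical model scores
for t in (0.2, 1.0, 1.5):
    _, probs = sample_with_temperature(logits, temperature=t)
    print(f"T={t}: P(correct)={probs[0]:.2f}, P(distractors)={probs[1:].sum():.2f}")
```

At T=0.2 the correct token receives essentially all the probability mass, while at T=1.5 the distractors already account for roughly a sixth of it, which is why diversity-oriented sampling trades off against factuality.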
Hallucination Detection and Benchmarks
- Detection Methods:
  - Factuality Detection:
    - Retrieve External Facts: Comparing LLM output against reliable external knowledge sources (web search, databases) using fact-checking pipelines [DBLP:journals/corr/abs-2305-11859, chern2023factool, DBLP:journals/corr/abs-2305-14251].
    - Uncertainty Estimation: Measuring model uncertainty via internal states (token probabilities, entropy) [DBLP:journals/corr/abs-2307-03987] or observable behavior (consistency across multiple generated samples, multi-agent debate; see the sketch after this list) [DBLP:journals/corr/abs-2303-08896, DBLP:journals/corr/abs-2305-13281].
  - Faithfulness Detection:
    - Fact-based Metrics: Measuring overlap of n-grams, entities, relations, or knowledge elements between source and generation [nan2021entity, goodrich2019assessing].
    - Classifier-based Metrics: Training classifiers (often NLI models) to predict entailment or consistency between source and generation [maynez2020faithfulness, laban2022summac].
    - QA-based Metrics: Generating questions from the output and checking whether the source context answers them consistently [durmus2020feqa, scialom2021questeval, fabbri2021qafacteval].
    - Uncertainty Estimation: Using entropy or log-probability to gauge model confidence regarding faithfulness [xiao2021hallucination, miao2023selfcheck].
    - Prompting-based Metrics: Using LLMs themselves as evaluators via carefully designed prompts [luo2023chatgpt, laban2023LLMs].
- Benchmarks:
  - Evaluation Benchmarks: Assess the frequency and severity of hallucinations produced by LLMs (e.g., TruthfulQA [DBLP:conf/acl/LinHE22], REALTIMEQA [kasai2022realtime], HaluQA [cheng2023evaluating], FreshQA [vu2023freshLLMs]).
  - Detection Benchmarks: Evaluate the performance of hallucination detection methods (e.g., HaluEval [li2023halueval], FELM [chen2023felm], BAMBOO [dong2023bamboo], PHD [yang2023new]).
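The sample-consistency idea under "Uncertainty Estimation" above can be sketched in a few lines: resample several responses to the same prompt and check how many of them support a given claim. In practice the agreement judgment is made by an NLI model or an LLM prompt; the token-overlap scorer below is only a hypothetical stand-in to keep the example self-contained and runnable.

```python
import re

def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def support_score(claim, sample):
    """Crude stand-in for the NLI/LLM scorer used in practice: the
    fraction of the claim's tokens that also appear in the sample."""
    claim_toks = _tokens(claim)
    return len(claim_toks & _tokens(sample)) / max(len(claim_toks), 1)

def consistency_score(claim, samples, threshold=0.5):
    """Sampling-based signal: a claim that most independently sampled
    responses fail to support is flagged as a likely hallucination."""
    supported = sum(support_score(claim, s) >= threshold for s in samples)
    return supported / len(samples)

claim = "The Eiffel Tower was completed in 1889."
samples = [  # hypothetical resampled responses to the same prompt
    "Construction of the Eiffel Tower finished in 1889.",
    "The tower opened in 1889 for the Paris World's Fair.",
    "It was completed in 1889 and stands in Paris.",
]
print(consistency_score(claim, samples))  # fraction of samples supporting the claim
```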
Hallucination Mitigation Strategies
Mitigation strategies are often linked to the causes:
- Mitigating Data-related Hallucinations:
  - Factuality Data Enhancement: Using high-quality, curated, or domain-specific data; up-sampling factual sources [gunasekar2023textbooks, touvron2023llama].
  - Debiasing: Data deduplication (exact, near, semantic) [lee2021deduplicating, abbas2023semdedup] and using diverse, balanced corpora to reduce societal biases [ladhak2023pre].
  - Mitigating Knowledge Boundary:
    - Knowledge Editing: Modifying model parameters directly (locate-then-edit, meta-learning) or using external plug-ins to update or add facts [mitchell2022memory, meng2022locating, meng2022mass].
    - Retrieval-Augmented Generation (RAG): Grounding generation on external documents retrieved via one-time, iterative, or post-hoc retrieval [DBLP:journals/corr/abs-2302-00083, yao2022react, DBLP:conf/acl/GaoDPCCFZLLJG23].
  - Mitigating Knowledge Shortcut: Debiased fine-tuning that excludes samples reinforcing spurious correlations [kang2023impact].
  - Mitigating Knowledge Recall Failures: Enhancing recall via Chain-of-Thought (CoT) prompting, providing relevant knowledge clues in the prompt, or using conceptualization techniques [zheng2023does, wang2023car].
- Mitigating Training-related Hallucinations:
  - Architecture Improvement: Exploring bidirectional architectures or attention-regularization techniques [li2023batgpt, liu2023exposing].
  - Pre-training Objective Improvement: Using factuality-enhanced objectives, context-aware pre-training on related documents, or methods that reduce exposure bias [lee2022factuality, shi2023context, wang2023progressive].
  - Mitigating Alignment Hallucination (Sycophancy): Improving human preference data quality, aggregating preferences, using synthetic-data interventions, or steering activations at inference time [wei2023simple, sharma2023towards, rimsky2023reducing].
- Mitigating Inference-related Hallucinations:
  - Factuality Enhanced Decoding: Using factual-nucleus sampling, inference-time intervention on activations (ITI), or layer contrasting (DoLa) [lee2022factuality, li2023inference, chuang2023dola]. Also includes post-editing methods in which LLMs self-correct based on generated verification questions (Chain-of-Verification, CoVe) or self-reflection [dhuliawala2023chain, ji2023towards].
  - Faithfulness Enhanced Decoding:
    - Context Consistency: Employing confident decoding, mutual-information objectives, context-aware decoding (CAD), dynamic temperature sampling based on KL divergence, or post-editing using detected inconsistencies (see the toy logit-contrast sketch after this list) [wan2023faithfulness, DBLP:journals/corr/abs-2305-14739, DBLP:journals/corr/abs-2306-01286]. Approaches that address the softmax bottleneck also fall under this heading.
    - Logical Consistency: Using contrastive decoding or knowledge-distillation frameworks such as SCOTT to improve reasoning consistency [wang2023scott, o2023contrastive].
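As a concrete illustration of the logit-contrast idea behind context-aware decoding mentioned under "Context Consistency" above, here is a minimal numpy sketch with toy, hypothetical logits. It amplifies what the provided context contributes by subtracting the model's context-free prediction; treat it as a sketch of the general mechanism (DoLa applies a similar contrast across layers), not the papers' exact implementations.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def contrast_logits(logits_with_ctx, logits_without_ctx, alpha=1.0):
    """Context-aware-decoding-style contrast: boost what the provided context
    contributes by down-weighting the model's context-free (purely parametric)
    prediction. alpha controls the contrast strength."""
    return (1 + alpha) * logits_with_ctx - alpha * logits_without_ctx

# Toy 4-token vocabulary: the context supports token 2, while the model's
# parametric memory (no context) favours token 0.
with_ctx    = np.array([2.0, 0.5, 2.2, 0.1])   # hypothetical logits
without_ctx = np.array([2.5, 0.5, 0.3, 0.1])

print(softmax(with_ctx).round(2))                                # plain decoding
print(softmax(contrast_logits(with_ctx, without_ctx)).round(2))  # contrasted: token 2 dominates
```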
Challenges and Open Questions
- Challenges: Evaluating and mitigating hallucinations in long-form text generation, addressing hallucinations introduced by RAG systems (irrelevant evidence, citation errors), and tackling object/reasoning hallucinations in Large Vision-Language Models (LVLMs).
- Open Questions:
  - How effective are LLM self-correction mechanisms, especially for complex reasoning?
  - Can we reliably probe and capture the boundaries of an LLM's knowledge and its internal "beliefs" about truthfulness?
  - How can we balance the need for factuality and truthfulness with the beneficial aspects of creativity and divergent thinking in LLMs?
The survey concludes by emphasizing that hallucination remains a significant challenge requiring ongoing research, and it aims to provide a foundation for developing more reliable and trustworthy AI systems.