A comprehensive taxonomy of hallucinations in Large Language Models (2508.01781v1)
Abstract: LLMs have revolutionized natural language processing, yet their propensity for hallucination, generating plausible but factually incorrect or fabricated content, remains a critical challenge. This report provides a comprehensive taxonomy of LLM hallucinations, beginning with a formal definition and a theoretical framework that posits its inherent inevitability in computable LLMs, irrespective of architecture or training. It explores core distinctions, differentiating between intrinsic (contradicting input context) and extrinsic (inconsistent with training data or reality), as well as factuality (absolute correctness) and faithfulness (adherence to input). The report then details specific manifestations, including factual errors, contextual and logical inconsistencies, temporal disorientation, ethical violations, and task-specific hallucinations across domains like code generation and multimodal applications. It analyzes the underlying causes, categorizing them into data-related issues, model-related factors, and prompt-related influences. Furthermore, the report examines cognitive and human factors influencing hallucination perception, surveys evaluation benchmarks and metrics for detection, and outlines architectural and systemic mitigation strategies. Finally, it introduces web-based resources for monitoring LLM releases and performance. This report underscores the complex, multifaceted nature of LLM hallucinations and emphasizes that, given their theoretical inevitability, future efforts must focus on robust detection, mitigation, and continuous human oversight for responsible and reliable deployment in critical applications.
Summary
- The paper introduces a formal proof that hallucinations in any computable LLM are inevitable, using computability and diagonalization arguments.
- The paper proposes a dual-axis taxonomy distinguishing intrinsic/extrinsic and factuality/faithfulness hallucinations to enable standardized evaluation.
- The analysis identifies data, model, and prompt factors as causes and suggests layered mitigation strategies and human oversight for safe deployment.
A Comprehensive Taxonomy of Hallucinations in LLMs
Introduction and Motivation
The paper "A comprehensive taxonomy of hallucinations in LLMs" (2508.01781) presents a systematic and theoretically grounded analysis of hallucinations in LLMs, addressing both their formal inevitability and their diverse empirical manifestations. The work synthesizes formal definitions, taxonomic frameworks, empirical typologies, causal analyses, evaluation methodologies, and mitigation strategies, providing a unified reference for researchers and practitioners concerned with the reliability and safety of LLM deployments.
Formal Definition and Theoretical Inevitability
A central contribution is the formalization of hallucination as an inconsistency between a computable LLM h and a computable ground truth function f, within a formal world G_f = {(s, f(s)) | s ∈ S}. The paper leverages computability theory, specifically diagonalization arguments, to prove that for any computable LLM, there exists a computable f such that the LLM will hallucinate on at least one, and in fact infinitely many, inputs. This result is architecture-agnostic and holds regardless of training data, learning algorithm, or prompting strategy. The corollary is that no computable LLM can self-eliminate hallucination, and thus, hallucination is not a removable artifact but an intrinsic limitation of the LLM paradigm.
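To make the formal setup concrete, the following minimal sketch restates the definition and one variant of the diagonalization idea; the notation paraphrases the construction described above and may differ in detail from the paper's.

```latex
% Minimal sketch of the formal setup (notation paraphrased; details may
% differ from the paper). S is the set of input strings, f : S -> S a
% computable ground-truth function, and h a computable LLM.
\[
  G_f = \{\, (s, f(s)) \mid s \in S \,\}
\]
% h hallucinates on input s with respect to G_f iff it disagrees with f:
\[
  \mathrm{Hal}(h, s) \iff h(s) \neq f(s)
\]
% Diagonalization idea (one variant): enumerate all computable LLMs
% h_1, h_2, \ldots and pick a computable f that differs from h_i on a
% designated input s_i:
\[
  f(s_i) \neq h_i(s_i) \quad \text{for every } i \in \mathbb{N},
\]
% so each computable LLM hallucinates on at least one input of its G_f.
```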
This theoretical inevitability has direct implications for deployment: LLMs cannot be trusted as autonomous agents in safety-critical or high-stakes domains without external verification, guardrails, or human oversight.
Core Taxonomies: Intrinsic/Extrinsic and Factuality/Faithfulness
The paper delineates two orthogonal axes for classifying hallucinations:
- Intrinsic vs. Extrinsic: Intrinsic hallucinations contradict the provided input or context (e.g., internal logical inconsistencies or misrepresentation of source material), while extrinsic hallucinations introduce content unsupported or refuted by the input or training data (e.g., fabricated entities or events).
- Factuality vs. Faithfulness: Factuality hallucinations are factually incorrect with respect to external reality or knowledge bases, whereas faithfulness hallucinations diverge from the input prompt or context, regardless of external truth.
These axes are not mutually exclusive and often overlap in real-world cases. The lack of a unified taxonomy in the literature is identified as a barrier to standardized evaluation and mitigation.
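To make the dual-axis scheme concrete, here is a minimal labeling structure in Python; the class and field names, and the example annotation, are illustrative constructions rather than artifacts from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Grounding(Enum):
    INTRINSIC = "contradicts the provided input/context"
    EXTRINSIC = "unsupported by the input or training data"


class Dimension(Enum):
    FACTUALITY = "incorrect w.r.t. external reality or knowledge bases"
    FAITHFULNESS = "diverges from the prompt/context, regardless of truth"


@dataclass
class HallucinationLabel:
    """A single annotated hallucination, labeled on both axes."""
    span: str              # the offending output span
    grounding: Grounding   # intrinsic vs. extrinsic
    dimension: Dimension   # factuality vs. faithfulness
    note: str = ""         # free-text annotator rationale


# Illustrative example: a summary invents a date absent from the source.
example = HallucinationLabel(
    span="The treaty was signed in 1876.",
    grounding=Grounding.EXTRINSIC,
    dimension=Dimension.FAITHFULNESS,
    note="Date does not appear in the source document.",
)
print(example.grounding.name, example.dimension.name)
```

Because the axes overlap, a single annotation often carries labels on both; keeping them as separate fields, as above, avoids forcing annotators into a single flat category.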
Specific Manifestations and Task-Specific Hallucinations
The taxonomy is further refined into concrete categories, including:
- Factual errors and fabrications: Incorrect facts, invented entities, or fabricated citations.
- Contextual inconsistencies: Contradictions or unsupported additions relative to the input.
- Instruction deviation: Failure to follow explicit user instructions.
- Logical inconsistencies: Internal contradictions or reasoning errors.
- Temporal disorientation: Outdated or anachronistic information.
- Ethical violations: Harmful, defamatory, or legally incorrect outputs.
- Amalgamated hallucinations: Erroneous blending of multiple facts.
- Nonsensical responses: Irrelevant or incoherent outputs.
- Task-specific: Hallucinations in code generation, multimodal reasoning, dialogue, summarization, and QA.
The diversity of these manifestations underscores the need for domain- and task-specific detection and mitigation strategies.
Underlying Causes: Data, Model, and Prompt Factors
The paper provides a granular analysis of the root causes of hallucination:
- Data-related: Noisy, biased, or outdated training data; insufficient representation; source-reference divergence.
- Model-related: Auto-regressive generation, exposure bias, capability and belief misalignment, over-optimization, stochastic decoding, overconfidence, generalization failure, reasoning limitations, knowledge overshadowing, and extraction failures.
- Prompt-related: Adversarial attacks, confirmatory bias, and poor prompting.
Hallucination is thus framed as an emergent property of the statistical, auto-regressive nature of LLMs, which optimize for plausible token sequences rather than factual or logical correctness.
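The following toy sketch illustrates this point: the vocabulary and probabilities are invented, but they show how a decoder that only ranks continuations by plausibility will fluently complete a claim for which no factual basis exists.

```python
import random

# Toy next-token distributions conditioned on the partial output.
# The probabilities encode fluency, not factual correctness.
toy_lm = {
    "The capital of Atlantis is": [("Poseidonia", 0.6), ("Atlantia", 0.3), ("unknown", 0.1)],
}

def sample_next(prefix: str, temperature: float = 1.0) -> str:
    """Temperature sampling over the toy distribution for `prefix`."""
    tokens, probs = zip(*toy_lm[prefix])
    weights = [p ** (1.0 / temperature) for p in probs]
    return random.choices(tokens, weights=weights)[0]

# Atlantis has no capital, but each candidate continuation is plausible
# as language, so stochastic decoding confidently fabricates an answer
# rather than abstaining.
random.seed(0)
print("The capital of Atlantis is", sample_next("The capital of Atlantis is"))
```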
Human Factors and Cognitive Biases
The perception and impact of hallucinations are modulated by human factors:
- User trust and interpretability: Fluency and confidence in LLM outputs can mislead users into overtrusting hallucinated content.
- Cognitive biases: Automation bias, confirmation bias, and the illusion of explanatory depth increase susceptibility to hallucinations, even when users are warned of potential errors.
- Design implications: Calibrated uncertainty displays, source-grounding indicators, justification prompts, and factuality-aware interfaces are recommended to enhance user resilience and support human-in-the-loop oversight; a minimal interface-level sketch follows.
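As one concrete illustration of these recommendations, the hypothetical helper below prepends an uncertainty banner when the average token probability of a generation falls below a threshold; the API shape (per-token log-probabilities) and the 0.75 threshold are assumptions, not prescriptions from the paper.

```python
import math
from typing import List

def confidence_banner(answer: str, token_logprobs: List[float],
                      threshold: float = 0.75) -> str:
    """Prefix an answer with a calibrated-uncertainty notice.

    `token_logprobs` are per-token log probabilities as returned by a
    generation API (an assumption; exact field names vary by provider).
    """
    # Geometric mean of token probabilities as a crude confidence proxy.
    avg_prob = math.exp(sum(token_logprobs) / max(len(token_logprobs), 1))
    if avg_prob < threshold:
        notice = (f"[Low confidence ~{avg_prob:.0%}: verify against "
                  f"a primary source before relying on this answer]\n")
        return notice + answer
    return answer

# Illustrative usage with made-up log-probabilities.
print(confidence_banner("The treaty was signed in 1876.",
                        [-0.9, -1.2, -0.4, -1.6, -0.7]))
```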
Evaluation Benchmarks and Metrics
The paper surveys principal benchmarks and metrics for hallucination detection:
- Benchmarks: TruthfulQA, HalluLens, FActScore, Q2, QuestEval, and domain-specific datasets (e.g., MedHallu, CodeHaluEval, HALLUCINOGEN).
- Metrics: ROUGE, BLEU, BERTScore (surface/semantic similarity); FactCC, SummaC (entailment/NLI-based); KILT, RAE (knowledge-grounded); and human evaluation (correctness, faithfulness, coherence, harmfulness).
- Limitations: Lack of standardization, task dependence, insensitivity to subtle hallucinations, and limited explainability.
The need for unified, taxonomy-aware, and context-sensitive evaluation frameworks is emphasized.
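For intuition about the surface-similarity metrics above and their noted insensitivity to subtle hallucinations, here is a self-contained unigram-overlap computation in the spirit of ROUGE-1 (a simplified stand-in, not the official implementation): a single fabricated year barely moves the score.

```python
from collections import Counter

def rouge1(reference: str, candidate: str) -> dict:
    """Unigram overlap between reference and candidate (ROUGE-1 style)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

source = "the treaty was signed by both parties after long negotiations"
faithful = "the treaty was signed by both parties"
hallucinated = "the treaty was signed by both parties in 1876"

# The fabricated year changes the score only marginally, showing why
# surface-overlap metrics miss subtle factual hallucinations.
print(rouge1(source, faithful))
print(rouge1(source, hallucinated))
```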
Mitigation Strategies: Architectural and Systemic
Mitigation is addressed at both the model and system levels:
- Architectural: Toolformer-style augmentation (external tool/API calls), retrieval-augmented generation (RAG), and fine-tuning with synthetic or adversarially filtered data.
- Systemic: Guardrails (logic validators, factual filters, rule-based fallbacks), symbolic integration, and hybrid context-aware systems.
No single technique suffices; layered, hybrid approaches tailored to application context are advocated.
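A minimal sketch of such a layered setup, combining retrieval grounding (RAG) with a rule-based guardrail, is given below; the toy corpus, keyword retriever, prompt template, and the faked model answer are all illustrative placeholders rather than the paper's implementation.

```python
from typing import List

DOCS = [
    "The Treaty of Example was signed in 1875 in Geneva.",
    "The treaty established a joint commission of both parties.",
]

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Crude keyword-overlap retriever (stand-in for a vector store)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def grounded_prompt(question: str, passages: List[str]) -> str:
    """Assemble a prompt that restricts the model to retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer using ONLY the context below; say 'unknown' otherwise.\n"
            f"Context:\n{context}\nQuestion: {question}\nAnswer:")

def guardrail(answer: str, passages: List[str]) -> str:
    """Rule-based fallback: reject answers with no lexical support."""
    support = set(" ".join(passages).lower().split())
    answer_terms = set(answer.lower().split()) - {"the", "a", "in", "of"}
    if not answer_terms & support:
        return "unknown (answer not supported by retrieved context)"
    return answer

question = "When was the Treaty of Example signed?"
passages = retrieve(question, DOCS)
prompt = grounded_prompt(question, passages)
# An LLM call such as generate(prompt) would go here (hypothetical);
# we fake a model output to exercise the guardrail.
model_answer = "It was signed in 1875."
print(guardrail(model_answer, passages))
```

The lexical-support check is deliberately crude; in practice the guardrail layer would use the entailment- or factuality-based detectors surveyed above.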
Monitoring and Benchmarking Resources
The paper highlights web-based resources for tracking LLM performance and hallucination trends:
- Artificial Analysis: Intelligence, cost, latency, and multimodal benchmarks.
Figure 1: Sample visualization of the AI Index, retrieved on 28 June 2025.
Figure 2: Sample visualization of intelligence versus price, retrieved on 28 June 2025.
Figure 3: Sample visualization of latency, retrieved on 28 June 2025.
- Vectara Hallucination Leaderboard: Explicit tracking of hallucination rates in summarization.
Figure 4: Sample visualization of grounded hallucination rate using the Hughes Hallucination Evaluation Model, retrieved on 29 June 2025.
- Epoch AI Benchmarking Dashboard: Longitudinal trends in accuracy, compute, and open/proprietary model performance.
Figure 5: Sample visualization of accuracy versus training compute, retrieved on 29 June 2025.
Figure 6: Sample visualization of models with downloadable weights vs proprietary, retrieved on 29 June 2025.
Figure 7: Sample visualization of US models vs non-US, retrieved on 29 June 2025.
Figure 8: Sample visualization of model performance on expert-level mathematics problems, retrieved on 29 June 2025.
- LM Arena: Community-driven, real-world preference-based evaluation of model helpfulness and trustworthiness.
Figure 9: Sample visualization of model performance on text generation, retrieved on 9 July 2025.
Figure 10: Sample visualization of the performance of generative AI models capable of understanding and processing visual inputs, retrieved on 9 July 2025.
These resources facilitate transparent, reproducible, and up-to-date monitoring of LLM capabilities and hallucination risks.
Conclusion
The inevitability and multifaceted nature of hallucinations in LLMs, as rigorously established in this work, necessitate a paradigm shift from elimination to robust detection, mitigation, and human-centered oversight. The formal proofs of inevitability, combined with the detailed empirical taxonomy and causal analysis, provide a foundation for future research on reliable LLM deployment. Progress will depend on the development of unified taxonomies, context-aware evaluation frameworks, hybrid mitigation architectures, and user-centric interface designs. In high-stakes domains, continuous human-in-the-loop validation and external safeguards are essential. The paper's synthesis of theoretical, empirical, and practical perspectives offers a comprehensive roadmap for advancing the safety and trustworthiness of LLMs in real-world applications.
Follow-up Questions
- How do computability theory and diagonalization arguments support the claim of unavoidable hallucinations in LLMs?
- What are the practical implications of differentiating intrinsic versus extrinsic hallucinations for model evaluation?
- How can the dual-axis taxonomy improve the design of mitigation strategies in high-stakes applications?
- What limitations exist in current benchmarks for evaluating hallucinations in large language models?
Related Papers
- The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations (2023)
- Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models (2023)
- A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions (2023)
- Cognitive Mirage: A Review of Hallucinations in Large Language Models (2023)
- A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models (2024)
- The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models (2024)
- Hallucination is Inevitable: An Innate Limitation of Large Language Models (2024)
- LLMs Will Always Hallucinate, and We Need to Live With This (2024)
- HALoGEN: Fantastic LLM Hallucinations and Where to Find Them (2025)
- HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification (2025)