PIES Taxonomy of Hallucinations

Updated 21 March 2026

PIES Taxonomy is a multi-dimensional framework that systematically defines and categorizes LLM hallucinations using formal computability-theoretic principles.
It distinguishes errors by mapping intrinsic versus extrinsic and explicit versus implicit dimensions across planning and summarization stages.
The framework informs practical mitigation strategies, enhanced diagnostics, and safer deployment in critical, high-stakes language model applications.

A hallucination in the context of LLMs is a model output that is plausible in form but factually incorrect, fabricated, inconsistent with real or provided input, or otherwise unsupported by sources. The PIES taxonomy of hallucinations establishes a fine-grained, multi-dimensional framework for classifying, evaluating, and analyzing LLM outputs in research and agentic contexts, building upon a rigorous formal and theoretical foundation. The taxonomy underpins current research diagnostics and the design of mitigation strategies for reliable, safe LLM deployment (Cossio, 3 Aug 2025, Bang et al., 24 Apr 2025, Zhan et al., 30 Jan 2026).

1. Formal Foundations and Theoretical Inevitability

The PIES taxonomy adopts a computability-theoretic formalism to define and characterize hallucinations. Let $S$ be the set of finite input strings, and let $f: S \to \Sigma^*$ be the ground-truth computable function. The “formal world” of correct answers is $G_f = \{ (s, f(s)) \mid s \in S \}$. An LLM at training stage $i$ is represented as a computable function $h[i]: S \to \Sigma^*$. The LLM hallucinates with respect to $f$ when, at every training stage, some input receives an incorrect output:

$\forall\,i\in\mathbb{N},\;\exists\,s\in S\quad h[i](s)\;\neq\;f(s)$
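The definition quantifies over every training stage and every input, so it cannot be checked exhaustively. On a finite sample, though, one can at least witness it. The following is a minimal sketch under toy assumptions: models and the ground truth are plain Python functions on strings, and `counterexamples` and `hallucinates_at_every_stage` are hypothetical helper names, not part of the cited work.

```python
from typing import Callable, List, Sequence

# A "model" here is any Python function from an input string to an output string.
Model = Callable[[str], str]


def counterexamples(h: Model, f: Model, inputs: Sequence[str]) -> List[str]:
    """Return the sampled inputs s on which h(s) != f(s)."""
    return [s for s in inputs if h(s) != f(s)]


def hallucinates_at_every_stage(stages: Sequence[Model], f: Model, inputs: Sequence[str]) -> bool:
    """Finite-sample witness of: for all i, there exists s with h[i](s) != f(s).

    Only the supplied stages and inputs are checked, so True is evidence on this
    sample, not a proof about the unbounded definition.
    """
    return all(counterexamples(h, f, inputs) for h in stages)


# Hypothetical toy setup: the ground truth reverses a string.
def f(s: str) -> str:
    return s[::-1]


def h0(s: str) -> str:  # stage 0: copies its input
    return s


def h1(s: str) -> str:  # stage 1: correct only on inputs shorter than 3 chars
    return s[::-1] if len(s) < 3 else s


inputs = ["ab", "abc", "racecar"]
print(counterexamples(h0, f, inputs))  # ['ab', 'abc']
print(counterexamples(h1, f, inputs))  # ['abc']
print(hallucinates_at_every_stage([h0, h1], f, inputs))  # True
```

The palindrome "racecar" is answered correctly by both stages, which shows why a single lucky input says nothing about the quantified definition.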

Diagonalization arguments from computability theory show that no computable LLM, regardless of architecture or training protocol, can be hallucination-free with respect to every computable ground truth $f$. Key results include:

Any computably enumerable family of polynomial-time LLMs will inevitably hallucinate on at least one input [(Cossio, 3 Aug 2025), Thm 1].
There exists an $f$ for which such families hallucinate on infinitely many inputs [(Cossio, 3 Aug 2025), Thm 2].
Self-correction strategies or internal mechanisms (e.g., chain-of-thought prompting) cannot eliminate hallucinations globally [(Cossio, 3 Aug 2025), Cor. 1].
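The core mechanism behind these results is the classic diagonal construction: enumerate the models $h_0, h_1, \dots$ and define the target so that it disagrees with $h_i$ on the $i$-th input. The sketch below shows only a finite prefix of that idea with hypothetical toy models; it illustrates the textbook technique and is not a reproduction of Cossio's proofs.

```python
from typing import Callable, Dict, Sequence

Model = Callable[[str], str]


def diagonal_ground_truth(models: Sequence[Model], inputs: Sequence[str]) -> Dict[str, str]:
    """Define a target f on inputs[i] so that it disagrees with models[i] there.

    Finite prefix of the classic diagonal construction: f(s_i) = h_i(s_i) + "!",
    which differs from h_i(s_i) by construction.
    """
    if len(set(inputs)) != len(inputs):
        raise ValueError("inputs must be distinct")
    if len(inputs) < len(models):
        raise ValueError("need one fresh input per model")
    return {s: h(s) + "!" for h, s in zip(models, inputs)}


# Hypothetical enumeration of three "models".
def shout(s: str) -> str:
    return s.upper()


def echo(s: str) -> str:
    return s


def constant(s: str) -> str:
    return "42"


models = [shout, echo, constant]
inputs = ["a", "b", "c"]
target = diagonal_ground_truth(models, inputs)
print(target)  # {'a': 'A!', 'b': 'b!', 'c': '42!'}
# Every model in the enumeration is wrong on "its" diagonal input.
assert all(target[s] != h(s) for h, s in zip(models, inputs))
```

In the theorem setting the enumeration is infinite and each model halts (for example, in polynomial time), so a target defined this way is itself computable; the finite dictionary only shows the mechanics.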

The practical implication is that LLMs cannot act as fully autonomous, reliable decision-makers in mathematical or safety-critical domains without external grounding or continuous human oversight.

2. Core Taxonomic Axes: PIES, Intrinsic/Extrinsic, Factuality/Faithfulness

The PIES taxonomy provides a detailed error decomposition along two fundamental axes: functional stage (planning vs. summarization) and error property (explicit vs. implicit).

PIES Category	Functional Component	Error Property	Definition/Example
Perception (P)	Input Parsing	-	Input mis-reads (e.g. OCR, ASR errors; multimodal input errors)
Interpretation (I)	Summarization	Usually Explicit	Contradiction of input context ("intrinsic" error)
Extraction (E)	Summarization	Explicit	Claim unsupported by training data ("extrinsic" error)
Synthesis (S)	Summarization	Explicit	Novel fabrication combining real fragments

Core dimensions—intrinsic vs. extrinsic and factuality vs. faithfulness—are defined as follows (Cossio, 3 Aug 2025, Bang et al., 24 Apr 2025):

Intrinsic hallucination: Contradicts the input context; can be refuted by inspecting the input alone.
Extrinsic hallucination: Neither present in the input nor verifiable against reality or training data; pure fabrication.
Factuality hallucination: Contradicts established external knowledge.
Faithfulness hallucination: Diverges from provided input or instructions, even if factually correct in isolation.

PIES Interpretation aligns with intrinsic hallucinations, while Extraction and Synthesis represent types of extrinsic hallucinations (Synthesis viewed as a special case of Extraction). Perception hallucinations, involving input mis-parsing, lie outside the purview of most NLP LLM studies but arise in multimodal domains.
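The four definitions can be read as a decision procedure over two verdicts about a single claim: how it relates to the provided input, and how it relates to external knowledge. The sketch below assumes those verdicts are already available (in practice from an NLI model or a human annotator); the enum names, `hallucination_labels`, and the rule that any input-unsupported claim counts as a faithfulness violation are illustrative assumptions. `PIES_DIMENSION` encodes the category alignment stated in the paragraph above.

```python
from enum import Enum
from typing import Dict, List, Optional


class InputVerdict(Enum):
    SUPPORTED = "supported"        # entailed by the provided input
    CONTRADICTED = "contradicted"  # refuted by the input alone
    ABSENT = "absent"              # neither entailed nor refuted by the input


class WorldVerdict(Enum):
    TRUE = "true"
    FALSE = "false"
    UNVERIFIABLE = "unverifiable"  # cannot be checked against external knowledge


def hallucination_labels(inp: InputVerdict, world: WorldVerdict) -> List[str]:
    """Map two verdicts on one claim to intrinsic/extrinsic/factuality/faithfulness labels."""
    labels: List[str] = []
    if inp is InputVerdict.CONTRADICTED:
        labels.append("intrinsic")
    if inp is InputVerdict.ABSENT and world is WorldVerdict.UNVERIFIABLE:
        labels.append("extrinsic")
    if world is WorldVerdict.FALSE:
        labels.append("factuality")
    # Assumption: any claim not supported by the input counts as unfaithful.
    if inp is not InputVerdict.SUPPORTED:
        labels.append("faithfulness")
    return labels


# Dimension each PIES category aligns with; Perception sits outside both.
PIES_DIMENSION: Dict[str, Optional[str]] = {
    "Perception": None,
    "Interpretation": "intrinsic",
    "Extraction": "extrinsic",
    "Synthesis": "extrinsic",
}

print(hallucination_labels(InputVerdict.CONTRADICTED, WorldVerdict.TRUE))
# ['intrinsic', 'faithfulness']
print(hallucination_labels(InputVerdict.SUPPORTED, WorldVerdict.FALSE))
# ['factuality']
```

The second call shows the distinction the definitions draw: a claim copied faithfully from a wrong input is a factuality error but not a faithfulness error.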

3. PIES Four-Class Error Framework in Research Agents

Zhan et al. (30 Jan 2026) operationalize the PIES taxonomy over agentic research trajectories, quantifying hallucination errors step by step according to their functional type. The framework cross-classifies errors as:

Explicit Planning (Action Hallucination): Flawed planning steps. Sub-types:
- Action Deviation: Diverges from user intent
- Action Redundancy: Repeats completed efforts
- Action Propagation: Cascades error from previous hallucinated state

$H^{EP} = \frac{|A_\text{deviation}| + |A_\text{redundancy}| + |A_\text{propagation}|}{|A_\text{total}|}$
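As a worked instance of $H^{EP}$, the sketch below computes the rate for one trajectory from per-action labels. It assumes each action carries at most one sub-type label, so the three sets in the numerator are disjoint and the rate stays in $[0, 1]$; the function name and label strings are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

SUBTYPES = ("deviation", "redundancy", "propagation")


def explicit_planning_rate(labels: Sequence[Optional[str]]) -> float:
    """H^EP for one trajectory.

    labels[k] is the sub-type assigned to action k, or None if the action is sound.
    Each action carries at most one label, so the three sets are disjoint.
    """
    if not labels:
        raise ValueError("trajectory has no actions")
    unknown = {lab for lab in labels if lab is not None and lab not in SUBTYPES}
    if unknown:
        raise ValueError(f"unknown sub-types: {sorted(unknown)}")
    counts = Counter(lab for lab in labels if lab is not None)
    return sum(counts[t] for t in SUBTYPES) / len(labels)


# Hypothetical 8-action trajectory with three flawed actions.
trajectory = [None, "deviation", None, "redundancy", "propagation", None, None, None]
print(explicit_planning_rate(trajectory))  # 0.375
```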

Implicit Planning (Restriction Neglect): Silent omission of user restrictions; agent executes valid steps but ignores explicit constraints.

$H^{IP} = \frac{|Q_\text{total}| - |Q_\text{executed}|}{|Q_\text{total}|}$
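$H^{IP}$ is the fraction of user restrictions the agent never acted on. A minimal sketch, assuming restrictions have already been extracted from the query as string IDs (the IDs and function name below are hypothetical); returning 0.0 when there are no restrictions is also an assumption, since the formula is undefined in that case.

```python
from typing import Iterable


def implicit_planning_penalty(restrictions: Iterable[str], executed: Iterable[str]) -> float:
    """H^IP = (|Q_total| - |Q_executed|) / |Q_total| for one trajectory.

    Q_executed is intersected with Q_total so that spurious entries cannot
    push the penalty below zero. With no restrictions, nothing can be
    neglected, so the penalty is 0.0 (an assumption; the formula is undefined).
    """
    q_total = set(restrictions)
    if not q_total:
        return 0.0
    q_executed = q_total & set(executed)
    return (len(q_total) - len(q_executed)) / len(q_total)


# Hypothetical restriction IDs extracted from a user query.
query_restrictions = ["peer_reviewed_only", "published_after_2020", "english_sources", "max_10_sources"]
honored = ["published_after_2020", "english_sources"]
print(implicit_planning_penalty(query_restrictions, honored))  # 0.5
```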

Explicit Summarization (Claim Hallucination): Fabricated or misattributed claims in outputs; lacks support in retrieved evidence or mis-cites sources.

$H^{ES} = \frac{|C_\text{fabricated}| + |C_\text{misattributed}|}{|C_\text{total}|}$

Implicit Summarization (Noise Domination): Retrieved evidence exists, but is ignored; summary omits essential information by over-relying on “noisy” clusters.

$H^{IS} = \frac{|I_\text{essential}| - |I_\text{reported}|}{|I_\text{essential}|}$

The composite hallucination score per trajectory aggregates the four component scores: explicit and implicit planning, and explicit and implicit summarization.

Functional segmentation, atomic decomposition of research steps, and mapping of hallucinations onto a directed acyclic graph (DAG) of steps allow process-aware tracing of error origination, propagation, and late-stage collapse.
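Once steps and their dependencies form a DAG, tracing reduces to graph reachability: an origin is a hallucinated step with no hallucinated ancestor, and its downstream set is everything reachable from it. The sketch below is a hypothetical illustration of that idea with made-up step names, not the tracing procedure of Zhan et al.

```python
from collections import deque
from typing import Dict, Iterable, List, Mapping, Set, Tuple

Graph = Dict[str, List[str]]  # step -> steps that consume its output


def _parents(edges: Graph) -> Dict[str, Set[str]]:
    """Invert the edge list so ancestors can be found by forward search."""
    parents: Dict[str, Set[str]] = {}
    for src, dsts in edges.items():
        for dst in dsts:
            parents.setdefault(dst, set()).add(src)
    return parents


def _reachable(start: str, adjacency: Mapping[str, Iterable[str]]) -> Set[str]:
    """All nodes reachable from start (excluding start) by breadth-first search."""
    seen: Set[str] = set()
    queue = deque(adjacency.get(start, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(adjacency.get(node, ()))
    return seen


def trace_hallucinations(edges: Graph, hallucinated: Set[str]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (origins, downstream).

    origins: hallucinated steps with no hallucinated ancestor.
    downstream: for each origin, every step reachable from it.
    """
    parents = _parents(edges)
    origins = {h for h in hallucinated if not (_reachable(h, parents) & hallucinated)}
    return origins, {o: _reachable(o, edges) for o in origins}


# Hypothetical research trajectory.
edges = {
    "plan": ["search_1", "search_2"],
    "search_1": ["read_1"],
    "search_2": ["summary"],
    "read_1": ["summary"],
}
origins, downstream = trace_hallucinations(edges, {"search_1", "read_1", "summary"})
print(sorted(origins))                   # ['search_1']
print(sorted(downstream["search_1"]))    # ['read_1', 'summary']
```

Here the final summary is flagged, but the trace attributes it to the earlier `search_1` step rather than treating it as an independent error.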

4. Specific Manifestations and Typology Mapping

Hallucinations take diverse, domain-specific forms, each amenable to the PIES and allied frameworks. Examples (Cossio, 3 Aug 2025):

Factual errors: Incorrect or fabricated facts (“Great Wall is visible from space”)
Contextual/Instruction inconsistency: Contradicts or ignores input context/instructions
Logical inconsistencies: Self-contradiction, arithmetic error
Temporal disorientation: Anachronistic/outdated output (“Murakami won Nobel in 2016”)
Ethical violations: Defamation, misinformation
Task-specific errors: Code bugs, dialogue context confusion, multimodal object hallucination
Amalgamated hallucinations: Incorrect merging of multiple true facts
Nonsensical responses: Outputs that are irrelevant or semantically empty

These manifestations can be mapped in the PIES four-way schema (e.g., claim fabrications to explicit summarization errors; instruction neglect to implicit planning errors), and also classified as intrinsic/extrinsic or faithfulness/factuality violations for alignment with other existing taxonomies (Bang et al., 24 Apr 2025).

5. Causal Analysis: Hallucination Etiology and Propagation

Underlying causes of hallucination span three principal groups (Cossio, 3 Aug 2025):

Data-related: Source noise or bias, reference divergence, outdated coverage
Model-related: Next-token sampling mismatches, unidirectional attention, over-optimization towards proxy objectives, exposure bias, overconfidence, weak logical reasoning
Prompt-related: Ambiguity, adversarial phrasing, confirmatory bias

Propagation within process-aware agent trajectories (as in research agents) is now quantitatively analyzed via causal DAGs. Initial hallucinations (often originating in the planning stage) trigger long chains of downstream errors (“domino effect”). Experimental results show early-stage cascading in proprietary systems (OpenAI, Gemini) and late-stage collapse in open-source ones (Salesforce) (Zhan et al., 30 Jan 2026).
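One simple way to tell early-stage cascading from late-stage collapse is to record where the first hallucinated step falls and how much of the remaining trajectory is affected. The helper below is a hypothetical diagnostic on per-step flags, not a metric reported by Zhan et al.

```python
from typing import Optional, Sequence, Tuple


def onset_profile(flags: Sequence[bool]) -> Optional[Tuple[float, float]]:
    """(relative onset, post-onset hallucination share) for one trajectory.

    relative onset: index of the first hallucinated step divided by the number
    of steps, so values near 0 are early and values near 1 are late.
    post-onset share: fraction of steps from the onset onward that are flagged.
    Returns None if no step is flagged.
    """
    for k, flag in enumerate(flags):
        if flag:
            tail = flags[k:]
            return k / len(flags), sum(tail) / len(tail)
    return None


early = [False, True, True, True, False]   # hypothetical early cascade
late = [False, False, False, False, True]  # hypothetical late collapse
print(onset_profile(early))  # (0.2, 0.75)
print(onset_profile(late))   # (0.8, 1.0)
```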

Cognitive and semantic annotation studies reveal:

Anchoring bias: Overweighting of initial retrieval outputs
Homogeneity bias: Preference for large/redundant clusters, leading to missed singleton insights and exacerbation of noise domination
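Homogeneity bias can be made visible by comparing how often a summary draws on singleton evidence clusters versus multi-member ones. The sketch assumes retrieved documents have already been clustered and that cited document IDs are known; the function name and IDs are hypothetical.

```python
import math
from typing import Dict, List, Set


def coverage_by_cluster_size(clusters: Dict[str, Set[str]], cited: Set[str]) -> Dict[str, float]:
    """Share of singleton and multi-member evidence clusters cited at least once.

    clusters maps a cluster ID to the document IDs it contains; cited holds the
    document IDs the summary actually uses. A share is NaN when no cluster of
    that kind exists.
    """
    def share(ids: List[str]) -> float:
        if not ids:
            return math.nan
        return sum(1 for c in ids if clusters[c] & cited) / len(ids)

    singletons = [c for c, docs in clusters.items() if len(docs) == 1]
    multi = [c for c, docs in clusters.items() if len(docs) > 1]
    return {"singleton": share(singletons), "multi": share(multi)}


clusters = {"A": {"d1", "d2", "d3"}, "B": {"d4", "d5"}, "C": {"d6"}, "D": {"d7"}}
print(coverage_by_cluster_size(clusters, cited={"d1", "d4", "d5"}))
# {'singleton': 0.0, 'multi': 1.0}
```

A singleton share far below the multi-member share is consistent with the missed singleton insights described above, though it does not by itself establish the bias.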

6. Evaluation Benchmarks and Measurement Metrics

Benchmarking and detection metrics reflect the taxonomy’s dimensions and support fine-grained system comparison (Cossio, 3 Aug 2025, Bang et al., 24 Apr 2025, Zhan et al., 30 Jan 2026). Notable resources include:

TruthfulQA: Open-domain factual QA
HalluLens: Taxonomy-linked, multi-dimensional benchmark
FActScore, Q2/QuestEval: Faithfulness and question-generation/answering
Domain-specific: MedHallu, MedHallBench, Med-HALT, CodeHaluEval, HALLUCINOGEN

Metrics are task- and taxonomy-aligned:

Faithfulness to input: ROUGE, BLEU, BERTScore, SummaC (NLI-based), FactCC (classifier)
Factuality/external truth: Knowledge base linking (KILT), Retrieval-Augmented Evaluation (RAE)
Human annotation: Assesses correctness, faithfulness, coherence, and harmfulness
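Lexical-overlap metrics illustrate why NLI-based checks appear alongside ROUGE and BLEU. The sketch below is a simplified ROUGE-1 (lowercased whitespace tokens, clipped unigram overlap, no stemming), written for illustration and not the official scoring package. It shows a contradicting candidate outscoring a faithful paraphrase.

```python
from collections import Counter
from typing import Dict


def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """Simplified ROUGE-1: lowercase whitespace tokens, clipped unigram overlap."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return {"precision": precision, "recall": recall, "f1": 2 * precision * recall / (precision + recall)}


reference = "the cat sat on the mat"
paraphrase = rouge1("a cat was sitting on the mat", reference)
contradiction = rouge1("the cat never sat on the mat", reference)
print(round(paraphrase["f1"], 3))     # 0.615
print(round(contradiction["f1"], 3))  # 0.923
```

The intrinsic hallucination ("never sat") reuses almost every reference token, so overlap rewards it; an entailment-based metric would instead penalize the contradiction.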

Agentic metrics in DeepHalluBench include explicit rates for planning and summarization (e.g., $H^{EP}$), implicit penalties (e.g., $H^{IP}$), and composite trajectory scores. Persistent challenges are the lack of universal definitions, sensitivity to subtle error types, and explainability of automatic measures.

7. Mitigation Strategies and Future Research Trajectories

Given the inevitability of hallucination, research focus has shifted to robust detection, targeted mitigation, and comprehensive oversight. Strategies include (Cossio, 3 Aug 2025):

Architectural:
- Tool use: On-demand calculator/API calls (Toolformer)
- Retrieval-Augmented Generation (RAG): Citation-constrained, passage-grounded generation
- Fine-tuning on filtered/synthetic corpora: Adversarial and factuality-focused datasets
Systemic:
- Guardrails and symbolic integration: Logic checks, database filters, rule-based fallbacks
- Uncertainty-aware interfaces: Confidence, provenance display
- Human-in-the-loop: Required validation in critical workflows
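Citation-constrained RAG and rule-based guardrails can be combined into a simple post-generation check: every sentence must cite a passage, and every cited passage must be one that was actually retrieved. The sketch assumes a bracketed citation format such as `[p1]` and a naive sentence splitter; both, along with `check_citations`, are hypothetical. Flagged sentences could then be blocked, regenerated, or routed to a human reviewer.

```python
import re
from typing import Dict, List, Set

# Assumed citation format: passage IDs in square brackets, e.g. "[p1]".
CITATION = re.compile(r"\[(\w+)\]")
# Naive sentence boundary: whitespace following ".", "!" or "?".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def check_citations(answer: str, retrieved_ids: Set[str]) -> Dict[str, List[str]]:
    """Flag sentences that cite nothing or cite passages that were never retrieved."""
    report: Dict[str, List[str]] = {"uncited": [], "unknown_source": []}
    for sentence in SENTENCE_END.split(answer.strip()):
        if not sentence:
            continue
        ids = CITATION.findall(sentence)
        if not ids:
            report["uncited"].append(sentence)
        elif any(pid not in retrieved_ids for pid in ids):
            report["unknown_source"].append(sentence)
    return report


answer = "The report cites a 2024 audit [p1]. Revenue doubled [p7]. Costs fell sharply."
report = check_citations(answer, retrieved_ids={"p1", "p2"})
print(report["uncited"])         # ['Costs fell sharply.']
print(report["unknown_source"])  # ['Revenue doubled [p7].']
```

This only verifies provenance; whether a cited passage actually supports the sentence still needs an entailment check or human review.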

Key recommendations and open questions:

Develop unified, taxonomy-aware benchmarks for apples-to-apples comparison.
Advance neuro-symbolic architectures for stronger reasoning and logical grounding.
Design interfaces that expose uncertainty and provenance to end users.
Research the quantification of LLMs’ irreducible hallucination floor and the trade-off between creativity and factuality in various application domains (Cossio, 3 Aug 2025).

A plausible implication is that hybrid, context-aware, and human-in-the-loop systems will become the de facto strategy for high-stakes LLM deployment. Quantitative understanding of hallucination propagation, bias, and error correction constitutes a central research frontier.

References

Cossio (2025). A comprehensive taxonomy of hallucinations in Large Language Models.

Bang et al. (2025). HalluLens: LLM Hallucination Benchmark.

Zhan et al. (2026). Why Your Deep Research Agent Fails? On Hallucination Evaluation in Full Research Trajectory.
