SHIELDA Taxonomy for LLM Agents
- SHIELDA Taxonomy is a principled framework that classifies 36 failure modes into 12 agent artifacts across reasoning/planning and execution phases.
- It maps exceptions to specific process phases (RP, E, RP/E), enabling precise diagnostics and tailored, composable recovery strategies.
- The taxonomy underpins modular error handling in LLM agentic workflows, supporting automated escalation and system resilience.
The SHIELDA taxonomy is a principled classification of failure modes in LLM-driven agentic workflows. In LLM agentic systems—where agents autonomously reason, plan, and execute complex, multi-step tasks—workflow failures are common and occur across diverse functional components. SHIELDA systematically organizes these failures as 36 formally defined exception types grouped under 12 "agent artifacts," each aligned to specific phases of the agent's pipeline: Reasoning/Planning (RP), Execution (E), or spanning both (RP/E). This taxonomy underpins the SHIELDA exception handling framework, enabling structured, phase-aware, and composable recovery strategies in LLM agentic workflows (Zhou et al., 11 Aug 2025).
1. Agent Artifacts and Exception Typology
The SHIELDA taxonomy identifies 12 agent artifacts—core functional modules subject to exception—each associated with one or more precise exception types.
| Artifact | Exception Types (count) | Phase(s) |
|---|---|---|
| Goal | Ambiguous Goal, Conflicting Goal (2) | RP |
| Context | Context Corruption, Context Ambiguity (2) | RP |
| Reasoning | Contradictory Reasoning, Circular/Invalid Reasoning (2) | RP |
| Planning | Faulty Task Structuring, Overextended Planning (2) | RP |
| Memory | Memory Poisoning, Outdated Memory, Misaligned Memory Recall (3) | RP/E |
| Knowledge Base | Hallucinated Facts, KB Poisoning, Knowledge Conflict (3) | RP/E |
| Model | Token Limit Exceeded (RP/E), Output Validation Failure, Output Handling Exception (3) | RP/E, E |
| Tool | Tool Invocation Exception, Tool Output Exception, Unavailable Tool (3) | E |
| Interface | API Invocation Exception, API Response Malformation, API Semantic Mismatch, UI Element Misclick, Text Recognition Error, UI Not Ready, Environmental Noise (7) | RP/E, E |
| Task Flow | Task Dependency Exception, Error Propagation, Stopping Too Early (3) | RP/E, E |
| Other Agent | Missing Information, Communication Exception, Agent Conflict, Role Violation (4) | E |
| External System | Protocol Mismatch, External Attack (2) | E |
Each exception type is explicitly defined. For example, "Ambiguous Goal" is a failure to infer user intent from under-specified instructions, arising in RP, while "Tool Invocation Exception" refers to malformed or incorrectly selected tool calls in E. Exception definitions include canonical examples, such as ambiguous user prompts for "Ambiguous Goal" or malformed API calls for "API Invocation Exception" (Zhou et al., 11 Aug 2025).
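One way to make "explicitly defined" concrete is to store each exception type as a record that carries its artifact, its phase, and its definition. The sketch below is illustrative only: the `Phase` enum, the `ExceptionType` dataclass, and its `describe()` method are hypothetical names rather than SHIELDA's actual data model, and the two entries paraphrase the definitions given above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    """The three phase labels used by the taxonomy."""
    RP = "RP"      # Reasoning/Planning
    E = "E"        # Execution
    RP_E = "RP/E"  # spans or propagates between both


@dataclass(frozen=True)
class ExceptionType:
    """Hypothetical taxonomy entry: name, owning artifact, phase, definition."""
    name: str
    artifact: str
    phase: Phase
    definition: str
    example: Optional[str] = None  # canonical example, when one is given

    def describe(self) -> str:
        return f"{self.name} [{self.artifact}, {self.phase.value}]: {self.definition}"


# Two entries paraphrased from the definitions in the text above.
AMBIGUOUS_GOAL = ExceptionType(
    name="Ambiguous Goal",
    artifact="Goal",
    phase=Phase.RP,
    definition="Failure to infer user intent from under-specified instructions.",
    example="An ambiguous user prompt.",
)
TOOL_INVOCATION = ExceptionType(
    name="Tool Invocation Exception",
    artifact="Tool",
    phase=Phase.E,
    definition="A malformed or incorrectly selected tool call.",
)

print(AMBIGUOUS_GOAL.describe())
```

Freezing the dataclass keeps entries immutable, which suits a fixed, closed taxonomy.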
2. Phase Association and Exception Scope
Exception types are tightly linked to process phases, yielding a phase-aware taxonomy. Phases are designated as:
- RP (Reasoning/Planning): Cognitive steps—goal understanding, context assimilation, task decomposition, knowledge querying, and planning.
- E (Execution): Actionable steps—model output emission, tool invocation, API integration, environmental interaction.
- RP/E: Exceptions that span or propagate between reasoning and execution.
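This phase granularity supports root-cause analysis: for instance, a Protocol Mismatch detected in E may have its root cause, such as Faulty Task Structuring, in RP. The exception sets partition the full set of exception types. Writing $\mathcal{A}$ for the 12 artifacts, $\mathcal{E}$ for the 36 exception types, and $\mathcal{E}_a \subseteq \mathcal{E}$ for the exceptions of artifact $a$:

$$
\mathcal{E} = \bigcup_{a \in \mathcal{A}} \mathcal{E}_a, \qquad \mathcal{E}_a \cap \mathcal{E}_{a'} = \emptyset \ \text{ for } a \neq a', \qquad |\mathcal{E}| = \sum_{a \in \mathcal{A}} |\mathcal{E}_a| = 36.
$$

A classification mapping $C : \mathcal{A} \to 2^{\mathcal{E}}$, $C(a) = \mathcal{E}_a$, assigns each artifact to its subset of exceptions, and each exception carries a phase label in $\{\mathrm{RP}, \mathrm{E}, \mathrm{RP/E}\}$, preserving the artifact-exception-phase triad (Zhou et al., 11 Aug 2025).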
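The partition property can be checked mechanically, and disjointness is exactly what makes the inverse lookup from an exception back to its artifact well defined. This is a minimal sketch over an illustrative subset of four artifacts. `CLASSIFICATION`, `is_partition`, and `artifact_of` are hypothetical names, not part of SHIELDA.

```python
# Illustrative subset of the artifact -> exception-set mapping C(a).
CLASSIFICATION: dict[str, set[str]] = {
    "Goal": {"Ambiguous Goal", "Conflicting Goal"},
    "Planning": {"Faulty Task Structuring", "Overextended Planning"},
    "Tool": {"Tool Invocation Exception", "Tool Output Exception", "Unavailable Tool"},
    "External System": {"Protocol Mismatch", "External Attack"},
}


def is_partition(mapping: dict[str, set[str]], universe: set[str]) -> bool:
    """True iff the sets are non-empty, pairwise disjoint, and cover `universe`."""
    seen: set[str] = set()
    for exceptions in mapping.values():
        if not exceptions or seen & exceptions:
            return False  # empty block or overlap with an earlier artifact
        seen |= exceptions
    return seen == universe


def artifact_of(exception: str, mapping: dict[str, set[str]]) -> str:
    """Inverse lookup; unambiguous only because the sets are disjoint."""
    for artifact, exceptions in mapping.items():
        if exception in exceptions:
            return artifact
    raise KeyError(exception)


universe = set().union(*CLASSIFICATION.values())
print(is_partition(CLASSIFICATION, universe))                # True
print(artifact_of("Protocol Mismatch", CLASSIFICATION))      # External System
```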
3. Formal Structure and Tabular Taxonomy
The taxonomy’s core schema is a mapping between artifacts and exception sets. Its high-level table organizes each artifact with a listing of its constituent exceptions and associated phases, capturing coverage and supporting downstream procedural logic.
A textual condensation of the corresponding table is:
- Goal: Ambiguous Goal; Conflicting Goal (RP)
- Context: Context Corruption; Context Ambiguity (RP)
- Reasoning: Contradictory Reasoning; Circular/Invalid Reasoning (RP)
- Planning: Faulty Task Structuring; Overextended Planning (RP)
- Memory: Memory Poisoning; Outdated Memory; Misaligned Memory Recall (RP/E)
- Knowledge Base: Hallucinated Facts; KB Poisoning; Knowledge Conflict (RP/E)
- Model: Token Limit Exceeded (RP/E); Output Validation Failure; Output Handling Exception (E)
- Tool: Tool Invocation Exception; Tool Output Exception; Unavailable Tool (E)
- Interface: API Invocation Exception; API Response Malformation (RP/E); API Semantic Mismatch; UI Element Misclick; Text Recognition Error; UI Not Ready; Environmental Noise (E)
- Task Flow: Task Dependency Exception; Error Propagation (E); Stopping Too Early (RP/E)
- Other Agent: Missing Information; Communication Exception; Agent Conflict; Role Violation (E)
- External System: Protocol Mismatch; External Attack (E)
This structure provides fine-grained traceability between a runtime error and the artifact/process phase in which it originated, enabling systematic diagnosis and remediation (Zhou et al., 11 Aug 2025).
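To illustrate the traceability claim, the condensed lines can be turned into a lookup from exception name to (artifact, phase). The sketch assumes one reading of the notation: a trailing "(phase)" tag applies to its own item and to any untagged items since the previous tag. That reading fits the Model line, but it is an assumption and not a rule stated by the source. `parse_condensed` is a hypothetical helper, and `CONDENSED` holds three lines of the condensation above without their bullet markers.

```python
import re

# Three lines of the condensation above, without the leading "- ".
CONDENSED = """\
Goal: Ambiguous Goal; Conflicting Goal (RP)
Model: Token Limit Exceeded (RP/E); Output Validation Failure; Output Handling Exception (E)
Task Flow: Task Dependency Exception; Error Propagation (E); Stopping Too Early (RP/E)
"""

# An item that ends in a phase tag, e.g. "Conflicting Goal (RP)".
PHASE_TAG = re.compile(r"^(.*?)\s*\((RP/E|RP|E)\)$")


def parse_condensed(text: str) -> dict[str, tuple[str, str]]:
    """Map each exception name to (artifact, phase) under the assumed reading."""
    index: dict[str, tuple[str, str]] = {}
    for line in text.strip().splitlines():
        artifact, _, rest = line.partition(":")
        pending: list[str] = []  # untagged items waiting for the next tag
        for item in (part.strip() for part in rest.split(";")):
            match = PHASE_TAG.match(item)
            if match is None:
                pending.append(item)
                continue
            name, phase = match.groups()
            for exc in pending + [name]:
                index[exc] = (artifact.strip(), phase)
            pending.clear()
        if pending:
            raise ValueError(f"untagged items on line: {line!r}")
    return index


index = parse_condensed(CONDENSED)
print(index["Stopping Too Early"])  # ('Task Flow', 'RP/E')
```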
4. Integration with SHIELDA Exception Handling Patterns
The taxonomy directly constrains SHIELDA's exception handling by prescribing, for each exception, a small set of reusable handler patterns. Each pattern is a triplet ⟨Local Handling, Flow Control, State Recovery⟩, selected via SHIELDA's Exception Classifier. The pattern registry spans 38 local, 3 flow, and 3 recovery tactics, which are composed into 48 representative patterns.
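The triplet structure, and the rule that patterns are composed only from a finite tactic registry, can be sketched as a validated record. The text names only "Plan Repair", "Constraint Pruning", and output sanitization. Every other tactic name below ("Retry", "Rollback", and so on) is a hypothetical placeholder, as are `HandlerPattern` and the `PATTERNS` registry entry.

```python
from dataclasses import dataclass

# Hypothetical tactic vocabularies; only the first two local tactics and
# "Output Sanitization" come from the text, the rest are placeholders.
LOCAL_TACTICS = {"Plan Repair", "Constraint Pruning", "Output Sanitization"}
FLOW_TACTICS = {"Retry", "Continue", "Escalate"}
RECOVERY_TACTICS = {"No-op", "Rollback", "Checkpoint Restore"}


@dataclass(frozen=True)
class HandlerPattern:
    """A <Local Handling, Flow Control, State Recovery> triplet."""
    local: str
    flow: str
    recovery: str

    def __post_init__(self) -> None:
        # Reject any tactic that is not in the finite registry.
        if self.local not in LOCAL_TACTICS:
            raise ValueError(f"unknown local tactic: {self.local}")
        if self.flow not in FLOW_TACTICS:
            raise ValueError(f"unknown flow tactic: {self.flow}")
        if self.recovery not in RECOVERY_TACTICS:
            raise ValueError(f"unknown recovery tactic: {self.recovery}")


# Hypothetical registry entry binding one exception type to candidate patterns.
PATTERNS: dict[str, list[HandlerPattern]] = {
    "Faulty Task Structuring": [
        HandlerPattern("Plan Repair", "Retry", "Rollback"),
        HandlerPattern("Constraint Pruning", "Retry", "No-op"),
    ],
}

for pattern in PATTERNS["Faulty Task Structuring"]:
    print(pattern)
```

Validating in `__post_init__` enforces the "finite set of tactics" property at construction time, so a misspelled or unregistered tactic fails immediately instead of at recovery time.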
This alignment ensures:
- Phase-aware recovery: Patterns are tailored to the artifact and phase, e.g., "Plan Repair" or "Constraint Pruning" for planning-phase exceptions, avoiding irrelevant tactics such as output sanitization for non-output failures.
- Composable and modular response: Patterns are constructed from a finite set of tactics, permitting rapid adaptation and extensibility.
- Structured escalation: If local and flow control tactics fail, escalation is triggered towards a peer agent, a human overseer, or a fallback system, ensuring no exception results in unhandled or silent failures.
Concrete mapping example: for a planning-phase exception, only relevant patterns (e.g., plan repair) are considered, while output-related handler patterns are excluded. This artifact-exception-pattern binding is deterministic and minimizes spurious or ineffective exception responses (Zhou et al., 11 Aug 2025).
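The phase-aware selection and the escalation ladder described above can be combined in a short control loop. The sketch below is a simplification under stated assumptions. The `APPLICABLE` table and the `handle` function are hypothetical, flow control is reduced to a bounded retry of the local tactics, and the three escalation targets follow the order listed in the text.

```python
from typing import Callable

# Hypothetical applicability table: artifact -> relevant local tactics.
# Planning exceptions never see output-related tactics.
APPLICABLE: dict[str, list[str]] = {
    "Planning": ["Plan Repair", "Constraint Pruning"],
    "Model": ["Output Sanitization"],
}

ESCALATION_TARGETS = ["peer agent", "human overseer", "fallback system"]


def handle(
    artifact: str,
    run_tactic: Callable[[str], bool],
    escalate: Callable[[str], bool],
    max_attempts: int = 2,
) -> str:
    """Try artifact-relevant tactics with bounded retry, then escalate in order."""
    tactics = APPLICABLE.get(artifact, [])
    # Flow control, simplified here to re-running the local tactics.
    for attempt in range(max_attempts):
        for tactic in tactics:
            if run_tactic(tactic):
                return f"handled locally by {tactic} (attempt {attempt + 1})"
    for target in ESCALATION_TARGETS:
        if escalate(target):
            return f"escalated to {target}"
    # Never fail silently: surface the exception once every route is exhausted.
    raise RuntimeError(f"unhandled {artifact} exception after full escalation")


calls: list[str] = []


def always_fails(tactic: str) -> bool:
    calls.append(tactic)  # record which tactics were attempted
    return False


print(handle("Planning", always_fails, lambda target: target == "human overseer"))
# escalated to human overseer
```

Raising at the end, rather than returning a status code, mirrors the requirement that no exception ends in a silent failure.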
5. Functional Outcomes and Systemic Advantages
The SHIELDA taxonomy enables three principal advances:
- End-to-end phase-aware diagnosis: Execution failures no longer terminate agent workflows; instead, they trigger automated, phase-spanning backward tracing to identify reasoning-phase root causes.
- Composable exception recovery: Exception handling is modular, covering diverse failure modes with a fixed library of handler patterns, facilitating reuse across agent architectures.
- Escalation pathways: Unrecoverable or ambiguous failures are escalated according to structured policies to peer agents or humans, thus preserving task liveness and integrity.
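Backward tracing from an execution failure to a reasoning-phase root cause can be sketched as a walk along causal links. The sketch assumes that each recorded exception keeps a pointer to the exception that caused it. `TraceEvent` and `trace_root_cause` are hypothetical names, and the chain below reuses the Protocol Mismatch / Faulty Task Structuring example from Section 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceEvent:
    """One recorded exception in a hypothetical workflow trace."""
    exception: str
    phase: str  # "RP", "E", or "RP/E"
    caused_by: Optional["TraceEvent"] = None


def trace_root_cause(failure: TraceEvent) -> TraceEvent:
    """Follow causal links backwards; return the earliest RP or RP/E event.

    Falls back to the failure itself when no reasoning-phase cause is found.
    """
    root = failure
    node: Optional[TraceEvent] = failure
    while node is not None:
        if node.phase in ("RP", "RP/E"):
            root = node  # keep the furthest-upstream reasoning-phase event
        node = node.caused_by
    return root


plan_error = TraceEvent("Faulty Task Structuring", "RP")
propagated = TraceEvent("Error Propagation", "E", caused_by=plan_error)
failure = TraceEvent("Protocol Mismatch", "E", caused_by=propagated)
print(trace_root_cause(failure).exception)  # Faulty Task Structuring
```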
By systematically decomposing agentic errors into discrete artifacts and 36 exception types, the taxonomy supports robust, resilient exception handling, forming the operational backbone of SHIELDA’s runtime diagnostics and recovery for LLM-driven agentic systems (Zhou et al., 11 Aug 2025).
6. Impact and Domain Significance
The SHIELDA taxonomy establishes a standard for exception classification in LLM agentic workflows, advancing traceable, phase-aware, and composable error recovery. By formalizing the artifact–exception–phase relationship, it facilitates analytical modeling, reproducibility of failure studies, and principled design of exception handling strategies for complex agentic systems. Concretely, it supports modularization of exception logic, which can ease adoption in safety-critical or reliability-focused agentic applications (Zhou et al., 11 Aug 2025).