SHIELDA: Exception Handling in LLM Workflows
- The paper demonstrates a modular framework that classifies 36 distinct exception types and orchestrates structured recovery patterns across agentic workflow artifacts.
- Its architecture comprises five core components (exception classifier, handler pattern registry, structured handling executor, escalation controller, and AgentOps infrastructure) that together support debugging and resilience.
- Phase-aware recovery links execution errors to reasoning failures, ensuring trace-driven diagnosis and adaptive re-execution in autonomous LLM-driven workflows.
SHIELDA (Structured Handling of Exceptions in LLM-Driven Agentic Workflows) is a modular runtime framework designed to enable robust, phase-aware exception handling in software systems powered by LLMs that autonomously reason, plan, and execute multi-step workflows. Distinguished by a fine-grained taxonomy of exception types, structured recovery patterns, and a multi-phase composable architecture, SHIELDA provides principled mechanisms to trace and resolve failures throughout both the reasoning and execution lifecycles of agentic workflows.
1. Exception Taxonomy across Agentic Artifacts
Exception management in agentic LLM workflows requires a granular understanding of both where and how errors manifest. SHIELDA is premised on a fine-grained taxonomy characterizing 36 distinct exception types distributed over 12 agent artifacts. Artifacts include Goal, Memory, Reasoning, Planning, Tool, Interface, Task Flow, and External System, among others. Each artifact is associated with exception types classified by phase:
- RP (Reasoning/Planning)
- E (Execution)
- RP/E (cross-phase)
Representative exceptions include "Ambiguous Goal" and "Conflicting Goal" for the Goal artifact (RP phase), "Memory Poisoning" and "Outdated Memory" for Memory (RP/E phase), and "Tool Invocation Exception" for Tool (E phase). The full taxonomy is given in Table 1 of (Zhou et al., 11 Aug 2025); an excerpt:
Artifact | Detailed Exception | Phase |
---|---|---|
Goal | Ambiguous Goal | RP |
Memory | Outdated Memory | RP/E |
Tool | Tool Invocation Exception | E |
This classification provides the foundation for selecting and composing recovery mechanisms appropriate to both the locus (artifact) and timing (phase) of the exception.
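As an illustration, the excerpt above could be held in a small lookup structure. This is a minimal sketch, not the paper's implementation: the names `Phase`, `ExceptionType`, `TAXONOMY`, and `by_phase` are hypothetical, and only the five exception types named in the text are included.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    RP = "Reasoning/Planning"
    E = "Execution"
    RP_E = "RP/E"  # cross-phase


@dataclass(frozen=True)
class ExceptionType:
    artifact: str
    name: str
    phase: Phase


# Only the entries quoted in the text; the full taxonomy has 36 types.
TAXONOMY = [
    ExceptionType("Goal", "Ambiguous Goal", Phase.RP),
    ExceptionType("Goal", "Conflicting Goal", Phase.RP),
    ExceptionType("Memory", "Memory Poisoning", Phase.RP_E),
    ExceptionType("Memory", "Outdated Memory", Phase.RP_E),
    ExceptionType("Tool", "Tool Invocation Exception", Phase.E),
]


def by_phase(phase: Phase) -> list[ExceptionType]:
    """Return entries that can manifest in `phase`.

    Assumption of this sketch: cross-phase (RP/E) entries match every query.
    """
    return [e for e in TAXONOMY if e.phase is phase or e.phase is Phase.RP_E]


execution_candidates = [e.name for e in by_phase(Phase.E)]
```

Indexing by both artifact and phase keeps the two selection criteria (locus and timing) available to whatever component picks a recovery mechanism.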
2. Modular Architecture and Core Components
The SHIELDA framework is built around five interconnected components:
- Exception Classifier: Monitors ongoing workflow execution, identifies the exception type leveraging the above taxonomy, determines the phase(s) involved, and isolates the relevant agent artifact(s).
- Handler Pattern Registry: Stores a library of handler patterns, each indexed by exception type and consisting of three dimensions: local handling, flow control, and state recovery. Patterns are pre-defined and enable composable, non-ad hoc recovery routines.
- Structured Handling Executor: Orchestrates the response specified by the selected pattern, executing sequentially:
  - Local Handling (e.g., prompt clarification, exponential backoff retry, plan repair)
  - Flow Control (e.g., continue, skip, abort)
  - State Recovery (e.g., rollback, compensation, no-op)
- Escalation Controller: Manages transition to escalation routines when initial recovery fails, including handoff to a peer agent or human intervention, or invoking higher-order fallback routines. This ensures persistent or cross-phase exceptions are not trapped at the point of manifestation but can be traced to their root causes.
- AgentOps Infrastructure: Provides the monitoring, logging, and evaluation infrastructure necessary for post-hoc analysis, live debugging, and auditability. The system maintains comprehensive, structured logs of each classifier and handler decision, along with the resulting state transitions.
A schematic in Figure 1 of (Zhou et al., 11 Aug 2025) visualizes the control and data flow among these components, emphasizing the bidirectional flow between classification and pattern selection, and between initial executor attempts and escalation upon failure.
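The executor's sequential behaviour, together with the structured logging attributed to the AgentOps infrastructure, might be sketched as follows. Everything here is an assumption made for illustration: the `Triad` and `Executor` classes, their method signatures, and the log record shape are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Triad:
    """Hypothetical container for one handler pattern's three steps."""
    local_handling: Callable[[dict], bool]       # True if the local fix worked
    flow_control: Callable[[dict, bool], str]    # "continue", "skip", or "abort"
    state_recovery: Callable[[dict], None]       # rollback, compensation, or no-op


@dataclass
class Executor:
    # Stand-in for AgentOps structured logs: one record per handling step.
    log: list = field(default_factory=list)

    def handle(self, exception_name: str, triad: Triad, state: dict) -> str:
        # 1. Local handling
        fixed = triad.local_handling(state)
        self.log.append({"exception": exception_name, "step": "local_handling", "ok": fixed})
        # 2. Flow control
        decision = triad.flow_control(state, fixed)
        self.log.append({"exception": exception_name, "step": "flow_control", "decision": decision})
        # 3. State recovery
        triad.state_recovery(state)
        self.log.append({"exception": exception_name, "step": "state_recovery"})
        return decision


def retry_once(state: dict) -> bool:
    """Toy local handler: count the attempt and report success."""
    state["attempts"] += 1
    return True


state = {"attempts": 0}
triad = Triad(
    local_handling=retry_once,
    flow_control=lambda s, ok: "continue" if ok else "abort",
    state_recovery=lambda s: None,  # no-op
)
executor = Executor()
decision = executor.handle("Tool Invocation Exception", triad, state)
```

Recording one log entry per step is what would let an escalation routine later replay the decisions that led to a failure.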
3. Exception Handling Patterns and Triadic Mechanisms
Central to SHIELDA are the handler patterns, each a triad across local handling, flow control, and state recovery. Handler patterns are specified in a registry and are invoked by mapping exception types (as recognized by the classifier) to their corresponding pattern. For example:
Pattern ID | Local Handling | Flow Control | State Recovery |
---|---|---|---|
P001 | Clarify Prompt | Abort | No-op |
P012 | Plan Repair | Abort | No-op |
P018 | Retry with Backoff | Continue | No-op |
For a "Tool Invocation Exception" (e.g., a transient API failure), pattern P018 is used, instructing an exponential backoff retry (local handling), with "Continue" if successful, and "No-op" for state recovery. Patterns such as "Plan Repair" (P012) are mapped to deeper, cross-phase exceptions like Faulty Task Structuring.
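The "Retry with Backoff" local handler can be illustrated with a generic exponential backoff loop. This is a sketch under stated assumptions: the function name, the choice of `ConnectionError` as the transient failure, the delay schedule, and the jitter factor are all illustrative rather than taken from the paper.

```python
import random
import time


def retry_with_backoff(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `call()` until it succeeds, doubling the wait after each failure.

    Re-raises the last error once `max_attempts` is exhausted, which is the
    point where an escalation routine would take over.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            # Small random jitter so concurrent retries do not synchronize.
            sleep(delay + random.uniform(0, delay * 0.1))


# Toy tool that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky_tool():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []  # record requested waits instead of actually sleeping
result = retry_with_backoff(flaky_tool, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the example fast to run and makes the delay schedule observable.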
The structured handling executor instantiates these routines, and the pattern registry is extensible, allowing new triads (and their dimensions) to be incorporated as more exception types or domain needs are identified.
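A registry of this kind could be sketched as two dictionaries plus a lookup and a registration function. The three patterns and the two exception-to-pattern mappings below are the ones stated in the text; the class name `HandlerPattern` and the functions `select_pattern` and `register` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HandlerPattern:
    pattern_id: str
    local_handling: str
    flow_control: str
    state_recovery: str


# The three rows shown in the table above.
PATTERNS = {
    "P001": HandlerPattern("P001", "Clarify Prompt", "Abort", "No-op"),
    "P012": HandlerPattern("P012", "Plan Repair", "Abort", "No-op"),
    "P018": HandlerPattern("P018", "Retry with Backoff", "Continue", "No-op"),
}

# Only the mappings stated in the text; others are deliberately left out.
EXCEPTION_TO_PATTERN = {
    "Tool Invocation Exception": "P018",
    "Faulty Task Structuring": "P012",
}


def select_pattern(exception_name: str) -> Optional[HandlerPattern]:
    """Return the pattern for an exception type, or None if none is registered."""
    pattern_id = EXCEPTION_TO_PATTERN.get(exception_name)
    return PATTERNS.get(pattern_id) if pattern_id is not None else None


def register(pattern: HandlerPattern, exception_names: list[str]) -> None:
    """Add a new triad and map the given exception types to it."""
    PATTERNS[pattern.pattern_id] = pattern
    for name in exception_names:
        EXCEPTION_TO_PATTERN[name] = pattern.pattern_id


selected = select_pattern("Tool Invocation Exception")
```

Returning `None` for an unmapped exception type gives the caller an explicit signal to escalate rather than silently applying a default.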
4. Phase-Aware, Root Cause-Oriented Recovery
A distinguishing feature of SHIELDA is its phase-aware recovery: exceptions encountered during execution are not treated in isolation. Instead, the exception classifier and escalation controller link them to reasoning (or planning) phase mistakes whenever feasible. Specifically, local handling is attempted first, but if the failure recurs or persists, the system escalates by:
- Reviewing structured logs and execution traces,
- Mapping the symptomatic (execution-phase) error to its upstream cause (reasoning/planning-phase misstep),
- Invoking corrective routines (e.g., forced clarification prompt or explicit plan repair),
- Cleaning the prior corrupted state (aborting/rolling back) and restarting execution with a compliant, revised plan.
This enables closed-loop diagnose–repair–re-execute cycles, essential for handling the compositional and latent error propagation properties of agentic workflows.
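The diagnose, repair, and re-execute cycle described above can be sketched as a short control loop. This is an illustrative sketch only: the function `handle_with_escalation`, its parameters, the trace format, and the toy plan steps (`edit_workflow_config`, `edit_readme`) are hypothetical and not drawn from the paper.

```python
def handle_with_escalation(execute, repair_plan, plan, max_local_retries=2):
    """Try local handling first; on repeated failure, repair the plan and restart."""
    trace = []  # stand-in for structured logs and execution traces

    # Local handling: re-run the same plan a bounded number of times.
    for _ in range(max_local_retries):
        ok = execute(plan)
        trace.append(("execute", plan, ok))
        if ok:
            return plan, trace

    # Escalation: attribute the execution-phase symptom to an upstream plan fault.
    trace.append(("escalate", plan, None))
    revised = repair_plan(plan, trace)

    # Discard the state left by the faulty plan before restarting.
    trace.append(("rollback", plan, None))

    ok = execute(revised)
    trace.append(("execute", revised, ok))
    if not ok:
        raise RuntimeError("recovery failed; hand off to a human or peer agent")
    return revised, trace


# Toy scenario: any plan that edits the workflow configuration always fails.
def execute(plan):
    return "edit_workflow_config" not in plan


def repair_plan(plan, trace):
    # Drop the forbidden step; a real system would re-prompt the planner instead.
    return tuple(step for step in plan if step != "edit_workflow_config")


final_plan, trace = handle_with_escalation(
    execute, repair_plan, ("edit_workflow_config", "edit_readme")
)
```

The bounded retry count is the design choice that matters here: without it, a fault that originates in the plan would be retried indefinitely at the execution layer.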
5. Case Study: AutoPR Agent and Empirical Validation
The SHIELDA framework is validated through a case study on the AutoPR agent, an autonomous system for managing GitHub Pull Request tasks. The test scenario induces a deliberate, cross-phase exception: an issue directs the agent to "add a nonexistent user as a reviewer." Instead of annotating the README, the agent modifies the workflow configuration (violating platform permissions), resulting in an execution-time protocol error (e.g., commit push failure).
SHIELDA responds in the following sequence:
- The execution-phase Protocol Mismatch is detected and undergoes local handling (retry with backoff);
- On repeated failure, the escalation controller traces the issue upstream to a faulty plan generated in the reasoning phase ("Faulty Task Structuring");
- The Plan Repair handler prompts the agent to generate a revised plan disallowing workflow file modifications;
- The executor aborts the faulty thread, restarts with the compliant plan, and ultimately completes the original goal successfully.
This demonstrates the system's cross-phase linkage and its capacity for diagnosis and robust recovery. Structured logs and decision mappings provide trace-based validation of each stage.
6. Contextualization within the Broader Exception Handling Literature
SHIELDA advances the field beyond prior exception handling strategies in agentic workflows in several ways:
- Where prior solutions often treat exceptions as superficial, execution-local failures, SHIELDA incorporates explicit tracing and mapping to reasoning-phase root causes;
- Its recovery logic is modular and composable, with a triadic pattern structure, as opposed to brittle, monolithic routines lacking escalation pathways;
- Rich taxonomic granularity enables nuanced differentiation between exception types and tailored strategy selection, covering synchronous and asynchronous as well as expected and unexpected exceptions, in the tradition of resource-driven exception handling (0903.0054).
Furthermore, SHIELDA is informed by methodologies found in agentic workflow optimization (Yuksel et al., 22 Dec 2024), safety evaluation (Chen et al., 13 Feb 2025), compliance frameworks (Zwerdling et al., 22 Jul 2025), error escalation (Dawid et al., 13 Apr 2025), and provenance tracking (Souza et al., 4 Aug 2025), uniting these approaches into a single phase-spanning, agent-aware infrastructure.
7. Implications and Future Directions
SHIELDA's integration of exception taxonomy, composable pattern-based recovery, and phase-aware escalation forms the basis for highly resilient LLM-driven agentic workflows. Its design supports extensibility for new exception modes and artifacts, and, by coupling with robust monitoring, enables auditable and dynamically adaptive agent operations. While current empirical results focus on single-agent cases (e.g., AutoPR), the compositional architecture suggests applicability to multi-agent, multi-phase workflows.
This suggests that the paradigm offered by SHIELDA could serve as an operational foundation for exception handling across a spectrum of LLM agentic systems, where transparent diagnosis, traceability, and robust recovery are essential for autonomous reliability (Zhou et al., 11 Aug 2025).