AgenTracer: Automated LLM Failure Tracing
- AgenTracer is an automated framework that diagnoses LLM system failures by pinpointing the specific agent and time step responsible for decisive errors.
- It employs counterfactual replay and programmed fault injection to generate detailed annotated failure trajectories, enabling scalable and precise debugging.
- Integration with multi-agent systems like MetaGPT has yielded performance gains up to 14.2%, enhancing system reliability and self-correction.
AgenTracer is an automated framework designed to address the problem of failure attribution in LLM-based agentic systems. These systems, typically composed of multiple autonomous agents (often themselves LLMs), tool chains, and orchestration protocols, are capable of complex task execution but suffer from pronounced fragility. Even a single erroneous action by one agent can propagate, causing a cascade of systemic failures. AgenTracer systematically diagnoses such failures by identifying the specific agent and time step responsible for each decisive error within a multi-agent execution trajectory. Furthermore, by providing actionable feedback, it enables subsequent debugging and rapid self-improvement of agentic systems.
1. Motivation and Objectives
The core challenge in advanced LLM agentic systems is their increased susceptibility to execution failures due to complex agent interactions, tool calls, and orchestration mechanisms. Failure attribution—the task of ascertaining which agent or step induced a systemic failure—remains intractable for standard reasoning LLMs, which routinely achieve sub-10% attribution accuracy in this context (as quantified on benchmarks such as Who&When). AgenTracer was specifically developed to automate failure annotation and root-cause localization, thereby eliminating the need for labor-intensive human debugging.
AgenTracer aims to:
- Systematically annotate failed trajectories in multi-agent systems by identifying the precise agent and step responsible for decisive errors.
- Produce structured, actionable feedback for debugging and enabling self-correcting behavior within agentic architectures.
- Replace human-in-the-loop debugging with automated model-based analysis scalable to verbose and intricate interaction logs.
2. Methodological Foundations
AgenTracer combines two principal methodologies: counterfactual replay and programmed fault injection, which together underpin the creation of a comprehensive, annotated failure attribution dataset labeled TracerTraj.
Counterfactual Replay
For each observed execution trajectory leading to failure, AgenTracer deploys an analyzer agent that examines the sequence step by step. At each time step $t$, the analyzer proposes a minimally corrected action $a'_t$ that could plausibly avoid failure. The modified trajectory $\tau'_t$, in which action $a_t$ is replaced by $a'_t$, is then simulated. Should this intervention result in successful task completion, step $t$ is marked a candidate decisive error. The earliest such step $t^*$ in the trajectory is designated as the root cause of failure.
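A minimal sketch of this replay loop follows; `propose_fix` and `replay` are hypothetical stand-ins for the analyzer agent and the execution environment, which are not specified in the source:

```python
from typing import Callable, List, Optional

def find_decisive_error(
    trajectory: List[str],
    propose_fix: Callable[[List[str], int], str],
    replay: Callable[[List[str]], bool],
) -> Optional[int]:
    """Return the earliest step whose corrected action flips the outcome
    to success (the decisive error), or None if no single-step fix works."""
    for t in range(len(trajectory)):
        fixed = trajectory.copy()
        fixed[t] = propose_fix(trajectory, t)  # minimally corrected action
        if replay(fixed):                      # counterfactual simulation succeeds
            return t                           # earliest candidate = root cause
    return None
```

Note that each `replay` call re-executes the downstream agents, so the scan costs one full simulation per examined step.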
Programmed Fault Injection
In addition to mining naturally occurring failures, AgenTracer strengthens its training set by deliberately perturbing successful trajectories through targeted fault injection. A valid agent’s action at a selected time step is altered to induce a new failure. As the true location of the injected perturbation is known by design, these synthetic examples serve as high-precision supervision signals for model training.
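The injection step can be sketched as follows; `corrupt` is a placeholder for whatever perturbation operator is used (e.g., mutating a tool call), since the injection strategy itself is not detailed in the source:

```python
import random
from typing import Callable, List, Optional, Tuple

def inject_fault(
    success_traj: List[str],
    corrupt: Callable[[str], str],
    step: Optional[int] = None,
    seed: int = 0,
) -> Tuple[List[str], int]:
    """Perturb one action of a successful trajectory; the perturbed step is
    the ground-truth error label for the resulting synthetic failure."""
    rng = random.Random(seed)
    t = step if step is not None else rng.randrange(len(success_traj))
    corrupted = success_traj.copy()
    corrupted[t] = corrupt(success_traj[t])  # induce the failure at step t
    return corrupted, t                      # (failed trajectory, known error step)
```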
The TracerTraj Dataset
By integrating annotated natural failures (via counterfactual replay) and synthetic failures (via fault injection), AgenTracer compiles the TracerTraj dataset. This resource includes over 2,000 trajectory–error pairs annotated with their decisive agent-step pairs, serving as the foundation for subsequent learning and benchmarking of failure attribution models.
3. AgenTracer-8B Model Architecture and Training
The central model within the framework is AgenTracer-8B, a lightweight, domain-specialized failure attribution model fine-tuned atop a base LLM (e.g., a Qwen3-8B variant). It is optimized for identifying decisive failures within lengthy, multi-agent execution traces.
Structured Output and Diagnostic Feedback
AgenTracer-8B processes a trajectory as input and emits structured outputs:
- The identifier of the faulty agent.
- The specific time step responsible for the decisive error.
Such outputs directly enable automated feedback loops that can guide off-the-shelf agentic systems toward corrective actions in future runs.
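As a concrete illustration, the two fields can be captured in a small record; the JSON schema and field names below are assumptions for the sketch, not the framework's documented output format:

```python
import json
from dataclasses import dataclass

@dataclass
class Attribution:
    agent: str  # identifier of the faulty agent
    step: int   # time step of the decisive error

def parse_attribution(model_output: str) -> Attribution:
    """Parse a (hypothetical) JSON-formatted model output into a record."""
    payload = json.loads(model_output)
    return Attribution(agent=str(payload["agent"]), step=int(payload["step"]))
```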
Multi-Granular Reinforcement Learning (RL)
Model training employs online RL using a multi-granular reward function:
- Format Reward: Strictly enforces output to conform to the specified structured schema.
- Agent-Level Reward: Binary objective comparing the predicted agent identifier against ground truth.
- Step-Level Reward: Defined via a Gaussian kernel to finely penalize deviations of the predicted step from the true decisive error.
Formally, the RL loss is given by:

$$\mathcal{L}_{\mathrm{RL}}(\theta) = -\,\mathbb{E}\Big[\min\big(r(\theta)\,A,\ \mathrm{clip}\big(r(\theta),\,1-\epsilon,\,1+\epsilon\big)\,A\big)\Big]$$

where:
- $r(\theta)$ is the policy ratio for the model prediction,
- $A$ is the advantage term derived from the multi-granular reward,
- $\epsilon$ is a dynamic clipping parameter.
The step-level reward function is:

$$R_{\mathrm{step}}(\hat{t}) = \exp\!\left(-\frac{(\hat{t}-t^*)^2}{2\sigma^2}\right)$$

which encourages the model to predict steps $\hat{t}$ close to the true decisive error $t^*$, with the bandwidth $\sigma$ controlling how sharply deviations are penalized.
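The three reward terms can be combined into a single scalar as sketched below; the weights and the bandwidth `sigma` are illustrative hyperparameters, not values from the source:

```python
import math
from typing import Tuple

def multi_granular_reward(
    well_formed: bool,
    pred_agent: str, true_agent: str,
    pred_step: int, true_step: int,
    sigma: float = 2.0,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Combine format, agent-level, and step-level rewards into one scalar."""
    r_format = 1.0 if well_formed else 0.0                  # schema compliance
    r_agent = 1.0 if pred_agent == true_agent else 0.0      # binary agent match
    r_step = math.exp(-((pred_step - true_step) ** 2)       # Gaussian kernel on
                      / (2 * sigma ** 2))                   # step deviation
    w_f, w_a, w_s = weights
    return w_f * r_format + w_a * r_agent + w_s * r_step
```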
4. Empirical Performance and Benchmarks
AgenTracer-8B's diagnostic effectiveness was validated on the Who&When benchmark, which encompasses both handcrafted and automatically generated trajectory-failure instances. The system consistently demonstrates strong agent-level and step-level attribution accuracy:
- Up to 18.18% higher precision in failure localization compared to proprietary LLMs (Gemini-2.5-Pro, Claude-4-Sonnet).
- Superior ability to provide concise, correct diagnoses in both authentic and synthetic multi-agent failure scenarios.
The integration of multi-granular RL is instrumental in achieving this diagnostic acuity, particularly under constraints where ground-truth annotations are sparse or unavailable.
5. Integration and Impact on Agentic AI Systems
When deployed within representative multi-agent frameworks such as MetaGPT and MaAS, AgenTracer-8B delivers structured feedback that translates into tangible improvements:
- Empirical gains in overall system performance ranging from 4.8% to 14.2%, contingent on task and configuration.
- Reliable attribution of errors enables iterative correction and adaptation, directly supporting the development of self-correcting and self-evolving agentic systems.
This actionable feedback mechanism diminishes the reliance on human oversight and facilitates more autonomous, robust collective intelligence among LLM agents.
6. Mathematical Formulation of Failure Attribution
AgenTracer’s logic for selecting the decisive failure is mathematically formalized as follows:
- Candidate Error Set Definition:

$$\mathcal{C} = \{\, t \mid J(\tau) = 0 \ \wedge\ J(\tau'_t) = 1 \,\}$$

where:
- $\tau$: execution trajectory,
- $g_t$: agent responsible at step $t$,
- $J$: evaluator output (0 for failure, 1 for success),
- $\tau'_t$: trajectory after rectifying the action at step $t$.
- Decisive Error Selection:

$$(g_{t^*},\, t^*), \qquad t^* = \min_{t \in \mathcal{C}} t$$

indicating the earliest correctable fault that switches the task outcome from failure to success.
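This selection rule translates directly into code; `J` and `rectify` are placeholder callables for the outcome evaluator and the counterfactual correction, which are system-specific:

```python
from typing import Callable, List, Optional, Tuple

def decisive_error(
    trajectory: List[str],
    agents: List[str],
    J: Callable[[List[str]], int],
    rectify: Callable[[List[str], int], List[str]],
) -> Optional[Tuple[str, int]]:
    """Build the candidate set C = {t : J(tau)=0 and J(tau'_t)=1} and return
    the decisive (agent, step) pair at the minimal t in C."""
    assert J(trajectory) == 0, "the formulation applies to failed trajectories"
    C = [t for t in range(len(trajectory)) if J(rectify(trajectory, t)) == 1]
    if not C:
        return None                  # no single-step rectification recovers success
    t_star = min(C)                  # earliest correctable fault
    return agents[t_star], t_star
```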
7. Significance and Future Outlook
AgenTracer stands as the first automated, LLM-targeted framework for high-fidelity failure attribution in multi-agent agentic systems. By leveraging a principled combination of counterfactual trajectory analysis, synthetic supervision signals from fault injection, and a novel multi-granular RL training regime, it sets a new standard in this domain:
- Substantially outperforms prior state-of-the-art baselines in attributing failures within complex execution traces.
- Demonstrates generalizable improvements across various downstream agentic architectures via actionable feedback.
- Encapsulates a scalable methodology for enabling self-correcting and self-evolving collective intelligence among LLM-based agents.
Such automated failure tracing mechanisms are foundational for advancing the reliability and autonomy of next-generation agentic AI systems.