TrajAD: Trajectory Anomaly Detection
- TrajAD is a methodology that audits LLM agent execution by assessing each planning-action-observation tuple for process-level anomalies.
- It categorizes anomalies into task failure, inefficiency, and unwarranted continuation, addressing intermediate step errors often missed by traditional safety methods.
- Leveraging a dedicated TrajBench dataset and a LoRA-tuned verifier, TrajAD outperforms generalist LLM baselines in both anomaly detection and error localization.
Trajectory Anomaly Detection (TrajAD) is a methodology and model for auditing LLM agent execution, focusing on identifying and precisely localizing process-level anomalies in agent trajectories. Conventional agent safety mechanisms emphasize input/output filtering, but these approaches neglect the complexities of real-world execution, where failures often materialize as intermediate stepwise errors. TrajAD addresses this gap by framing and operationalizing the trajectory anomaly detection task, constructing a dedicated data benchmark (TrajBench), and introducing a specialized verifier that significantly outperforms generalist LLM baselines in both anomaly detection and error localization (Liu et al., 6 Feb 2026).
1. Formalization of Trajectory Anomaly Detection
An agent's runtime behavior is represented as an ordered sequence of planning, action, and observation tuples: $\tau = \big(I, (t_1, a_1, o_1), \dots, (t_n, a_n, o_n)\big)$, where $I$ is the high-level instruction, $t_i$ (the "thought") is the agent's reasoning at step $i$, $a_i$ is the action executed, and $o_i$ is the observed outcome.
Trajectories are classified as:
- Normal: Every step is both logically and operationally valid, culminating in completion or lawful refusal.
- Anomalous: At least one process step violates rationality. Anomalies are subdivided into:
  - Type I: Task Failure – reasoning or execution errors that cause the task to abort or produce a wrong result.
  - Type II: Inefficiency – redundant or looping actions that do not affect the outcome.
  - Type III: Unwarranted Continuation – refusal avoidance or unwarranted post-completion continuation.
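To make the taxonomy concrete, the following minimal sketch shows one way a trajectory and its label could be represented in Python. The class and field names (`Step`, `Trajectory`, `AnomalyType`, `first_error_index`) are illustrative assumptions, not identifiers from TrajAD or TrajBench.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class AnomalyType(Enum):
    """Hypothetical labels mirroring the three anomaly categories."""
    TASK_FAILURE = "Type I"
    INEFFICIENCY = "Type II"
    UNWARRANTED_CONTINUATION = "Type III"


@dataclass
class Step:
    thought: str      # the agent's reasoning at this step
    action: str       # the action it executed
    observation: str  # the outcome it observed


@dataclass
class Trajectory:
    instruction: str
    steps: List[Step] = field(default_factory=list)
    anomaly: Optional[AnomalyType] = None    # None means the trajectory is normal
    first_error_index: Optional[int] = None  # 1-based; None if normal

    @property
    def is_anomalous(self) -> bool:
        return self.anomaly is not None


# A toy Type II example: the second step repeats the first without changing the outcome.
looping = Trajectory(
    instruction="Open the drawer",
    steps=[
        Step("I should open the drawer.", "open(drawer)", "drawer is open"),
        Step("I should open the drawer.", "open(drawer)", "drawer is already open"),
    ],
    anomaly=AnomalyType.INEFFICIENCY,
    first_error_index=2,
)
```

Keeping the anomaly type and the first erroneous step as separate optional fields lets a normal trajectory be expressed simply by leaving both unset.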
Trajectory anomaly detection is the mapping $f: \tau \mapsto (y, k)$ from a trajectory $\tau$ to a pair, with:
- $y \in \{0, 1\}$: binary anomaly verdict,
- $k \in \{1, \dots, n\} \cup \{\varnothing\}$: index of the first erroneous step, or empty if normal.
Supervision is sequence-to-sequence: the model is trained to generate a structured output $Y = (y, k)$ (verdict and localization) from the serialized trajectory input $X$, minimizing the autoregressive cross-entropy $\mathcal{L}(\theta) = -\sum_{j=1}^{|Y|} \log p_\theta(Y_j \mid Y_{<j}, X)$, where $Y$ typically consists of two tokens (class, index), optionally augmented with diagnostic information.
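As a rough illustration of this sequence-to-sequence setup, the sketch below serializes a trajectory into text, builds a two-token target, and evaluates the autoregressive cross-entropy from per-token probabilities. The text layout, the token strings (`"anomalous"`, `"normal"`, `"none"`), and all function names are assumptions made for the example; the actual TrajAD prompt and output formats are not specified here.

```python
import math
from typing import List, Optional, Sequence, Tuple


def serialize(instruction: str, steps: Sequence[Tuple[str, str, str]]) -> str:
    """Flatten an instruction and its (thought, action, observation) steps into text."""
    lines = [f"Instruction: {instruction}"]
    for i, (thought, action, observation) in enumerate(steps, start=1):
        lines.append(
            f"Step {i} | Thought: {thought} | Action: {action} | Observation: {observation}"
        )
    return "\n".join(lines)


def format_target(anomalous: bool, first_error_index: Optional[int]) -> List[str]:
    """Build the two-token target: a class token followed by an index token."""
    verdict = "anomalous" if anomalous else "normal"
    if anomalous and first_error_index is not None:
        index = str(first_error_index)
    else:
        index = "none"
    return [verdict, index]


def sequence_nll(token_probs: Sequence[float]) -> float:
    """Autoregressive cross-entropy: minus the sum of target-token log-probabilities."""
    return -sum(math.log(p) for p in token_probs)


prompt = serialize("Add 2 and 3", [("Compute the sum.", "calc(2+3)", "5")])
target = format_target(True, 1)
# Probabilities the model assigns to each target token given the prompt and the
# preceding target tokens (made-up values for illustration).
loss = sequence_nll([0.5, 0.5])
```

With two target tokens the loss is just the sum of two negative log-probabilities, one for the verdict and one for the index conditioned on the verdict.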
2. Dataset Design: TrajBench
TrajBench is a dataset targeting broad coverage of procedural anomalies, constructed via a perturb-and-complete pipeline.
Composition and Statistics:
| Metric | Value |
|---|---|
| Total samples | 63,484 (balanced: 31,742 normal / 31,742 anomalous) |
| Anomaly type breakdown | Type I: ~33%, Type II: ~33%, Type III: ~33% |
| Domain/task coverage | 13 tasks across Math, Reasoning, Coding, Web, Embodied AI |
| Generation success rates | Seeds valid: 91.6%, Anomaly synthesis: 92.2% |
| Human-model agreement (500 sampled) | Verdict: 96.2%, Localization: 94.5% |
A key methodological feature is strict balancing and broad coverage to avoid trivial pattern learning and enhance generalization.
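The internals of the perturb-and-complete pipeline are not detailed here, so the following is only a toy illustration of the perturbation idea for a single anomaly type: a redundant step is injected into a valid seed trajectory (Type II), and the position of the injected step becomes the localization label. The function name `inject_redundant_step` and the tuple layout are assumptions made for this sketch, and the completion stage that would follow the perturbation is omitted.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # (thought, action, observation)


def inject_redundant_step(seed: List[Step], position: int) -> Tuple[List[Step], int]:
    """Duplicate the step at 1-based `position` immediately after itself.

    Returns the perturbed trajectory and the 1-based index of the first
    erroneous step, which is the inserted duplicate.
    """
    if not 1 <= position <= len(seed):
        raise ValueError("position must refer to an existing step")
    perturbed = seed[:position] + [seed[position - 1]] + seed[position:]
    return perturbed, position + 1


seed = [
    ("Find the file.", "ls", "report.txt"),
    ("Read it.", "cat report.txt", "Q3 revenue: 12"),
    ("Answer.", "finish(12)", "done"),
]
perturbed, first_error = inject_redundant_step(seed, position=2)
```

Pairing each perturbed trajectory with its unmodified seed is one simple way to obtain the balanced normal/anomalous split described above.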
3. Model Architecture and Training
The TrajAD verifier is based on Qwen3-4B, a decoder-only Transformer. Parameters are adapted with Low-Rank Adaptation (LoRA, with rank $r$ and scaling factor $\alpha$):
- Frozen backbone weights $W_0$,
- Trainable low-rank adapters $A$ and $B$ for each linear layer; activations become $h = W_0 x + BAx$.
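The adapter arithmetic can be sketched in a few lines of dependency-free Python. This follows the standard LoRA formulation, in which the low-rank update is scaled by `alpha / r`; that scaling, the matrix shapes, and the names used are assumptions of this example rather than details confirmed for TrajAD.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute h = W0 x + (alpha / r) * B (A x).

    w0: frozen weight, shape (d_out, d_in)
    a:  trainable down-projection, shape (r, d_in)
    b:  trainable up-projection, shape (d_out, r)
    """
    r = len(a)  # adapter rank
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]


w0 = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 identity
a = [[1.0, 1.0]]               # rank-1 down-projection
b_zero = [[0.0], [0.0]]        # with B at zero the adapter is a no-op
b_trained = [[1.0], [0.0]]     # a made-up "trained" up-projection
x = [2.0, 3.0]

h_initial = lora_forward(w0, a, b_zero, x, alpha=1.0)
h_adapted = lora_forward(w0, a, b_trained, x, alpha=2.0)
```

Only `a` and `b` would receive gradients during fine-tuning, which is why the number of trainable parameters scales with the rank rather than with the full weight matrix.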
Inputs: A serialized tuple of system instruction and full trajectory, $X = (I_{\text{sys}}, \tau)$.
Outputs: Structured report $Y = (y, k)$, where $y$ is the verdict and $k$ is the localization index, or empty for a normal trajectory.
Supervision: Combined cross-entropy loss for class and localization, as described in Section 1.
Training configuration:
- Optimizer: Paged AdamW (8-bit optimizer states)
- Peak learning rate: 4
- Linear warmup for 10% of schedule
- Batch size maximized for NVIDIA A100 80GB memory utilization
- No explicit curriculum or regularization beyond LoRA
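The warmup rule in the configuration can be sketched as a small schedule function. The configuration only specifies linear warmup over the first 10% of training, so the linear decay to zero afterwards is an assumption of this example, and `peak_lr` is left as a parameter because no value is implied here.

```python
def learning_rate(step: int, total_steps: int, peak_lr: float,
                  warmup_fraction: float = 0.1) -> float:
    """Linear warmup to `peak_lr`, then linear decay to zero (decay shape is assumed)."""
    warmup_steps = max(1, int(total_steps * warmup_fraction))
    if step < warmup_steps:
        # Ramp up so that the last warmup step reaches the peak.
        return peak_lr * (step + 1) / warmup_steps
    remaining = total_steps - warmup_steps
    if remaining <= 0:
        return peak_lr
    return peak_lr * max(0.0, (total_steps - step) / remaining)


# Schedule for a hypothetical 100-step run with a normalized peak of 1.0.
schedule = [learning_rate(s, total_steps=100, peak_lr=1.0) for s in range(100)]
```

Using a normalized peak of 1.0 makes the returned values easy to read as a multiplier on whatever peak learning rate is actually chosen.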
4. Empirical Performance and Comparison
Evaluation emphasizes both detection (binary anomaly classification) and localization. Metrics include:
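- Detection: precision (P), recall (R), and F1 score on the binary anomaly verdict.
- Localization: Joint Exact Match (JEM), the fraction of samples for which the predicted verdict and the predicted error location both match the reference, with the location match decided by a string similarity threshold.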
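A simplified version of these metrics can be written directly from predicted and reference labels. In this sketch, precision, recall, and F1 are computed for the anomalous class only, and the localization match is reduced to exact equality of step indices instead of a string similarity comparison; both choices, along with the function names, are assumptions, and the averaging scheme used for the reported numbers may differ.

```python
from typing import List, Optional, Tuple

Label = Tuple[int, Optional[int]]  # (verdict: 1 = anomalous, first error index or None)


def detection_scores(gold: List[Label], pred: List[Label]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the anomalous class."""
    tp = sum(1 for g, p in zip(gold, pred) if g[0] == 1 and p[0] == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g[0] == 0 and p[0] == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g[0] == 1 and p[0] == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def joint_exact_match(gold: List[Label], pred: List[Label]) -> float:
    """Fraction of samples where verdict and localization are both correct."""
    if not gold:
        return 0.0
    hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return hits / len(gold)


gold = [(1, 3), (1, 2), (0, None), (0, None)]
pred = [(1, 3), (1, 4), (0, None), (1, 1)]
precision, recall, f1 = detection_scores(gold, pred)
jem = joint_exact_match(gold, pred)
```

In the toy data the second sample has the right verdict but the wrong step, so it counts toward detection but not toward JEM, which is why localization scores can sit far below F1.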
Table: In-Distribution Performance
| Model | Params | Method | P(%) | R(%) | F1(%) | JEM(%) |
|---|---|---|---|---|---|---|
| Gemma-3-4B-Instruct | 4 B | zero-shot | 68.64 | 64.66 | 64.20 | 9.07 |
| Phi-3-Mini | 4 B | zero-shot | 67.78 | 28.46 | 30.65 | 3.28 |
| Qwen3-4B | 4 B | zero-shot | 79.07 | 68.97 | 70.43 | 5.54 |
| Qwen3-8B | 8 B | zero-shot | 76.16 | 69.60 | 67.90 | 5.81 |
| TrajAD (LoRA-finetuned) | 4 B | LoRA-finetune | 82.90 | 82.49 | 81.81 | 53.75 |
Qualitative analysis indicates that TrajAD can diagnose subtle process faults. For example, in an embodied AI scenario, TrajAD correctly flags redundant actions taken after the task is complete, whereas generalist LLMs misclassify the trajectory as normal.
5. Analysis, Generalization, and Limitations
Cross-domain transfer is assessed via leave-one-domain-out experiments:
- Zero-shot Qwen3-4B yields F1 ≈ 70.89%, JEM ≈ 11.48%.
- TrajAD fine-tuned on every domain except the held-out one attains F1 ≈ 83.09%, JEM ≈ 38.25%.
- Full supervision recovers up to F1 ≈ 83.84%, JEM ≈ 52.54%.
This suggests core detection logic is robust to domain shift, although error localization is more dependent on in-domain examples.
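A leave-one-domain-out protocol of this kind can be sketched as a generator over splits. The `domain` key and the dictionary-based sample format are assumptions for illustration; TrajBench's actual schema is not given here.

```python
from typing import Dict, Iterator, List, Tuple

Sample = Dict[str, str]


def leave_one_domain_out(
    samples: List[Sample],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_domain, train_samples, test_samples) for each domain."""
    domains = sorted({s["domain"] for s in samples})
    for held_out in domains:
        train = [s for s in samples if s["domain"] != held_out]
        test = [s for s in samples if s["domain"] == held_out]
        yield held_out, train, test


samples = [
    {"domain": "Math", "id": "a"},
    {"domain": "Coding", "id": "b"},
    {"domain": "Math", "id": "c"},
    {"domain": "Web", "id": "d"},
]
splits = list(leave_one_domain_out(samples))
```

Each split trains on every domain but one and evaluates on the excluded domain, so the test trajectories never share a domain with the training data.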
Ablation studies show:
- Increasing training set size enhances performance up to ~50k samples (F1=85.31%, JEM=61.02%) but saturates thereafter, with minor negative transfer at 60k.
- Scaling the backbone up to Qwen3-8B (F1=78.97%) does not surpass the LoRA-tuned 4B model under full-data supervision.
A plausible implication is that effective process verification depends more on specialized, high-quality supervision than on model size alone.
Limitations:
- Error localization is less reliable under severe domain shift, particularly with novel action/observation vocabularies.
- Very long trajectories (>100 steps) may exceed context capacity.
- Extensions under consideration include multi-type error classification, integration of external state representations, and RL-style rollback orchestration based on verifier outputs.
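To illustrate how a verifier's output could drive rollback, the sketch below truncates a trajectory to the steps preceding the first flagged error so that execution could resume from the last valid state. This is a hypothetical control routine, not a component described for TrajAD; the function name and the 1-based indexing convention are assumptions.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str, str]  # (thought, action, observation)


def rollback(steps: List[Step], anomalous: bool,
             first_error_index: Optional[int]) -> List[Step]:
    """Keep only the steps before the first erroneous one (1-based index)."""
    if not anomalous or first_error_index is None:
        return list(steps)
    return list(steps[: first_error_index - 1])


steps = [
    ("Plan the search.", "search('weather')", "results page"),
    ("Open a result.", "click(1)", "page loaded"),
    ("Open it again.", "click(1)", "page loaded"),
]
kept = rollback(steps, anomalous=True, first_error_index=3)
```

A retry policy would then re-prompt the agent from the retained prefix, which only works as well as the localization index is accurate.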
6. Significance, Applications, and Future Directions
TrajAD establishes that high-fidelity procedural auditing for LLM agents requires specialized verifiers trained with step-level process supervision. The synthesis of the TrajBench dataset facilitates generalizable and discriminative learning across a spectrum of anomaly types and domains. Efficient detection and localization enable advanced execution control (rollback and retry), moving beyond static I/O filtering and towards trustworthy agentic systems. Open trajectories for future work include finer-grained error taxonomy, hybrid text-symbolic process monitoring, and robustification under distributional shift (Liu et al., 6 Feb 2026).