
TrajAD: Trajectory Anomaly Detection

Updated 2 July 2026
  • TrajAD is a methodology that audits LLM agent execution by assessing each planning-action-observation tuple for process-level anomalies.
  • It categorizes anomalies into task failure, inefficiency, and unwarranted continuation, addressing intermediate step errors often missed by traditional safety methods.
  • Leveraging a dedicated TrajBench dataset and a LoRA-tuned verifier, TrajAD outperforms generalist LLM baselines in both anomaly detection and error localization.

Trajectory Anomaly Detection (TrajAD) is a methodology and model for auditing LLM agent execution, focusing on identifying and precisely localizing process-level anomalies in agent trajectories. Conventional agent safety mechanisms emphasize input/output filtering, but these approaches neglect the complexities of real-world execution, where failures often materialize as intermediate stepwise errors. TrajAD addresses this gap by framing and operationalizing the trajectory anomaly detection task, constructing a dedicated data benchmark (TrajBench), and introducing a specialized verifier that significantly outperforms generalist LLM baselines in both anomaly detection and error localization (Liu et al., 6 Feb 2026).

1. Formalization of Trajectory Anomaly Detection

An agent's runtime behavior is represented as an ordered sequence of planning, action, and observation tuples: $T = \{I, (r_1, a_1, o_1), (r_2, a_2, o_2), \dots, (r_n, a_n, o_n)\}$, where $I$ is the high-level instruction, $r_t$ ("thought") is the agent's reasoning at step $t$, $a_t$ the action executed, and $o_t$ the observed outcome.
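As a minimal sketch, the trajectory structure above could be held in a small container like the following. The class and field names (`Step`, `Trajectory`, `thought`, and so on) are illustrative assumptions, not names taken from the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Step:
    """One (r_t, a_t, o_t) tuple; field names are hypothetical."""
    thought: str      # r_t: the agent's reasoning at step t
    action: str       # a_t: the action executed
    observation: str  # o_t: the observed outcome


@dataclass
class Trajectory:
    """T = {I, (r_1, a_1, o_1), ..., (r_n, a_n, o_n)}."""
    instruction: str                               # I: high-level instruction
    steps: List[Step] = field(default_factory=list)

    def __len__(self) -> int:
        # Number of steps n in the trajectory.
        return len(self.steps)


# A made-up two-step example trajectory.
traj = Trajectory(
    instruction="Compute 12 * 7 and report the result.",
    steps=[
        Step("I should multiply 12 by 7.", "calculator(12 * 7)", "84"),
        Step("The result is 84, so I can answer.", "finish(84)", "done"),
    ],
)
```

Keeping the instruction separate from the step list mirrors the definition, where $I$ is not itself a step.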

Trajectories are classified as:

  • Normal: Every step is both logically and operationally valid, culminating in completion or lawful refusal.
  • Anomalous: At least one process step violates rationality, subdivided into:
    • Type I: Task Failure ($\mathcal{A}_{\text{fail}}$) – reasoning or execution errors causing abortion or mis-computation.
    • Type II: Inefficiency ($\mathcal{A}_{\text{ineff}}$) – redundant or looping actions that do not affect the outcome.
    • Type III: Unwarranted Continuation ($\mathcal{A}_{\text{unw}}$) – refusal avoidance or unwarranted post-completion continuation.

Trajectory anomaly detection is the mapping $f: T \rightarrow (c, l)$ with:

  • $c$: binary anomaly verdict,
  • $l$: index of the first erroneous step, or empty if normal.

Supervision is sequence-to-sequence: the model is trained to generate a structured output (verdict and localization) from the serialized trajectory input by minimizing the autoregressive cross-entropy, that is, the negative log-likelihood of each output token given the input and the preceding output tokens. The output typically consists of two tokens (class, index), optionally augmented with diagnostic information.
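The sequence-to-sequence framing can be illustrated with a small serialization sketch. The text layout, the `normal` / `anomalous <index>` target strings, and all function names below are assumptions made for illustration; the source does not specify the exact format.

```python
from typing import List, Optional, Tuple

StepT = Tuple[str, str, str]  # (thought, action, observation)


def serialize(instruction: str, steps: List[StepT]) -> str:
    """Flatten a trajectory into one input string (hypothetical layout)."""
    lines = [f"Instruction: {instruction}"]
    for t, (thought, action, obs) in enumerate(steps, start=1):
        lines.append(
            f"Step {t} | Thought: {thought} | Action: {action} | Observation: {obs}"
        )
    return "\n".join(lines)


def make_target(first_error: Optional[int]) -> str:
    """Build the two-part target: verdict, then 1-based index if anomalous."""
    if first_error is None:
        return "normal"
    return f"anomalous {first_error}"


def parse_target(text: str) -> Tuple[bool, Optional[int]]:
    """Invert make_target, returning (c, l)."""
    parts = text.split()
    if parts == ["normal"]:
        return False, None
    if len(parts) == 2 and parts[0] == "anomalous" and parts[1].isdigit():
        return True, int(parts[1])
    raise ValueError(f"unrecognized verifier output: {text!r}")
```

A strict parser like `parse_target` is one way to reject malformed generations instead of silently scoring them.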

2. Dataset Design: TrajBench

TrajBench is a dataset targeting broad procedural anomaly coverage, constructed via a perturb-and-complete pipeline.

Composition and Statistics:

| Metric | Value |
|---|---|
| Total samples | 63,484 (balanced: 31,742 normal / 31,742 anomalous) |
| Anomaly type breakdown | Type I: ~33%, Type II: ~33%, Type III: ~33% |
| Domain/task coverage | 13 tasks across Math, Reasoning, Coding, Web, Embodied AI |
| Generation success rates | Seeds valid: 91.6%, Anomaly synthesis: 92.2% |
| Human-model agreement (500 sampled) | Verdict: 96.2%, Localization: 94.5% |

A key methodological feature is strict balancing and broad coverage to avoid trivial pattern learning and enhance generalization.
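The following toy does not reproduce the TrajBench pipeline. It only illustrates, under stated assumptions, the general idea of perturbing a normal trajectory to obtain a labeled anomalous one, here a Type II (inefficiency) case created by duplicating a step. Pairing each normal seed with one perturbed variant is one simple way to arrive at a 1:1 balance; the function name and labeling convention are hypothetical.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str, str]  # (thought, action, observation)
Sample = Tuple[List[Step], Optional[int]]  # (steps, first erroneous index)


def inject_redundant_step(steps: List[Step], position: int) -> Sample:
    """Duplicate the step at 1-based `position`.

    Returns the perturbed steps and the 1-based index of the duplicate,
    which is treated as the first erroneous step (an assumption).
    """
    if not 1 <= position <= len(steps):
        raise ValueError("position must be within the trajectory")
    perturbed = steps[:position] + [steps[position - 1]] + steps[position:]
    return perturbed, position + 1


normal: List[Step] = [
    ("Open the drawer.", "open(drawer)", "drawer is open"),
    ("Take the key.", "take(key)", "key in hand"),
]

# One normal sample (no error index) and one anomalous counterpart.
pair: List[Sample] = [(normal, None), inject_redundant_step(normal, 1)]
```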

3. Model Architecture and Training

The TrajAD verifier is based on Qwen-3-4B, a decoder-only Transformer. Parameter adaptation uses Low-Rank Adaptation (LoRA) with rank $r$ and scaling factor $\alpha$:

  • Frozen backbone weights $W_0$,
  • Trainable low-rank adapters $A$ and $B$ for each linear layer; activations become $h = W_0 x + \frac{\alpha}{r} B A x$.
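As a rough illustration of the adapter arithmetic, the sketch below applies the standard LoRA forward pass to plain Python lists. The matrices, dimensions, and the `alpha / r` scaling shown are generic LoRA conventions and toy values, not the configuration used for TrajAD.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x: Vector, w0: Matrix, a: Matrix, b: Matrix, alpha: float) -> Vector:
    """Frozen base output plus the scaled low-rank update.

    w0: (d_out x d_in) frozen weights
    a:  (r x d_in) trainable down-projection
    b:  (d_out x r) trainable up-projection
    """
    r = len(a)                      # adapter rank
    base = matvec(w0, x)            # frozen path
    update = matvec(b, matvec(a, x))  # low-rank path
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]


w0 = [[1.0, 0.0], [0.0, 1.0]]   # toy frozen weights (identity)
a = [[1.0, 1.0]]                # rank-1 down-projection
b_zero = [[0.0], [0.0]]         # zero up-projection: adapter is a no-op
b_toy = [[1.0], [2.0]]

h_init = lora_forward([1.0, 2.0], w0, a, b_zero, alpha=2.0)
h_tuned = lora_forward([1.0, 2.0], w0, a, b_toy, alpha=2.0)
```

With a zero up-projection the adapted layer reproduces the frozen layer exactly, which is why LoRA training conventionally starts from that state.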

Inputs: A serialized tuple of the system instruction and the full trajectory.

Outputs: A structured report $(c, l)$, where $c$ is the verdict and $l$ the localization index, or empty if the trajectory is normal.

Supervision: Combined cross-entropy loss for class and localization, as described in Section 1 above.

Training configuration:

  • Optimizer: Paged AdamW (8-bit weights)
  • Linear warmup to the peak learning rate over the first 10% of the schedule
  • Batch size maximized for NVIDIA A100 80GB memory utilization
  • No explicit curriculum or regularization beyond LoRA
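The warmup rule in the list above can be sketched as a small schedule function. The peak value is passed in as a parameter rather than taken from the source, and holding the rate constant after warmup is an assumption, since the post-warmup behavior is not stated.

```python
def lr_at(step: int, total_steps: int, peak_lr: float, warmup_frac: float = 0.10) -> float:
    """Learning rate at a 0-based optimizer step.

    Ramps linearly to `peak_lr` over the first `warmup_frac` of the
    schedule, then holds it constant (assumed; decay is unspecified).
    """
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


# With 1000 total steps, warmup covers steps 0..99.
halfway = lr_at(49, 1000, peak_lr=1.0)
end_of_warmup = lr_at(99, 1000, peak_lr=1.0)
```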

4. Empirical Performance and Comparison

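Evaluation covers both detection (binary anomaly classification) and localization. Metrics include:

  • Detection: precision (P), recall (R), and F1.
  • Localization: Joint Exact Match (JEM), which requires the predicted verdict and localization to match the reference jointly and is computed with a string similarity threshold.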
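A simplified sketch of the two metric families follows. It assumes the anomalous class is the positive class, applies no per-task or per-class averaging, and treats JEM as an exact match on the (verdict, index) pair; the source's aggregation scheme and its string similarity threshold are not reproduced here, so the numbers this produces need not match the reported ones.

```python
from typing import List, Optional, Tuple

Report = Tuple[bool, Optional[int]]  # (anomalous?, first erroneous index)


def detection_prf(gold: List[Report], pred: List[Report]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 with 'anomalous' as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g[0] and p[0])
    fp = sum(1 for g, p in zip(gold, pred) if not g[0] and p[0])
    fn = sum(1 for g, p in zip(gold, pred) if g[0] and not p[0])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def joint_exact_match(gold: List[Report], pred: List[Report]) -> float:
    """Fraction of samples whose verdict and index both match (assumed JEM)."""
    hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return hits / len(gold)


gold = [(True, 2), (True, 3), (False, None), (False, None)]
pred = [(True, 2), (True, 1), (True, 1), (False, None)]

p, r, f1 = detection_prf(gold, pred)
jem = joint_exact_match(gold, pred)
```

In this toy run the second sample is detected correctly but localized at the wrong step, so it counts toward detection and not toward JEM, which is the gap the localization metric is meant to expose.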

Table: In-Distribution Performance

| Model | Params | Method | P (%) | R (%) | F1 (%) | JEM (%) |
|---|---|---|---|---|---|---|
| Gemma-3-4B-Instruct | 4B | zero-shot | 68.64 | 64.66 | 64.20 | 9.07 |
| Phi-3-Mini | 4B | zero-shot | 67.78 | 28.46 | 30.65 | 3.28 |
| Qwen3-4B | 4B | zero-shot | 79.07 | 68.97 | 70.43 | 5.54 |
| Qwen3-8B | 8B | zero-shot | 76.16 | 69.60 | 67.90 | 5.81 |
| TrajAD (LoRA-finetuned) | 4B | LoRA-finetune | 82.90 | 82.49 | 81.81 | 53.75 |

Qualitative analysis indicates that TrajAD can diagnose subtle process faults. For example, in an embodied AI scenario, TrajAD correctly flags redundant actions taken after task completion, while generalist LLMs misclassify the same trajectory as normal.

5. Analysis, Generalization, and Limitations

Cross-domain transfer is assessed via leave-one-domain-out experiments:

  • Zero-shot Qwen3-4B yields F1 ≈ 70.89%, JEM ≈ 11.48%.
  • TrajAD fine-tuned on all domains except the held-out one attains F1 ≈ 83.09%, JEM ≈ 38.25%.
  • Full supervision recovers up to F1 ≈ 83.84%, JEM ≈ 52.54%.

This suggests core detection logic is robust to domain shift, although error localization is more dependent on in-domain examples.
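The leave-one-domain-out protocol can be sketched as a simple split generator. The `domain` key and the sample dictionaries are hypothetical; only the splitting logic is the point.

```python
from typing import Dict, Iterator, List, Tuple

Sample = Dict[str, str]


def leave_one_domain_out(
    samples: List[Sample],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_domain, train, test) once per domain.

    Training data excludes the held-out domain entirely, so the test
    split measures transfer to an unseen domain.
    """
    domains = sorted({s["domain"] for s in samples})
    for held_out in domains:
        train = [s for s in samples if s["domain"] != held_out]
        test = [s for s in samples if s["domain"] == held_out]
        yield held_out, train, test


# Made-up samples covering three of the domains named in the text.
samples = [
    {"id": "a", "domain": "Math"},
    {"id": "b", "domain": "Math"},
    {"id": "c", "domain": "Coding"},
    {"id": "d", "domain": "Web"},
]
splits = list(leave_one_domain_out(samples))
```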

Ablation studies show:

  • Increasing training set size enhances performance up to ~50k samples (F1=85.31%, JEM=61.02%) but saturates thereafter, with minor negative transfer at 60k.
  • Scaling the backbone to Qwen3-8B (F1=78.97%) does not surpass the LoRA-tuned 4B model under full-data supervision.

A plausible implication is that effective process verification depends more on specialized, high-quality supervision than on model size alone.

Limitations:

  • Error localization is less reliable under severe domain shift, particularly with novel action/observation vocabularies.
  • Very long trajectories (>100 steps) may exceed context capacity.
  • Extensions under consideration include multi-type error classification, integration of external state representations, and RL-style rollback orchestration based on verifier outputs.

6. Significance, Applications, and Future Directions

TrajAD indicates that high-fidelity procedural auditing for LLM agents benefits from specialized verifiers trained with step-level process supervision. The TrajBench dataset supports generalizable and discriminative learning across a range of anomaly types and domains. Efficient detection and localization enable execution control such as rollback and retry, moving beyond static I/O filtering toward trustworthy agentic systems. Open directions for future work include a finer-grained error taxonomy, hybrid text-symbolic process monitoring, and improved robustness under distribution shift (Liu et al., 6 Feb 2026).
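One way the rollback-and-retry idea might look in practice is sketched below. The agent and verifier interfaces, the retry budget, and the toy stubs are all assumptions for illustration; the source names rollback and retry as an application without specifying a control loop.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str, str]  # (thought, action, observation)
# Hypothetical interfaces: the agent extends a prefix into a full
# trajectory; the verifier returns (anomalous?, first erroneous index).
Agent = Callable[[str, List[Step]], List[Step]]
Verifier = Callable[[str, List[Step]], Tuple[bool, Optional[int]]]


def run_with_rollback(
    instruction: str, agent: Agent, verifier: Verifier, max_retries: int = 2
) -> List[Step]:
    """Re-run the agent from just before the first flagged step.

    Returns the last attempt when the retry budget is exhausted, even
    though that attempt has not been re-verified.
    """
    steps = agent(instruction, [])
    for _ in range(max_retries):
        anomalous, first_error = verifier(instruction, steps)
        if not anomalous or first_error is None:
            return steps
        prefix = steps[: first_error - 1]  # keep steps before the error
        steps = agent(instruction, prefix)
    return steps


def toy_agent(instruction: str, prefix: List[Step]) -> List[Step]:
    # Fails on the first attempt, recovers when resumed from a prefix.
    if not prefix:
        return [("plan", "search", "ok"), ("plan", "bad_call", "error")]
    return prefix + [("replan", "good_call", "ok")]


def toy_verifier(instruction: str, steps: List[Step]) -> Tuple[bool, Optional[int]]:
    # Flags the first step whose action is the known-bad call.
    for t, (_, action, _) in enumerate(steps, start=1):
        if action == "bad_call":
            return True, t
    return False, None


final_steps = run_with_rollback("demo task", toy_agent, toy_verifier)
```

Resuming from the prefix before the flagged step, instead of restarting from scratch, is what makes precise localization valuable in this setting.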
