TrajAD: Trajectory Anomaly Detection

Updated 2 July 2026

TrajAD is a methodology that audits LLM agent execution by assessing each planning-action-observation tuple for process-level anomalies.
It categorizes anomalies into task failure, inefficiency, and unwarranted continuation, addressing intermediate step errors often missed by traditional safety methods.
Leveraging a dedicated TrajBench dataset and a LoRA-tuned verifier, TrajAD outperforms generalist LLM baselines in both anomaly detection and error localization.

Trajectory Anomaly Detection (TrajAD) is a methodology and model for auditing LLM agent execution. It focuses on identifying and precisely localizing process-level anomalies in agent trajectories. Conventional agent safety mechanisms emphasize input/output filtering, but these approaches neglect the complexities of real-world execution, where failures often materialize as errors at intermediate steps. TrajAD addresses this gap by formalizing the trajectory anomaly detection task, constructing a dedicated benchmark dataset (TrajBench), and introducing a specialized verifier that significantly outperforms generalist LLM baselines in both anomaly detection and error localization (Liu et al., 6 Feb 2026).

1. Formalization of Trajectory Anomaly Detection

An agent's runtime behavior is represented as an ordered sequence of planning, action, and observation tuples: $T = \bigl\{\,I,\,(r_1, a_1, o_1),\,(r_2, a_2, o_2),\,\dots,\,(r_n, a_n, o_n)\bigr\}$, where $I$ is the high-level instruction, $r_t$ (the "thought") is the agent's reasoning at step $t$, $a_t$ is the action executed, and $o_t$ is the observed outcome.

Trajectories are classified as:

Normal: Every step is both logically and operationally valid, and the trajectory ends in task completion or a legitimate refusal.
Anomalous: At least one step violates process rationality. Anomalies are subdivided into:
- Type I: Task Failure ($\mathcal{A}_{\tt fail}$) – reasoning or execution errors that cause the task to abort or produce an incorrect result.
- Type II: Inefficiency ($\mathcal{A}_{\tt ineff}$) – redundant or looping actions that waste steps without changing the outcome.
- Type III: Unwarranted Continuation ($\mathcal{A}_{\tt unw}$) – proceeding when the agent should have refused, or continuing to act after the task is already complete.

Trajectory anomaly detection is the mapping $f: T \rightarrow (c, l)$ with:

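$c \in \{\text{Normal}, \text{Anomalous}\}$: binary anomaly verdict,
$l \in \{1, \dots, n\} \cup \{\varnothing\}$: index of the first erroneous step, or empty ($\varnothing$) if the trajectory is normal.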
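As an illustration of the target of $f$, the sketch below stores a trajectory as plain Python data and derives $(c, l)$ from per-step validity flags. The class and function names, the 0/1 encoding of $c$, and the premise that step validity is already known (for example from an annotation) are hypothetical choices for illustration, not part of TrajAD itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Step:
    thought: str       # r_t: the agent's reasoning at step t
    action: str        # a_t: the action executed
    observation: str   # o_t: the observed outcome


@dataclass
class Trajectory:
    instruction: str                                  # I: high-level instruction
    steps: List[Step] = field(default_factory=list)   # (r_1, a_1, o_1), ..., (r_n, a_n, o_n)


def label_trajectory(step_is_valid: List[bool]) -> Tuple[int, Optional[int]]:
    """Derive (c, l) from per-step validity flags (hypothetical helper).

    c is 1 (anomalous) if any step is invalid, else 0 (normal);
    l is the 1-based index of the first invalid step, or None for a normal trajectory.
    """
    for t, ok in enumerate(step_is_valid, start=1):
        if not ok:
            return 1, t
    return 0, None


traj = Trajectory("Sum the numbers in data.csv", [
    Step("Open the file", "read('data.csv')", "[1, 2, 3]"),
    Step("Add them up", "sum([1, 2, 3])", "6"),
    Step("Done, but check again", "sum([1, 2, 3])", "6"),  # redundant step (Type II)
])
print(label_trajectory([True, True, False]))  # (1, 3)
```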

Supervision is sequence-to-sequence: the model is trained to generate a structured output $Y$ (verdict and localization), tokenized as $y_1, \dots, y_m$, from the serialized trajectory input $X$, minimizing the autoregressive cross-entropy $\mathcal{L}(\theta) = -\sum_{j=1}^{m} \log p_\theta\bigl(y_j \mid X, y_{<j}\bigr)$, where $Y$ typically consists of two fields (class and index), optionally augmented with diagnostic information.
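Concretely, the objective scores only the report tokens: the serialized trajectory is conditioning context, not a prediction target. A minimal sketch, assuming per-token probabilities have already been obtained from the model under teacher forcing (the function name and inputs are hypothetical):

```python
import math
from typing import List


def output_only_nll(token_probs: List[float], num_input_tokens: int) -> float:
    """Autoregressive cross-entropy restricted to the output tokens.

    token_probs[j] is the model's probability of the j-th ground-truth token given
    all preceding tokens (teacher forcing) over the concatenation of X and Y.
    The first num_input_tokens positions belong to the serialized input X and are
    skipped, so only the report tokens y_1..y_m contribute to the summed loss.
    """
    output_probs = token_probs[num_input_tokens:]
    return -sum(math.log(p) for p in output_probs)


# Two input tokens (ignored), then two report tokens, e.g. a class token and an index token.
probs = [0.2, 0.3, 0.9, 0.5]
print(round(output_only_nll(probs, num_input_tokens=2), 4))  # 0.7985
```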

2. Dataset Design: TrajBench

TrajBench is a dataset targeting broad coverage of procedural anomalies, constructed via a perturb-and-complete pipeline.
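Taking the name "perturb-and-complete" at face value, one plausible minimal sketch starts from a valid seed trajectory, corrupts one step with an injected anomaly of a chosen type, and lets an agent continue from the corrupted point, so the first-error label is known by construction. The `perturb` and `complete` callables (presumably LLM calls in practice), the uniform choice of the corrupted position, and the reuse of seeds as normal samples are all assumptions. Post-completion continuation (Type III) would more naturally append steps after completion than replace one, which this simplification does not model.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str, str]  # (thought, action, observation)


@dataclass
class Sample:
    instruction: str
    steps: List[Step]
    anomalous: bool
    first_error: Optional[int]   # 1-based index of the injected step; None for normal samples
    anomaly_type: Optional[str]  # e.g. "I", "II", "III"; None for normal samples


def perturb_and_complete(
    instruction: str,
    seed_steps: List[Step],
    anomaly_type: str,
    perturb: Callable[[str, List[Step], str], Step],
    complete: Callable[[str, List[Step]], List[Step]],
    rng: random.Random,
) -> Sample:
    """Corrupt one step of a valid seed trajectory, then let an agent finish from there."""
    k = rng.randrange(len(seed_steps))          # 0-based position to corrupt (uniform: an assumption)
    prefix = seed_steps[:k]                     # still-valid prefix, kept verbatim
    bad_step = perturb(instruction, prefix, anomaly_type)
    continuation = complete(instruction, prefix + [bad_step])
    return Sample(
        instruction=instruction,
        steps=prefix + [bad_step] + continuation,
        anomalous=True,
        first_error=k + 1,                      # label known by construction
        anomaly_type=anomaly_type,
    )


def as_normal(instruction: str, seed_steps: List[Step]) -> Sample:
    """A validated seed can itself serve as a normal sample (assumption)."""
    return Sample(instruction, list(seed_steps), False, None, None)
```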

Composition and Statistics:

Metric	Value
Total samples	63,484 (balanced: 31,742 normal / 31,742 anomalous)
Anomaly type breakdown	Type I: ~33%, Type II: ~33%, Type III: ~33%
Domain/task coverage	13 tasks across Math, Reasoning, Coding, Web, Embodied AI
Generation success rates	Valid seeds: 91.6%; anomaly synthesis: 92.2%
Human-model agreement (500 sampled trajectories)	Verdict: 96.2%; localization: 94.5%

Key methodological features are strict class balancing and broad task coverage, intended to keep the verifier from learning trivial shortcuts and to improve generalization.
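The text does not specify how the balance is enforced. One simple way, assumed here, is to downsample each anomaly type to a common count and then draw an equal number of normal samples. This enforces exactly equal type counts, which is stricter than the roughly one-third split reported above; the label strings and the function name are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List

ANOMALY_TYPES = ("I", "II", "III")


def balance(samples: List[Dict], seed: int = 0) -> List[Dict]:
    """Downsample so anomaly types are equally frequent and normal == anomalous in total.

    Each sample is a dict whose "label" is "normal" or one of ANOMALY_TYPES.
    """
    rng = random.Random(seed)
    buckets: Dict[str, List[Dict]] = defaultdict(list)
    for s in samples:
        buckets[s["label"]].append(s)
    # Largest per-type count that every anomaly type can supply and that normals can match.
    per_type = min(min(len(buckets[t]) for t in ANOMALY_TYPES),
                   len(buckets["normal"]) // len(ANOMALY_TYPES))
    out: List[Dict] = []
    for t in ANOMALY_TYPES:
        out += rng.sample(buckets[t], per_type)
    out += rng.sample(buckets["normal"], per_type * len(ANOMALY_TYPES))
    rng.shuffle(out)
    return out
```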

3. Model Architecture and Training

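The TrajAD verifier is based on Qwen3-4B, a decoder-only Transformer. Parameters are adapted with Low-Rank Adaptation (LoRA, rank $r$, scaling factor $\alpha$):

Frozen backbone weights $W_0 \in \mathbb{R}^{d \times k}$,
Trainable low-rank adapters $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, with $r \ll \min(d, k)$, for each linear layer; activations become $h = W_0 x + \frac{\alpha}{r} B A x$.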
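The adapted layer can be written out directly. This is a minimal pure-Python sketch of the standard LoRA forward pass with toy dimensions; the values below are not TrajAD's actual rank or scaling factor, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W0: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """h = W0 x + (alpha / r) * B (A x), with W0 frozen and A, B trainable.

    Shapes: W0 is d x k, A is r x k, B is d x r, x has length k.
    """
    r = len(A)                          # adapter rank
    base = matvec(W0, x)                # frozen path
    update = matvec(B, matvec(A, x))    # low-rank path; B @ A is never materialized
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


W0 = [[1.0, 0.0], [0.0, 1.0]]   # d = k = 2
A = [[1.0, 1.0]]                # r = 1
B = [[1.0], [2.0]]
print(lora_forward(W0, A, B, [1.0, 2.0], alpha=2.0))  # [7.0, 14.0]
```

Computing $B(Ax)$ rather than forming $BA$ keeps the extra cost proportional to $r$. With $B$ zero-initialized, as in standard LoRA, the adapted layer initially reproduces the frozen layer exactly.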

Inputs: A serialized pair of system instruction and full trajectory, $X = (S, T)$, where $S$ is the system instruction.

Outputs: Structured report $Y = (c, l)$, where $c$ is the verdict and $l$ the localization index, or empty for a normal trajectory.

Supervision: Combined cross-entropy loss for class and localization, as described in Section 1.
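Because the verifier emits its report as text, a downstream consumer has to parse and validate it. The sketch below assumes a hypothetical JSON format with `verdict` and `first_error_step` fields; the actual TrajAD output format may differ.

```python
import json
from typing import Optional, Tuple


def parse_report(text: str, num_steps: int) -> Tuple[str, Optional[int]]:
    """Parse and validate a verifier report into (verdict, first_error_step).

    Assumes a hypothetical format such as {"verdict": "anomalous", "first_error_step": 3}.
    """
    report = json.loads(text)
    verdict = report.get("verdict")
    if verdict not in ("normal", "anomalous"):
        raise ValueError(f"unknown verdict: {verdict!r}")
    step = report.get("first_error_step")
    if verdict == "normal":
        # A normal trajectory has an empty localization.
        if step is not None:
            raise ValueError("a normal verdict must not carry a step index")
        return verdict, None
    # An anomalous verdict must point at an existing step (1-based).
    if isinstance(step, bool) or not isinstance(step, int) or not 1 <= step <= num_steps:
        raise ValueError(f"invalid first_error_step: {step!r}")
    return verdict, step


print(parse_report('{"verdict": "anomalous", "first_error_step": 3}', num_steps=5))
# ('anomalous', 3)
```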

Training configuration:

Optimizer: Paged AdamW with 8-bit optimizer states
Peak learning rate: $r_t$ 4
Linear warmup for 10% of schedule
Batch size chosen to maximize memory utilization on an NVIDIA A100 (80 GB)
No explicit curriculum or regularization beyond LoRA
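The warmup setting translates into a simple step-dependent schedule. In the sketch below, `peak_lr` is left as a parameter, and the linear decay to zero after warmup is an assumption, since only the 10% linear warmup is specified. The shape matches the schedule computed by Hugging Face Transformers' `get_linear_schedule_with_warmup`.

```python
def linear_warmup_decay(step: int, total_steps: int, peak_lr: float,
                        warmup_frac: float = 0.1) -> float:
    """Learning rate at `step`: linear warmup over the first warmup_frac of training,
    then (assumed) linear decay to 0 at total_steps."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)   # ramps from 0 up to peak_lr
    remaining = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / remaining)


print([round(linear_warmup_decay(s, 1000, peak_lr=1.0), 2) for s in (0, 50, 100, 550, 1000)])
# [0.0, 0.5, 1.0, 0.5, 0.0]
```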

4. Empirical Performance and Comparison

Evaluation emphasizes both detection (binary anomaly classification) and localization. Metrics include:

Detection:

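$P_k = \frac{TP_k}{TP_k + FP_k}$, $R_k = \frac{TP_k}{TP_k + FN_k}$, $F_{1,k} = \frac{2 P_k R_k}{P_k + R_k}$ for each class $k \in \{\text{Normal}, \text{Anomalous}\}$. The reported P, R, and F1 appear to be macro-averaged over the two classes, since F1 falls below both P and R in several rows of the table below, which is impossible for a single-class F1.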

Localization: Joint Exact Match (JEM):

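$\mathrm{JEM} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\bigl[\hat{c}_i = c_i \,\wedge\, \mathrm{sim}(\hat{l}_i, l_i) \ge \tau\bigr]$,

where $(\hat{c}_i, \hat{l}_i)$ is the predicted verdict and localization for the $i$-th of $N$ test trajectories, with string similarity threshold $\tau$.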
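Both metric families are straightforward to compute. A minimal sketch, assuming verdicts encoded as 0 (normal) and 1 (anomalous), localizations represented as strings (None for normal trajectories), and Python's `difflib.SequenceMatcher` ratio as a stand-in for the unspecified string similarity; `tau` is a free parameter here, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def macro_prf(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall, F1 over classes 0 (normal) and 1 (anomalous)."""
    ps, rs, fs = [], [], []
    for k in (0, 1):
        tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        fp = sum(t != k and p == k for t, p in zip(y_true, y_pred))
        fn = sum(t == k and p != k for t, p in zip(y_true, y_pred))
        p_k = tp / (tp + fp) if tp + fp else 0.0
        r_k = tp / (tp + fn) if tp + fn else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if p_k + r_k else 0.0
        ps.append(p_k)
        rs.append(r_k)
        fs.append(f_k)
    return sum(ps) / 2, sum(rs) / 2, sum(fs) / 2


def jem(gold: List[Tuple[int, Optional[str]]],
        pred: List[Tuple[int, Optional[str]]], tau: float) -> float:
    """Fraction of samples whose verdict matches and whose localization matches."""
    hits = 0
    for (c, l), (c_hat, l_hat) in zip(gold, pred):
        if c != c_hat:
            continue
        if l is None or l_hat is None:
            if l is None and l_hat is None:   # both empty: a normal trajectory, correctly left unlocalized
                hits += 1
        elif SequenceMatcher(None, l, l_hat).ratio() >= tau:
            hits += 1
    return hits / len(gold)


print(tuple(round(v, 4) for v in macro_prf([1, 1, 0, 0], [1, 0, 0, 0])))
# (0.8333, 0.75, 0.7333)
```

In the example, macro F1 (0.7333) falls below both macro precision and macro recall, which cannot happen for a single-class F1 but can after averaging across classes.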

Table: In-Distribution Performance

Model	Params	Method	P(%)	R(%)	F1(%)	JEM(%)
Gemma-3-4B-Instruct	4 B	zero-shot	68.64	64.66	64.20	9.07
Phi-3-Mini	4 B	zero-shot	67.78	28.46	30.65	3.28
Qwen3-4B	4 B	zero-shot	79.07	68.97	70.43	5.54
Qwen3-8B	8 B	zero-shot	76.16	69.60	67.90	5.81
TrajAD (LoRA-finetuned)	4 B	LoRA-finetune	82.90	82.49	81.81	53.75

Qualitative analysis indicates that TrajAD can diagnose subtle process faults. For example, in an embodied AI scenario, TrajAD correctly flags redundant actions taken after task completion, whereas generalist LLMs misclassify the trajectory as normal.

5. Analysis, Generalization, and Limitations

Cross-domain transfer is assessed via leave-one-domain-out experiments:

Zero-shot Qwen3-4B yields F1 ≈ 70.89%, JEM ≈ 11.48%.
TrajAD fine-tuned on all domains except the held-out one attains F1 ≈ 83.09%, JEM ≈ 38.25%.
Full supervision (in-domain data included) reaches F1 ≈ 83.84%, JEM ≈ 52.54%.

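This suggests that the core detection capability is robust to domain shift (F1 drops by only about 0.75 points without in-domain data), whereas error localization depends more heavily on in-domain examples (JEM drops by about 14.3 points).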

Ablation studies show:

Increasing the training set size improves performance up to ~50k samples (F1=85.31%, JEM=61.02%); gains saturate beyond that point, with a slight decline at 60k.
Scaling the backbone up to Qwen3-8B (F1=78.97%) does not surpass the LoRA-tuned 4B verifier under full-data supervision.

A plausible implication is that effective process verification depends more on specialized, high-quality supervision than on model size alone.

Limitations:

Error localization is less reliable under severe domain shift, particularly with novel action/observation vocabularies.
Very long trajectories (>100 steps) may exceed the model's context window.
Extensions under consideration include multi-type error classification, integration of external state representations, and RL-style rollback orchestration based on verifier outputs.

6. Significance, Applications, and Future Directions

TrajAD establishes that high-fidelity procedural auditing for LLM agents requires specialized verifiers trained with step-level process supervision. Synthesizing the TrajBench dataset enables generalizable, discriminative learning across a spectrum of anomaly types and domains. Efficient detection and localization enable advanced execution control (rollback and retry), moving beyond static I/O filtering toward trustworthy agentic systems. Open directions for future work include a finer-grained error taxonomy, hybrid text-symbolic process monitoring, and robustness under distribution shift (Liu et al., 6 Feb 2026).
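The rollback-and-retry control mentioned above can be sketched as a small loop: run the agent, audit the trajectory, and on an anomaly resume from the steps preceding the flagged one. `run_agent` and `verify` are hypothetical callables; `verify` stands in for a verifier such as TrajAD that returns a verdict and a 1-based first-error index.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str, str]  # (thought, action, observation)


def run_with_rollback(
    instruction: str,
    run_agent: Callable[[str, List[Step]], List[Step]],
    verify: Callable[[str, List[Step]], Tuple[bool, Optional[int]]],
    max_retries: int = 3,
) -> List[Step]:
    """Run an agent, audit the result, and on an anomaly retry from the last valid prefix.

    run_agent(instruction, prefix) resumes from `prefix` and returns the full step list.
    verify(instruction, steps) returns (is_anomalous, first_error_step), 1-based.
    Returns the first trajectory judged normal, or the last (unverified) attempt
    if retries run out.
    """
    steps = run_agent(instruction, [])
    for _ in range(max_retries):
        anomalous, first_error = verify(instruction, steps)
        if not anomalous:
            return steps
        # Keep only the steps before the flagged one; restart from scratch if no index was given.
        prefix = steps[: first_error - 1] if first_error else []
        steps = run_agent(instruction, prefix)
    return steps
```

Real environments with side effects (files written, web forms submitted) would also need their external state rolled back, which this sketch does not handle.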


References

Liu et al. (6 Feb 2026). TrajAD: Trajectory Anomaly Detection for Trustworthy LLM Agents.
