Signals: Trajectory Sampling and Triage for Agentic Interactions

Published 1 Apr 2026 in cs.AI and cs.CL | (2604.00356v1)

Abstract: Agentic applications based on LLMs increasingly rely on multi-step interaction loops involving planning, action execution, and environment feedback. While such systems are now deployed at scale, improving them post-deployment remains challenging. Agent trajectories are voluminous and non-deterministic, and reviewing each one, whether through human review or auxiliary LLMs, is slow and cost-prohibitive. We propose a lightweight, signal-based framework for triaging agentic interaction trajectories. Our approach computes cheap, broadly applicable signals from live interactions and attaches them as structured attributes for trajectory triage, identifying interactions likely to be informative without affecting online agent behavior. We organize signals into a coarse-grained taxonomy spanning interaction (misalignment, stagnation, disengagement, satisfaction), execution (failure, loop), and environment (exhaustion), designed for computation without model calls. In a controlled annotation study on $\tau$-bench, a widely used benchmark for tool-augmented agent evaluation, we show that signal-based sampling achieves an 82% informativeness rate compared to 74% for heuristic filtering and 54% for random sampling, with a 1.52x efficiency gain per informative trajectory. The advantage is robust across reward strata and task domains, confirming that signals provide genuine per-trajectory informativeness gains rather than merely oversampling obvious failures. These results show that lightweight signals can serve as practical sampling infrastructure for agentic systems, and suggest a path toward preference data construction and post-deployment optimization.


Authors (3)

Summary

The paper introduces a computationally lightweight, signal-based triage framework that efficiently identifies developer-informative trajectories for agentic systems.
It partitions signals into interaction, execution, and environment types using pattern matching and deterministic parsing to capture key behavioral patterns.
Results show an 82% informativeness rate with robust performance across domains, enhancing feedback-driven optimization in deployed agentic systems.

Signal-Based Trajectory Triage for Agentic Interactions

Introduction

The deployment of agentic applications powered by LLMs and tool augmentation has made scalable post-deployment optimization a necessity. These systems generate vast, heterogeneous interaction trajectories in dynamic environments, so conventional improvement methods (manual inspection or exhaustive quality annotation) do not scale. This paper introduces a signal-based triage framework that cheaply identifies developer-informative trajectories, bridging the gap between voluminous behavioral data and the preference-learning pipelines used to align agent behavior.

Signal Taxonomy and Detection Architecture

The core contribution is a coarse-grained, computationally lightweight taxonomy of signals for triaging agentic trajectories without model calls. The taxonomy is organized along two axes: data modality (user-agent discourse vs. execution events) and utility (preference learning vs. system diagnosis).

Interaction Signals are derived from user–assistant exchanges and capture discourse-level dynamics—misalignment (semantic or intent divergence events), stagnation (non-advancing dialogues), disengagement (withdrawal or explicit negative user stance), and satisfaction (explicit completion markers). These are detected via normalization and typo-tolerant pattern matching, relying on interpretable phrase-level lexica and turn-level similarity.
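The normalization and typo-tolerant matching described above can be sketched as follows. This is a minimal illustration and not the paper's detector: the lexicon entries, the `0.85` threshold, and the function names are all assumptions, and the standard-library `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the authors use.

```python
import re
from difflib import SequenceMatcher

# Hypothetical phrase lexicon; the paper's actual phrase lists are not given here.
LEXICON = {
    "misalignment": ["that is not what i asked", "you misunderstood"],
    "disengagement": ["never mind", "forget it"],
    "satisfaction": ["thank you", "that works"],
}


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, and split into tokens."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).split()


def fuzzy_contains(tokens: list[str], phrase: str, threshold: float = 0.85) -> bool:
    """True if some token window is close to the phrase (tolerates small typos)."""
    target = " ".join(phrase.split())
    n = len(phrase.split())
    for i in range(len(tokens) - n + 1):
        window = " ".join(tokens[i:i + n])
        if SequenceMatcher(None, window, target).ratio() >= threshold:
            return True
    return False


def interaction_signals(user_turn: str) -> set[str]:
    """Return the set of signal labels whose lexicon matches the user turn."""
    tokens = normalize(user_turn)
    return {
        label
        for label, phrases in LEXICON.items()
        if any(fuzzy_contains(tokens, phrase) for phrase in phrases)
    }
```

For example, `interaction_signals("Nevr mind, forget it.")` flags disengagement despite the typo. Stagnation, which the text ties to turn-level similarity, would need a comparison across consecutive turns and is not covered by this sketch.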

Execution Signals are extracted from the agent’s action and tool invocation logs, recording failure events (no-op actions, inappropriate tool use) and loop patterns (oscillatory, repetitive, or parameter-drifting execution flows). These use deterministic parsing of invocation outcome structures and sequence-based heuristics.
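A rough sketch of sequence-based heuristics over a tool-call log might look like the following. The `ToolCall` record, the repeat and cycle thresholds, and the function names are hypothetical; the paper's log schema and exact rules are not specified in this summary, and parameter-drifting loops are left out.

```python
import json
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical log record for one tool invocation."""
    name: str
    args: dict
    ok: bool  # whether the tool reported success


def _key(call: ToolCall) -> tuple[str, str]:
    # Canonical form so that argument order does not affect equality.
    return (call.name, json.dumps(call.args, sort_keys=True))


def has_failure(calls: list[ToolCall]) -> bool:
    """Failure signal: at least one invocation reported an unsuccessful outcome."""
    return any(not c.ok for c in calls)


def has_repetition(calls: list[ToolCall], min_repeats: int = 3) -> bool:
    """Loop signal: the identical call issued min_repeats times in a row."""
    run = 1
    for prev, cur in zip(calls, calls[1:]):
        run = run + 1 if _key(prev) == _key(cur) else 1
        if run >= min_repeats:
            return True
    return False


def has_oscillation(calls: list[ToolCall], min_cycles: int = 2) -> bool:
    """Loop signal: two distinct calls alternating (A, B, A, B, ...)."""
    keys = [_key(c) for c in calls]
    need = 2 * min_cycles
    for i in range(len(keys) - need + 1):
        w = keys[i:i + need]
        if w[0] != w[1] and all(w[j] == w[j % 2] for j in range(need)):
            return True
    return False
```

All three checks are single passes over the log with no model calls, which is the property the taxonomy is designed around.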

Environment Signals encode infrastructure-level or exogenous failures (e.g., service outages, quota exhaustion). They are distinguished from execution signals both by input modality (system feedback) and by their exclusion from direct preference learning, since they do not originate from the agent.

This separation supports downstream composition of signal-based triage that targets learning-relevant or diagnostic utility based on application requirements.
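One way to picture the taxonomy as structured attributes attached to a trajectory is sketched below. The signal and category names come from the abstract; the `TrajectoryAttributes` class and its method names are illustrative assumptions, not the paper's schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    INTERACTION = "interaction"
    EXECUTION = "execution"
    ENVIRONMENT = "environment"


# Signal names and their categories as listed in the abstract.
TAXONOMY = {
    "misalignment": Category.INTERACTION,
    "stagnation": Category.INTERACTION,
    "disengagement": Category.INTERACTION,
    "satisfaction": Category.INTERACTION,
    "failure": Category.EXECUTION,
    "loop": Category.EXECUTION,
    "exhaustion": Category.ENVIRONMENT,
}


@dataclass
class TrajectoryAttributes:
    """Hypothetical container for the signals attached to one trajectory."""
    trajectory_id: str
    signals: set = field(default_factory=set)

    def categories(self) -> set:
        return {TAXONOMY[s] for s in self.signals}

    def learning_relevant(self) -> bool:
        # At least one interaction or execution signal fired.
        return bool(self.categories() - {Category.ENVIRONMENT})

    def environment_only(self) -> bool:
        # Only infrastructure-level signals fired; useful for diagnosis only.
        return self.categories() == {Category.ENVIRONMENT}
```

Keeping the category on each signal lets a downstream consumer choose between learning-oriented and diagnostic views without re-running any detector.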

Sampling Framework and Annotation Study

Sampling strategies are compared on $\tau$-bench, which provides a diverse set of tool-augmented, simulated user-agent dialogues with labeled task success/failure. Three paradigms are contrasted: random sampling (unbiased), heuristic sampling (a length-based filter that keeps conversations with more than ten user messages), and the proposed signal-filtered sampling (selecting trajectories that activate at least one interaction or execution signal, excluding those with environment-only signals).
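The three strategies can be expressed as filters followed by a uniform draw, as in this sketch. The `Trajectory` fields and function names are assumptions made for illustration; only the selection criteria (more than ten user messages, at least one interaction or execution signal) come from the text.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """Hypothetical trajectory record with precomputed triage attributes."""
    tid: str
    n_user_messages: int
    signal_categories: set = field(default_factory=set)  # e.g. {"interaction"}


def _draw(eligible: list, k: int, rng: random.Random) -> list:
    return rng.sample(eligible, min(k, len(eligible)))


def random_sample(pool: list, k: int, rng: random.Random) -> list:
    """Baseline: uniform draw from the whole pool."""
    return _draw(list(pool), k, rng)


def heuristic_sample(pool: list, k: int, rng: random.Random) -> list:
    """Length heuristic: keep conversations with more than ten user messages."""
    return _draw([t for t in pool if t.n_user_messages > 10], k, rng)


def signal_sample(pool: list, k: int, rng: random.Random) -> list:
    """Signal filter: at least one interaction or execution signal fired."""
    wanted = {"interaction", "execution"}
    return _draw([t for t in pool if t.signal_categories & wanted], k, rng)
```

Passing an explicit `random.Random(seed)` keeps the draws reproducible, which matters when the same annotation budget is compared across strategies.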

Primary evaluation is informativeness rate—the fraction of selected trajectories providing actionable improvement evidence by developer consensus. Three expert annotators, blinded to sampling strategy, label trajectories for informativeness and failure/success category.
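As a small illustration of the metric, the sketch below computes an informativeness rate from per-annotator labels. How the study resolves disagreement is not stated in this summary, so the strict-majority rule used here is an assumption, as is the function name.

```python
def informativeness_rate(votes_per_trajectory: list[list[bool]]) -> float:
    """Fraction of trajectories judged informative.

    Each inner list holds one boolean per annotator. A trajectory counts as
    informative when a strict majority of annotators say so (an assumed
    consensus rule, not necessarily the one used in the paper).
    """
    if not votes_per_trajectory:
        raise ValueError("need at least one labeled trajectory")
    informative = sum(
        1 for votes in votes_per_trajectory if 2 * sum(votes) > len(votes)
    )
    return informative / len(votes_per_trajectory)
```

With three annotators, a trajectory needs at least two positive votes to count under this rule.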

Results

Signal-based sampling yields an 82% informativeness rate (95% CI [.73, .89]), outperforming random (54%) and heuristic (74%) baselines on an equal annotation budget and achieving a 1.52× efficiency improvement over random sampling. The improvement holds under reward-stratified analysis: signal sampling identifies actionable insights in 66.7% of the successful trajectories it selects, compared to 50% (heuristic) and 41.3% (random). Among the failed trajectories it selects, 96.2% are informative, so nearly every sampled failure is developer-relevant.
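The 1.52× figure is consistent with a simple ratio of informativeness rates, as the arithmetic below shows. Treating efficiency as that ratio is an assumption about the paper's definition; the rates themselves are the ones reported above, and the function names are illustrative.

```python
# Reported informativeness rates for the three sampling strategies.
RATES = {"signal": 0.82, "heuristic": 0.74, "random": 0.54}


def annotations_per_informative(rate: float) -> float:
    """Expected number of trajectories reviewed to find one informative one."""
    return 1.0 / rate


def efficiency_gain(rate: float, baseline: float) -> float:
    """How many times fewer reviews are needed relative to the baseline."""
    return annotations_per_informative(baseline) / annotations_per_informative(rate)


gain_vs_random = efficiency_gain(RATES["signal"], RATES["random"])
```

Under this reading, an annotator reviews about 1.85 random trajectories per informative one, versus about 1.22 with signal-based sampling.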

Importantly, the compositional analysis shows that signal-based triage does not merely oversample failures; it also surfaces subtle or policy-violating issues in nominally successful trajectories. After reweighting to the base reward distribution, signal sampling's gain persists (77.6% standardized rate vs. heuristic's 62.7%). The distribution of annotated failure/success categories remains stable across sampling methods, suggesting that signal sampling does not skew the types of issues surfaced.
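Reweighting to a base distribution is commonly done by direct standardization, where each stratum's rate is weighted by that stratum's share in the reference population. The sketch below assumes that approach. The stratum rates are the reported ones for signal sampling, but the base weights are placeholders, since the benchmark's success/failure split is not given in this summary.

```python
def standardized_rate(stratum_rates: dict, base_weights: dict) -> float:
    """Weighted average of per-stratum rates using reference-population weights."""
    if set(stratum_rates) != set(base_weights):
        raise ValueError("rates and weights must cover the same strata")
    total = sum(base_weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(stratum_rates[s] * base_weights[s] / total for s in stratum_rates)


# Reported per-stratum informativeness for signal-based sampling.
signal_rates = {"success": 0.667, "failure": 0.962}

# Placeholder base distribution (hypothetical, for illustration only).
hypothetical_base = {"success": 0.6, "failure": 0.4}

example = standardized_rate(signal_rates, hypothetical_base)
```

Because every method is evaluated against the same weights, a gap that survives standardization cannot be explained by one method simply selecting more failed trajectories.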

Domain robustness holds across the $\tau$-bench domains (airline, retail), with signal-based sampling outperforming the baselines, particularly in the more challenging, heterogeneous retail setting.

Theoretical and Practical Implications

This framework formalizes a scalable, model-free mechanism for bootstrapping supervised preference data from raw interaction logs, which is essential for feedback-driven optimization (RLHF, DPO) in deployed agentic systems. By detaching detection from resource-intensive model inference, the method offers always-on triage capabilities for real-world, high-throughput deployments, circumventing the cost barriers associated with LLM-as-a-judge or model-based reward annotation [ouyang2022training, rafailov2023direct, zheng2023judging].

The framework’s focus on observable behavioral patterns (rather than surface-level conversation length or semantic correctness) provides a robust operationalization of informativeness for downstream RL pipelines, supporting both failure-driven correction and exemplar-driven policy enhancement. The results indicate strong potential for integrating lightweight signal detectors with counterfactual preference construction, serving as the first stage in closed-loop agent optimization at scale.

Limitations and Future Directions

Some limitations remain. Experiments use simulated users in $\tau$-bench, so real-world phenomena such as organic disengagement and user frustration may be underrepresented. The taxonomy is intentionally behavioral: trajectories that are fluent but factually incorrect fall outside its detection scope. Hybrid rule-based/model-based detectors may improve coverage, especially for nuanced patterns. Extension to richer domains, real traffic, and integration into end-to-end RLHF pipelines are natural next steps.

Conclusion

This work articulates and validates a model-free, signal-based triage framework for identifying developer-informative trajectories in deployed agentic interaction logs (2604.00356). It demonstrates efficiency and coverage gains over standard baselines that hold across outcome strata and domains, and establishes a practical precursor stage for preference data curation and post-deployment agent optimization. The approach is computationally frugal, extensible, and directly compatible with the reinforcement- and preference-based learning paradigms central to iterative agent improvement.
