Generalization of Signal-Based Sampling to Real Users and Broader Domains

Determine whether the advantages that the lightweight, deterministic signal-based framework for trajectory triage and sampling shows over random and heuristic sampling on τ-bench generalize to a broader range of application domains, and to interactions with real user populations rather than LLM-simulated users.

Background

The paper proposes a lightweight, signal-based framework that computes deterministic interaction and execution signals from agent trajectories, without any model calls, and uses them to triage and prioritize trajectories for human review.
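To make the idea concrete, the following is a minimal sketch of deterministic, model-free triage. It is not the paper's implementation: the `Step` and `Trajectory` types, the two example signals (repeated user messages as an interaction signal, tool errors as an execution signal), and the unweighted sum used as a score are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    role: str                 # "user", "agent", or "tool" (assumed schema)
    text: str = ""
    tool_error: bool = False  # assumed flag set when a tool call failed


@dataclass
class Trajectory:
    traj_id: str
    steps: list = field(default_factory=list)


def interaction_signal(traj: Trajectory) -> int:
    """Hypothetical interaction signal: user turns that repeat an earlier user turn."""
    seen = set()
    repeats = 0
    for step in traj.steps:
        if step.role != "user":
            continue
        key = step.text.strip().lower()
        if key in seen:
            repeats += 1
        seen.add(key)
    return repeats


def execution_signal(traj: Trajectory) -> int:
    """Hypothetical execution signal: number of failed tool calls."""
    return sum(1 for step in traj.steps if step.role == "tool" and step.tool_error)


def triage_score(traj: Trajectory) -> int:
    """Illustrative score: unweighted sum of the two signals (an assumption)."""
    return interaction_signal(traj) + execution_signal(traj)


def prioritize(trajs: list, budget: int) -> list:
    """Return the `budget` highest-scoring trajectories; ties break on traj_id
    so the ordering is fully deterministic."""
    ranked = sorted(trajs, key=lambda t: (-triage_score(t), t.traj_id))
    return ranked[:budget]


# Toy data, invented for this example only.
traj_a = Trajectory("a", [
    Step("user", "cancel my flight"),
    Step("agent", "Sure, one moment."),
    Step("tool", tool_error=True),
    Step("user", "Cancel my flight"),
])
traj_b = Trajectory("b", [Step("user", "hi"), Step("agent", "hello")])
traj_c = Trajectory("c", [Step("user", "refund?"), Step("tool", tool_error=True)])

selected = prioritize([traj_b, traj_c, traj_a], budget=2)
print([t.traj_id for t in selected])
```

Because every step is a pure function of the logged trajectory, the same input always yields the same ranking, and no model call is needed to produce it.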

Empirical results on τ-bench (airline and retail domains with LLM-simulated users) show higher developer-informativeness rates and annotation efficiency than random and heuristic baselines. The authors note, however, that it remains unresolved whether these advantages carry over to other domains and to real user populations.

References

While these domains exercise all signal categories in the taxonomy, whether the observed advantages generalize to a broader range of domains and to real user populations remains an open question.

Signals: Trajectory Sampling and Triage for Agentic Interactions (2604.00356 - Chen et al., 1 Apr 2026) in Limitations (Section 5)