Reproducible Protocols for Agent Traces and Leakage-Robust Evaluation
Establish reproducible protocols for collecting complete agent interaction traces (including prompts, tool calls, arguments, outputs, and outcomes), filtering them, and performing leakage-robust evaluation to enable comparable training and assessment across tool-using AI agents.
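As a rough illustration of what such a protocol might pin down, the sketch below defines a trace record with the fields listed above (prompt, tool calls with arguments and outputs, and outcome), a simple filter, and a deterministic train/eval split. Everything in it is an assumption made for illustration: the names `Trace`, `ToolCall`, `keep_trace`, `split_of`, and `partition` are hypothetical, the filter criterion is arbitrary, and "leakage-robust" is interpreted narrowly as never letting traces from the same task appear in both splits. The source does not prescribe any of these choices.

```python
import hashlib
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolCall:
    name: str                  # tool identifier
    arguments: dict[str, Any]  # arguments passed to the tool
    output: str                # raw tool output


@dataclass(frozen=True)
class Trace:
    task_id: str                      # identifies the underlying task (assumed available)
    prompt: str                       # prompt given to the agent
    tool_calls: tuple[ToolCall, ...]  # ordered tool interactions
    outcome: str                      # assumed labels: "success" or "failure"


def keep_trace(trace: Trace) -> bool:
    """Hypothetical filter: require a known outcome and at least one tool call."""
    return trace.outcome in {"success", "failure"} and len(trace.tool_calls) > 0


def split_of(task_id: str, eval_fraction: float = 0.2, salt: str = "v1") -> str:
    """Assign a task to 'train' or 'eval' from a hash of its ID.

    Hashing the task ID (not the individual trace) keeps all traces of one
    task in the same split, and the result is reproducible across runs.
    """
    digest = hashlib.sha256(f"{salt}:{task_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "eval" if bucket < eval_fraction else "train"


def partition(traces: list[Trace]) -> dict[str, list[Trace]]:
    """Filter traces, then group them into splits by task ID."""
    splits: dict[str, list[Trace]] = {"train": [], "eval": []}
    for trace in traces:
        if keep_trace(trace):
            splits[split_of(trace.task_id)].append(trace)
    return splits
```

Splitting on a hash of the task ID rather than shuffling individual traces is one simple way to make the split both reproducible and free of task-level overlap. It does not address other leakage sources, such as near-duplicate tasks or evaluation data present in a model's pretraining corpus.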
References
Establishing reproducible protocols for trace collection, filtering, and leakage-robust evaluation remains an open research problem.
— AI Agent Systems: Architectures, Applications, and Evaluation
(2601.01743 - Xu, 5 Jan 2026) in Section 7.2 (Long-Term Memory, Context Management, and Continual Improvement)