Consistency of Tool Gains for Web Agents

Determine whether programmatic tools for web agents, including API-based interfaces and LLM-synthesized procedural functions, provide consistent performance gains across diverse tool sources, backbone models, tool-use frameworks, and evaluation benchmarks in realistic web environments.

Background

The source paper (Lou et al., 2026) studies tool use in web agents, where a tool is either a human-developed API or an LLM-synthesized function that abstracts a sequence of low-level browser actions. Prior work has typically evaluated a single tool source and a narrow set of backbone models, and findings differ from one study to the next.

Because prior studies are limited in scale and use differing experimental setups, existing results do not establish whether tool use consistently improves web-agent performance. This motivates a systematic study of when tools yield gains across different agents, frameworks, and benchmarks.

References

As a result, several fundamental questions remain unclear: i) whether tools provide consistent gains for web agents, ii) what practical design principles characterize effective tools, and iii) what side effects tool use may introduce.

The Tool Illusion: Rethinking Tool Use in Web Agents (2604.03465 - Lou et al., 3 Apr 2026), in Abstract