Designing a minimal yet complete browser toolkit and interaction paradigm for IS agents

Develop a simple yet functionally complete browser toolkit, together with an effective interaction paradigm, that enables text-only, ReAct-style large language model (LLM) information-seeking (IS) agents to use browser tools efficiently when acquiring web-based evidence under practical context constraints.

Background

The paper observes that most information-seeking agents rely on search and static URL fetching, which cannot reach dynamic web content that requires real browser interaction. It also highlights two obstacles: there is no widely adopted standard for modeling browser actions, and verbose page content can exceed typical context limits, which hinders efficient ReAct-style reasoning.

To address these challenges, the authors propose NestBrowse, which introduces a minimal four-action browser toolkit (search, visit, click, fill) and a nested interaction framework separating outer-loop tool-integrated reasoning from inner-loop goal-driven intra-page exploration. The open question concerns the broader design principles for a toolkit and interaction paradigm that remain simple yet complete while enabling efficient tool use by IS agents.
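The nested structure described above can be illustrated with a rough sketch. It is not the authors' implementation. Only the four action names (search, visit, click, fill) come from the text. The names ToolCall, split_page, keyword_relevant, inner_loop, and outer_step are hypothetical, the browser is assumed to be a plain mapping from action name to a function returning page text, and a keyword match stands in for the model-driven relevance judgment. Routing every action through the inner loop is also a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The four action names come from the text; everything else is hypothetical.
ACTIONS = ("search", "visit", "click", "fill")


@dataclass
class ToolCall:
    action: str    # one of ACTIONS
    argument: str  # query, URL, element id, or form text
    goal: str      # what the agent hopes to learn from the resulting page


def split_page(text: str, chunk_chars: int) -> List[str]:
    """Cut raw page text into fixed-size chunks."""
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


def keyword_relevant(chunk: str, goal: str) -> bool:
    """Toy relevance check standing in for a model call (assumption)."""
    lowered = chunk.lower()
    return any(word.lower() in lowered for word in goal.split())


def inner_loop(
    page_text: str,
    goal: str,
    chunk_chars: int = 80,
    is_relevant: Callable[[str, str], bool] = keyword_relevant,
) -> str:
    """Inner loop: scan the page chunk by chunk, keep goal-relevant chunks."""
    kept = [c for c in split_page(page_text, chunk_chars) if is_relevant(c, goal)]
    return "\n".join(kept)


def outer_step(call: ToolCall, browser: Dict[str, Callable[[str], str]]) -> str:
    """Outer loop step: run one action, return only the distilled evidence."""
    if call.action not in ACTIONS:
        raise ValueError(f"unknown action: {call.action}")
    raw_page = browser[call.action](call.argument)
    # The raw page stays inside the inner loop and never enters the
    # outer reasoning context; only the filtered evidence is returned.
    return inner_loop(raw_page, call.goal)
```

In this sketch the outer ReAct loop would append only the return value of outer_step to its context, so the cost of a verbose page is paid inside inner_loop and not by the reasoning trace.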

References

Consequently, how to design a simple yet complete browser toolkit, together with an effective interaction paradigm that enables IS agents to use such tools efficiently, remains an open challenge.

Nested Browser-Use Learning for Agentic Information Seeking (2512.23647 - Li et al., 29 Dec 2025) in Section 1, Introduction