- The paper introduces a novel tree-structured reasoning framework that enriches information-seeking tasks with entity-intensive synthetic data.
- It develops three task synthesis variants and uses dual metrics (ISR and ISE) to ensure both effective and efficient agent training.
- A hybrid reward system optimized with GRPO, validated by extensive ablation studies, delivers superior performance on multiple challenging benchmarks.
WebLeaper: Enhancing WebAgent Efficiency and Efficacy via Info-Rich Seeking
Introduction
WebLeaper addresses a critical bottleneck in LLM-based information-seeking (IS) agents: the inefficiency of search behaviors, which is attributed to the sparsity of target entities in conventional training tasks. The framework introduces a novel approach to IS task synthesis and trajectory curation, aiming to simultaneously optimize both the completeness and efficiency of information gathering. By formulating IS as a tree-structured reasoning problem and leveraging entity-intensive synthetic data, WebLeaper systematically increases the density and complexity of IS tasks, thereby providing a more robust training signal for agentic reasoning and decision-making.
Tree-Structured Reasoning and Task Synthesis
WebLeaper models IS as a tree-structured reasoning process, enabling the inclusion of a substantially larger set of target entities within a constrained context. The framework introduces three task synthesis variants:
- Version-I (Basic): Constructs a simple reasoning tree from a single structured information source (e.g., a Wikipedia table). The root node represents the question entity, the second layer contains key entities (e.g., authors), and the third layer encodes their attributes (e.g., nationality, award year).
- Version-II (Union): Increases task complexity by uniting reasoning trees from multiple sources that share common relations in their subtrees. This requires the agent to perform relational joins (e.g., intersection of Nobel and Booker Prize winners) and integrate evidence across heterogeneous sources.
- Version-III (Reverse-Union): Further elevates reasoning difficulty by reversing the logical flow. The agent receives obfuscated clues (third-layer attribute entities) as the query and must first deduce the anchor entity (second layer) before executing a union-based search for the final answer set.
Figure 1: Overview of WebLeaper's three task synthesis variants, illustrating the progression from single-source reasoning to complex, multi-source, and reverse-logic tasks.
This hierarchical and compositional task design not only increases the number of target entities per instance but also enforces multi-hop, multi-source reasoning, mitigating shortcut exploitation and promoting generalizable search strategies.
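To make the construction concrete, the following minimal Python sketch mirrors the Version-I and Version-II ideas above. It is illustrative only: `Node`, `build_basic_tree`, and `union_targets` are assumed names rather than the paper's API, and the paper derives its trees from real Wikipedia tables rather than inline dictionaries.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One entity in the reasoning tree (question, key entity, or attribute)."""
    label: str
    children: list["Node"] = field(default_factory=list)

def build_basic_tree(question_entity: str, rows: dict[str, dict[str, str]]) -> Node:
    """Version-I (Basic): root = question entity, second layer = key entities
    (table rows), third layer = their attributes (table columns)."""
    root = Node(question_entity)
    for entity, attributes in rows.items():
        root.children.append(
            Node(entity, [Node(f"{k}: {v}") for k, v in attributes.items()]))
    return root

def union_targets(tree_a: Node, tree_b: Node) -> set[str]:
    """Version-II (Union): relational join over second-layer entities shared
    by two trees (e.g., winners of both the Nobel and the Booker Prize)."""
    return {n.label for n in tree_a.children} & {n.label for n in tree_b.children}

# Hypothetical example built from two award tables.
nobel = build_basic_tree("Nobel Prize in Literature",
                         {"Kazuo Ishiguro": {"nationality": "British", "year": "2017"}})
booker = build_basic_tree("Booker Prize",
                          {"Kazuo Ishiguro": {"work": "The Remains of the Day"}})
print(union_targets(nobel, booker))  # {'Kazuo Ishiguro'}
```

Version-III would invert this flow: the query exposes only third-layer attribute strings, and the agent must search for the second-layer entity that matches them before performing the union.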
Information-Guided Trajectory Curation
WebLeaper introduces a dual-metric filtering mechanism for trajectory curation:
- Information-Seeking Rate (ISR): Measures the fraction of required entities successfully retrieved, providing a direct signal for completeness.
- Information-Seeking Efficiency (ISE): Quantifies the number of target entities discovered per action step, incentivizing concise and targeted search behaviors.
Only trajectories that satisfy both high ISR and ISE thresholds are retained for training, ensuring that the agent is exposed to solution paths that are both accurate and efficient. Theoretical analysis demonstrates that the variance of ISE decreases as the number of target entities increases, making it a stable and reliable metric for complex, entity-rich tasks.
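The sketch below shows how such a dual-metric filter might operate. The exact formulas and thresholds are assumptions based on the definitions above, not values taken from the paper:

```python
def isr(retrieved: set[str], targets: set[str]) -> float:
    """Information-Seeking Rate: fraction of required target entities
    that the trajectory actually retrieved (completeness)."""
    return len(retrieved & targets) / max(len(targets), 1)

def ise(retrieved: set[str], targets: set[str], num_steps: int) -> float:
    """Information-Seeking Efficiency: correctly retrieved entities per
    action step, so fewer wasted tool calls means a higher score."""
    return len(retrieved & targets) / max(num_steps, 1)

def keep_trajectory(retrieved: set[str], targets: set[str], num_steps: int,
                    isr_min: float = 0.8, ise_min: float = 0.5) -> bool:
    """Dual-metric filter: retain a trajectory only if it is both complete
    (high ISR) and efficient (high ISE). Thresholds are illustrative."""
    return (isr(retrieved, targets) >= isr_min
            and ise(retrieved, targets, num_steps) >= ise_min)
```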
Hybrid Reward System and Reinforcement Learning
To address the reward sparsity and evaluation brittleness in RL for entity-intensive tasks, WebLeaper implements a hybrid reward system:
- For synthetic, entity-rich tasks, a granular F-score-based reward is computed using soft precision and recall, with semantic similarity scoring at the entity level to accommodate minor variations (e.g., "USA" vs. "United States"); a minimal sketch follows this list.
- For legacy benchmark data, the original reward functions are preserved.
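The soft F-score reward can be sketched as follows. The `soft_match` similarity is a dependency-free stand-in (the paper's entity-level scoring is semantic, so that "USA" and "United States" match), and the function names are illustrative:

```python
from difflib import SequenceMatcher

def soft_match(pred: str, gold: str) -> float:
    """Stand-in similarity. The paper scores entities semantically (so that
    'USA' ~ 'United States'); a plain string ratio is used here only to
    keep the sketch dependency-free."""
    return SequenceMatcher(None, pred.lower(), gold.lower()).ratio()

def soft_f_score(preds: list[str], golds: list[str]) -> float:
    """Granular reward: soft precision/recall over entity sets, crediting
    each prediction by its best match against the gold entities."""
    if not preds or not golds:
        return 0.0
    precision = sum(max(soft_match(p, g) for g in golds) for p in preds) / len(preds)
    recall = sum(max(soft_match(p, g) for p in preds) for g in golds) / len(golds)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```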
This hybridization enables fine-grained, scalable, and robust policy optimization. The agent is trained using Group Relative Policy Optimization (GRPO), which standardizes rewards within trajectory groups, further stabilizing learning.
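The group-relative standardization at the core of GRPO can be sketched as follows; a real implementation would combine these advantages with the clipped policy-gradient objective and KL regularization, which are omitted here:

```python
import statistics

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """Standardize rewards within a group of rollouts sampled for the same
    task, so each trajectory's advantage is relative to its group."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1e-8  # guard against identical rewards
    return [(r - mean) / std for r in group_rewards]

# e.g., four rollouts of one synthetic task, scored by a soft F-score reward
print(grpo_advantages([0.9, 0.4, 0.7, 0.4]))
```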
Figure 2: Training curve of the hybrid reward system, demonstrating stable reward improvement during RL with WebLeaper data.
Empirical Results and Analysis
WebLeaper is evaluated on five challenging IS benchmarks: BrowseComp, GAIA, xbench-DeepSearch, WideSearch, and Seal-0. The framework consistently outperforms strong open-source baselines and achieves performance competitive with proprietary agents on most tasks.
Figure 3: WebLeaper's results across comprehensive training settings, showing superior accuracy and success rates on multiple benchmarks.
Ablation and efficiency-effectiveness trade-off analyses demonstrate that WebLeaper achieves higher task performance with fewer tool calls than baselines, indicating a more targeted and cost-effective search strategy.
Figure 5: WebLeaper vs. WebSailor-V2: joint effectiveness and efficiency comparison, with WebLeaper dominating in both dimensions.
Data Distribution and Task Complexity
The training data synthesized by WebLeaper exhibits a high degree of entity richness, with a significant portion of samples containing over 100 entities. This complexity is essential for robustly measuring and optimizing search efficiency.
Figure 6: Entity count distribution in training data, emphasizing the prevalence of high-entity tasks and their role in stabilizing efficiency metrics.
Implications and Future Directions
WebLeaper's approach demonstrates that increasing the semantic and structural richness of IS tasks, coupled with rigorous trajectory curation and hybrid reward optimization, can substantially enhance both the efficiency and efficacy of web-based LLM agents. The framework's design principles—entity-intensive task synthesis, multi-source reasoning, and efficiency-oriented supervision—are broadly applicable to other domains requiring complex, long-horizon information gathering.
Practically, WebLeaper's methodology can be extended to domains such as scientific literature review, legal research, and enterprise knowledge management, where comprehensive and efficient information acquisition is critical. Theoretically, the work suggests that future advances in agentic reasoning will require not only architectural innovations but also the systematic enrichment of training environments and reward structures.
Conclusion
WebLeaper provides a comprehensive solution to the inefficiency of LLM-based IS agents by introducing entity-rich, tree-structured task synthesis, information-guided trajectory curation, and a hybrid reward system for RL. The framework achieves strong empirical results, with notable improvements in both effectiveness and efficiency across multiple benchmarks. The work establishes a new paradigm for training and evaluating web agents, emphasizing the importance of info-rich seeking and efficient reasoning in the development of next-generation autonomous systems.