WebDART: Dynamic Decomposition and Re-planning for Complex Web Tasks (2510.06587v1)

Published 8 Oct 2025 in cs.AI

Abstract: LLM agents are becoming competent at straightforward web tasks, such as opening an item page or submitting a form, but still struggle with objectives that require long horizon navigation, large scale information extraction, and reasoning under constraints. We present WebDART, a general framework that enables a single LLM to handle such complex chores. WebDART (i) dynamically decomposes each objective into three focused subtasks: navigation, information extraction, and execution, so the model concentrates on one skill at a time, and (ii) continuously replans the decomposition as new webpages are revealed, taking advantage of newly discovered filters or shortcuts and avoiding redundant exploration. Evaluated on WebChoreArena, WebDART lifts success rates by up to 13.7 percentage points over previous SOTA agents, while matching their performance on the easier WebArena suite and completing tasks with up to 14.7 fewer navigation steps.

Summary

The paper introduces dynamic decomposition and re-planning to tackle complex web tasks, improving navigation and multi-step reasoning.
It details a modular framework that splits tasks into navigation, information extraction, and execution for better accuracy.
Experimental results show notable success-rate improvements over baselines such as BrowserGym and AgentOccam on the WebChoreArena benchmark.


WebDART is a framework for complex web tasks built on two ideas: dynamic task decomposition and continuous re-planning. LLM-powered web agents handle straightforward tasks competently but falter on long-horizon navigation, large-scale data extraction, and constraint-based reasoning. WebDART lets a single LLM tackle such chores by splitting each objective into three subtasks (navigation, information extraction, and execution), so the model concentrates on one skill at a time, and by revising this decomposition as new pages and interface elements are revealed.

Introduction

Existing LLM-based web agents perform well on simple tasks but struggle when a task demands non-trivial reasoning and multi-step navigation, as results on the WebChoreArena benchmark show (Figure 1).

Figure 1: Existing LLM-based web agents perform well on simple tasks, but their success rates drop on complex tasks that require non-trivial reasoning, such as applying a price-range filter.

Complex tasks overload agents because they require navigation, information extraction, and reasoning under constraints all at once. For instance, to identify the top-reviewed products within a price range, an agent must traverse nested pages and filter what it finds, and juggling all of this in a single pass is error-prone.

Humans, by contrast, break such objectives into manageable steps: narrowing down the relevant pages, extracting the data, and then applying logic to analyze it. Inspired by this, WebDART combines decomposition with re-planning so the agent adapts its strategy as it goes, improving success rates and reducing navigation errors.

Methodology

WebDART decomposes complex web tasks into three modular subtasks, adjusting plans as new webpage elements are discovered (Figure 2).

Figure 2: Overview of the WebDART framework. A complex web task is dynamically decomposed into three sequential subtasks.

Task Decomposition

Task decomposition splits each objective into three subtasks: navigation, information extraction, and execution. This modular structure lets the agent concentrate on one subtask at a time while updating the plan as new observations arrive. By default, WebDART decomposes conservatively: constraint handling (for example, a price range) is deferred to the execution stage rather than enforced during navigation, which keeps navigation simple, and dynamic re-planning later incorporates whatever shortcuts the live pages reveal.
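To make the conservative scheme concrete, here is a minimal Python sketch of where each piece of an objective ends up. It assumes a hypothetical `Subtask`/`Plan` representation; in WebDART the LLM itself produces the decomposition, so the hard-coded `conservative_decompose` function only illustrates the outcome: constraints attach to the execution subtask and never to navigation.

```python
from dataclasses import dataclass, field


@dataclass
class Subtask:
    kind: str                      # "navigation", "extraction", or "execution"
    instruction: str
    constraints: list[str] = field(default_factory=list)


@dataclass
class Plan:
    objective: str
    subtasks: list[Subtask]


def conservative_decompose(objective: str, target_pages: str,
                           fields: list[str], constraints: list[str]) -> Plan:
    """Hypothetical stand-in for the LLM planner.

    Constraints are attached only to the execution subtask, so the
    navigation subtask stays a simple "visit these pages" instruction.
    """
    return Plan(objective, [
        Subtask("navigation", f"Visit all {target_pages}"),
        Subtask("extraction", f"Extract fields: {', '.join(fields)}"),
        Subtask("execution", f"Answer: {objective}", list(constraints)),
    ])


plan = conservative_decompose(
    objective="Which product priced $20-$50 has the most reviews?",
    target_pages="product listing pages in the category",
    fields=["name", "price", "review_count"],
    constraints=["20 <= price <= 50"],
)
for step in plan.subtasks:
    print(f"{step.kind:<11} | {step.instruction} | {step.constraints}")
```

Keeping the price range out of navigation means the navigator never has to discover or operate a filter up front; if a filter does turn up, re-planning can exploit it later.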

Navigation

The navigation module drives page discovery through browser actions until every relevant page has been visited. With plan-guided browsing, the agent first drafts a navigation plan that lists the pages to visit, the information to capture on each, and a stopping criterion, and it updates this plan whenever a helpful interface widget appears mid-navigation.
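A plan-guided browsing loop can be sketched as follows. This is an illustration under stated assumptions, not WebDART's code: the offline `SITE` dictionary replaces a real browser, and `NavigationPlan`, `browse`, and the content-based stopping rule are hypothetical names chosen for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

# Hypothetical offline "website": URL -> page text and its pagination link.
SITE = {
    "/orders?page=1": {"text": "order 101, order 102", "next": "/orders?page=2"},
    "/orders?page=2": {"text": "order 103, order 104", "next": "/orders?page=3"},
    "/orders?page=3": {"text": "order 105", "next": None},
}


@dataclass
class NavigationPlan:
    start_pages: list[str]                # pages the plan says to visit first
    capture: str                          # what to record (documentation only here)
    should_stop: Callable[[dict], bool]   # stopping criterion over visited pages


def browse(plan: NavigationPlan, site: dict) -> dict:
    """Follow the plan, recording each visited page until the stop rule fires."""
    queue = deque(plan.start_pages)
    visited: dict[str, str] = {}
    while queue and not plan.should_stop(visited):
        url = queue.popleft()
        if url in visited or url not in site:
            continue
        page = site[url]
        visited[url] = page["text"]       # observation handed to extraction later
        if page["next"]:
            queue.append(page["next"])    # keep paginating until told to stop
    return visited


plan = NavigationPlan(
    start_pages=["/orders?page=1"],
    capture="order IDs",
    # Stop as soon as the information the task needs has been seen.
    should_stop=lambda visited: any("order 104" in t for t in visited.values()),
)
print(list(browse(plan, SITE)))  # ['/orders?page=1', '/orders?page=2']
```

The explicit stopping criterion is what prevents the agent from paginating to page 3 once it already has what it needs.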

Dynamic re-planning is what makes navigation efficient: when new elements such as filters or sorting options are revealed, the agent revises the remaining subtasks to use them, replacing exhaustive page-by-page traversal with targeted filtering and avoiding redundant exploration.
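The switch from exhaustive traversal to filtering can be illustrated with a single re-planning rule. This is a sketch only: WebDART delegates the decision to the LLM, whereas the `replan` function, the `status_filter` widget name, and the `?status=` URL scheme below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NavState:
    pending: list[str]                          # pages still scheduled for visiting
    notes: list[str] = field(default_factory=list)


def replan(state: NavState, widgets: set[str], base_url: str, status: str) -> NavState:
    """Hypothetical rule: if a status filter is revealed, drop the remaining
    exhaustive pagination and visit a single filtered page instead."""
    if "status_filter" in widgets:
        filtered = f"{base_url}?status={status}"
        return NavState(pending=[filtered],
                        notes=state.notes + [f"use filter -> {filtered}"])
    return state  # nothing useful revealed; keep the original plan


# Exhaustive plan: 39 more listing pages to visit.
state = NavState(pending=[f"/orders?page={i}" for i in range(2, 41)])
state = replan(state, widgets={"search_box", "status_filter"},
               base_url="/orders", status="canceled")
print(state.pending)  # ['/orders?status=canceled']
```

Because the rule runs on every new observation, the savings depend on how early a useful widget appears; if none does, the conservative plan is kept unchanged.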

Information Extraction

Information extraction converts the visited web pages into structured data. The module first applies selective page indexing to keep only task-relevant observations, then uses the LLM to extract the required fields into structured JSONL records. The resulting dataset is passed on to the execution phase.
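The two extraction steps, selecting relevant pages and then emitting one JSON record per line, might look like this in a minimal form. Assumptions: the page texts are made up, a keyword match stands in for selective page indexing, and a regular expression replaces the LLM field extractor so the sketch runs offline; `select_relevant`, `extract_record`, and `to_jsonl` are hypothetical helpers.

```python
import json
import re

# Hypothetical observations collected during navigation: (url, page text).
pages = [
    ("/product/1", "Name: Desk Lamp | Price: $34.99 | Reviews: 120"),
    ("/help/shipping", "Shipping takes 3-5 business days."),
    ("/product/2", "Name: Office Chair | Price: $89.00 | Reviews: 45"),
]


def select_relevant(pages, keywords):
    """Simplified selective indexing: keep indices of pages mentioning every keyword."""
    return [i for i, (_, text) in enumerate(pages)
            if all(k.lower() in text.lower() for k in keywords)]


# A regex stands in for LLM-based field extraction in this sketch.
PATTERN = re.compile(
    r"Name: (?P<name>[^|]+?) \| Price: \$(?P<price>[\d.]+) \| Reviews: (?P<reviews>\d+)"
)


def extract_record(url, text):
    m = PATTERN.search(text)
    if m is None:
        return None
    return {"url": url, "name": m["name"], "price": float(m["price"]),
            "review_count": int(m["reviews"])}


def to_jsonl(pages, keywords):
    records = (extract_record(*pages[i]) for i in select_relevant(pages, keywords))
    return "\n".join(json.dumps(r) for r in records if r is not None)


jsonl = to_jsonl(pages, keywords=["price", "reviews"])
print(jsonl)
```

JSONL keeps one self-contained record per line, which makes it easy for the execution stage to load the records and filter or aggregate them with ordinary code.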

Execution

Execution handles two kinds of objectives: data analysis and web actions. For data analysis, the LLM writes Python code that processes the structured records with operations such as filtering, sorting, and aggregation. A reflection loop adds robustness: when the generated code fails, the execution error prompts the model to generate revised code.
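The reflection loop can be sketched with pre-written drafts standing in for successive LLM generations. Everything here is an assumption for illustration: `fake_llm`, `DRAFTS`, and `run_with_reflection` are hypothetical, the first draft deliberately uses a wrong field name, and the retry limit of three is arbitrary. A real system would also sandbox generated code rather than call `exec` directly.

```python
records = [
    {"name": "Desk Lamp", "price": 34.99, "review_count": 120},
    {"name": "Office Chair", "price": 89.00, "review_count": 45},
    {"name": "Mouse Pad", "price": 21.50, "review_count": 300},
]

# Stand-ins for successive LLM outputs: the first draft uses a wrong key;
# the second is what a model might write after seeing the KeyError.
DRAFTS = [
    "result = max((r for r in records if 20 <= r['price'] <= 50),"
    " key=lambda r: r['reviews'])['name']",
    "in_range = [r for r in records if 20 <= r['price'] <= 50]\n"
    "result = sorted(in_range, key=lambda r: r['review_count'], reverse=True)[0]['name']",
]


def fake_llm(attempt, feedback):
    """Hypothetical generator; a real system would prompt the LLM with `feedback`."""
    return DRAFTS[min(attempt, len(DRAFTS) - 1)]


def run_with_reflection(records, max_attempts=3):
    feedback = []
    for attempt in range(max_attempts):
        code = fake_llm(attempt, feedback)
        scope = {"records": records}
        try:
            exec(code, scope)  # run the generated analysis code (sandbox in practice)
            return scope["result"], feedback
        except Exception as exc:  # reflection: keep the error for the next draft
            feedback.append(f"attempt {attempt}: {type(exc).__name__}: {exc}")
    raise RuntimeError("analysis failed after retries: " + "; ".join(feedback))


answer, errors = run_with_reflection(records)
print(answer)  # Mouse Pad
print(errors)
```

Filtering to the $20 to $50 range leaves the lamp and the mouse pad, and sorting by review count puts the mouse pad first; the constraint deferred during decomposition is finally applied here.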

For action-oriented objectives, execution turns the analysis results into concrete web actions, such as submitting a form or posting content, using a simplified navigation strategy that interacts with elements determined in advance.
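One way to picture "interacting with elements determined in advance" is a fixed action script replayed against a page. This sketch assumes a hypothetical `FakeForm` in place of a real browser and made-up element IDs (`review_title`, `review_body`, `submit_button`); it shows the idea of exploration-free execution, not WebDART's actual action interface.

```python
from dataclasses import dataclass


@dataclass
class Action:
    op: str          # "fill" or "click"
    element_id: str  # element identified in advance on the target page
    value: str = ""


class FakeForm:
    """Offline stand-in for a browser page exposing a few form elements."""

    def __init__(self, element_ids):
        self.fields = {eid: "" for eid in element_ids}
        self.submitted = False

    def apply(self, action: Action):
        if action.element_id not in self.fields:
            raise ValueError(f"unknown element {action.element_id!r}")
        if action.op == "fill":
            self.fields[action.element_id] = action.value
        elif action.op == "click":
            self.submitted = True
        else:
            raise ValueError(f"unsupported op {action.op!r}")


def execute_actions(page, analysis_result):
    # A fixed script: the elements to touch are decided up front, so no exploration.
    script = [
        Action("fill", "review_title", f"Best value: {analysis_result}"),
        Action("fill", "review_body", f"{analysis_result} had the most reviews in range."),
        Action("click", "submit_button"),
    ]
    for action in script:
        page.apply(action)
    return page


page = execute_actions(FakeForm(["review_title", "review_body", "submit_button"]),
                       "Mouse Pad")
print(page.submitted, page.fields["review_title"])  # True Best value: Mouse Pad
```

Raising on an unknown element ID makes a stale or wrong element assumption fail loudly instead of silently posting incomplete content.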

Experimentation

On WebChoreArena, WebDART substantially improves success rates on complex tasks, lifting them by up to 13.7 percentage points over previous state-of-the-art agents and outperforming baselines such as BrowserGym and AgentOccam. The gains hold across different LLM backbones, which suggests the framework is robust rather than tuned to a single model, and its performance is consistent across both complex and simpler navigation tasks.

Evaluations of dynamic re-planning show that it reduces the number of navigation steps and raises task success rates, with the largest gains in domains whose objectives are most complex. Overall, WebDART completes tasks with up to 14.7 fewer navigation steps.

On the simpler WebArena benchmark, WebDART matches the performance of prior state-of-the-art agents, showing that the structure it adds for complex tasks does not degrade performance on easier ones.

Conclusion

WebDART addresses the difficulties web agents face on complex tasks through task decomposition and adaptive re-planning. By splitting objectives into modular subtasks and adjusting the plan in real time, it improves both task accuracy and efficiency, outperforming prior agents on complex benchmarks while remaining competitive on simpler ones. Future work could extend these principles to multimodal environments, broadening the framework's applicability in web automation.