WebWeaver: Dual-Agent Research Framework
- WebWeaver is a dual-agent framework that combines iterative planning with hierarchical synthesis to generate reliable, well-structured research reports.
- It features a planner that continuously refines research outlines and a writer that selectively retrieves and synthesizes targeted evidence, reducing context overflow.
- Empirical results on OEDR benchmarks show improved citation accuracy and report quality, along with mitigation of long-context failures.
WebWeaver is a dual-agent framework designed for open-ended deep research (OEDR), in which AI agents must synthesize large volumes of web-scale evidence into reliable, well-structured reports. Addressing fundamental limitations in current research automation — specifically static pipelines and long-context failures — WebWeaver introduces an adaptive, human-centric methodology that interleaves planning, evidence acquisition, and hierarchical synthesis. The system is empirically validated on a range of open-ended deep research benchmarks, establishing new state-of-the-art results in report quality, reliability, and structure (Li et al., 16 Sep 2025).
1. Dual-Agent Architecture
WebWeaver’s architecture comprises two specialized agents: the planner and the writer.
- Planner: Implements an iterative cycle of evidence acquisition and outline optimization. Rather than following a rigid plan fixed before evidence collection, the planner continuously searches for relevant sources, integrating each discovery back into an evolving outline. This results in a dynamic, citation-linked outline that reflects emerging evidence rather than static hypotheses.
- Writer: Executes hierarchical retrieval and synthesis. The writer decomposes the report into manageable sections, retrieving only the necessary evidence for each part from a memory bank. Each section is written exclusively with the evidence that supports its specific content, greatly reducing context overflow and hallucination risk.
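The planner's interleaving of search and outline refinement can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not WebWeaver's implementation. `FAKE_WEB`, `search`, and `plan` are hypothetical names. A canned dictionary stands in for web search. Each query's text doubles as an outline section title. A result may suggest a follow-up query, which lets new evidence reshape the plan.

```python
from collections import deque

# Hypothetical canned "web" (assumption): query -> list of (url, summary, follow-up query or None).
FAKE_WEB = {
    "solar adoption": [("https://example.org/a", "Summary of source A", "solar costs")],
    "solar costs": [("https://example.org/b", "Summary of source B", None)],
}

def search(query):
    """Stand-in for a real search tool; not an actual API."""
    return FAKE_WEB.get(query, [])

def plan(seed_query, max_rounds=10):
    memory_bank = {}          # evidence id -> (url, summary)
    outline = {}              # section title -> cited evidence ids
    queue = deque([seed_query])
    rounds = 0
    # 'terminate' once no open queries remain or the round budget is spent
    while queue and rounds < max_rounds:
        query = queue.popleft()
        rounds += 1
        for url, summary, follow_up in search(query):     # 'search' action
            eid = f"id_{len(memory_bank) + 1}"
            memory_bank[eid] = (url, summary)
            outline.setdefault(query, []).append(eid)      # outline update with a citation
            if follow_up and follow_up not in outline and follow_up not in queue:
                queue.append(follow_up)                    # new evidence extends the plan
    return outline, memory_bank

outline, bank = plan("solar adoption")
print(outline)  # {'solar adoption': ['id_1'], 'solar costs': ['id_2']}
```

The outline that comes out is citation-linked. Every section carries the ids of the evidence that produced it, and these ids are what the writer later uses for retrieval.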
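Formally, a complete agent trajectory is defined as $\mathcal{H}_T = (\tau_0, a_0, o_0, \ldots, \tau_i, a_i, o_i, \ldots, \tau_T, a_T, o_T)$, where round $i$ includes a thought $\tau_i$, an action $a_i$, and an observation $o_i$.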
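A trajectory of (thought, action, observation) rounds can be represented directly as data. The sketch below is illustrative only. The `Round` class, the `flatten` helper, and the action strings are hypothetical. The code flattens the rounds into the single interleaved sequence that the trajectory definition describes.

```python
from dataclasses import dataclass

@dataclass
class Round:
    thought: str      # thought for round i: the agent's reasoning
    action: str       # action for round i, e.g. "search" or "terminate" (illustrative names)
    observation: str  # observation for round i: what the environment returned

def flatten(trajectory):
    """Interleave rounds as (thought_0, action_0, obs_0, ..., thought_T, action_T, obs_T)."""
    return [x for r in trajectory for x in (r.thought, r.action, r.observation)]

traj = [
    Round("Need background sources", "search", "3 pages found"),
    Round("Enough evidence for the outline", "terminate", "done"),
]
print(len(flatten(traj)))  # 6 = 3 elements per round x 2 rounds
```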
2. Overcoming Deep Research Bottlenecks
Traditional OEDR systems suffer from two primary limitations. First, planning is decoupled from evidence acquisition. Second, they rely on "one-shot" generation that presents all gathered context to the model at once. The results are frequent "lost in the middle" failures, in which critical evidence buried in a long context is neglected, and an increased risk of hallucination.
WebWeaver addresses these via:
- Interleaved Planning and Acquisition: Rather than separating search and writing, the planner’s loop adaptively acquires new evidence and integrates it into the outline in real time.
- Hierarchical Section-Wise Writing: Only the section-relevant evidence is retrieved from the memory bank, so long-context attention failures are mitigated.
This design ensures the report remains both comprehensive and strictly source-grounded at all levels.
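The effect of section-wise writing on context length can be approximated with simple arithmetic. The token counts below are invented purely for illustration. The sketch assumes the outline is included in every call. It ignores any previously written text the writer may also keep in context. One-shot generation must fit the sum of all evidence in one call. Section-wise writing only needs room for the largest single section.

```python
# Hypothetical token counts of the evidence each section needs (illustrative numbers only).
evidence_tokens = {"Background": 12_000, "Methods": 18_000, "Findings": 25_000, "Outlook": 9_000}
outline_tokens = 1_500  # assumption: the outline is shown in every call

# One-shot: every piece of evidence goes into a single prompt.
one_shot = outline_tokens + sum(evidence_tokens.values())

# Section-wise: each call sees the outline plus only that section's evidence.
per_section = {s: outline_tokens + t for s, t in evidence_tokens.items()}
peak_section_wise = max(per_section.values())

print(one_shot)           # 65500
print(peak_section_wise)  # 26500
```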
3. Methodological Principles
WebWeaver’s methodology is explicitly aligned with human-centric research conduct:
- Adaptive Planning: The planner alternates between search actions and outline refinement, so new insights immediately influence report structure.
- Focused Synthesis: The writer generates each section with only its supporting evidence, avoiding distraction from unrelated material.
- Memory Bank Management: All retrieved evidence (summaries, quotations, key data) is stored in a dedicated memory bank. For each subsection, targeted retrieval operations supply only what is needed.
- Attentional Pruning: Upon completing a section, the system clears that section's evidence from the context, so the model's attention is not diluted by material irrelevant to the next section.
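The memory-bank, targeted-retrieval, and pruning principles can be combined in a small writer loop. This is a minimal sketch, not the framework's actual code. `MEMORY_BANK`, `OUTLINE`, `write_section`, and `write_report` are hypothetical. The memory-bank entries are placeholders. `write_section` stands in for an LLM call. The loop also records the largest context it ever assembled. This shows that each write sees only the evidence its own section cites.

```python
# Toy memory bank (hypothetical ids, placeholder contents): evidence id -> distilled evidence.
MEMORY_BANK = {
    "id_1": "Summary of source A",
    "id_2": "Quotation from source B",
    "id_3": "Key figure from source C",
}

# Citation-linked outline as the planner might hand it over: (section title, cited evidence ids).
OUTLINE = [("Costs", ["id_1", "id_3"]), ("Policy", ["id_2"])]

def write_section(title, evidence):
    """Stand-in for an LLM 'write' call (assumption): reports how much evidence it saw."""
    return f"{title}: written from {len(evidence)} evidence item(s)"

def write_report(outline, memory_bank):
    report, peak_context = [], 0
    for title, cited_ids in outline:
        context = [memory_bank[i] for i in cited_ids]  # 'retrieve': only this section's citations
        peak_context = max(peak_context, len(context))
        report.append(write_section(title, context))   # 'write' with a focused context
        context.clear()                                # prune this section's evidence before moving on
    return report, peak_context

report, peak = write_report(OUTLINE, MEMORY_BANK)
print(report)  # ['Costs: written from 2 evidence item(s)', 'Policy: written from 1 evidence item(s)']
print(peak)    # 2
```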
4. Empirical Performance Across OEDR Benchmarks
WebWeaver’s dual-agent, iterative design has demonstrated strong empirical performance:
- DeepResearch Bench: Achieves state-of-the-art scores in comprehensiveness, insight, instruction-following, readability, and citation accuracy.
- DeepConsult: Outperforms competitive baselines in both win rates and average scores for actionable consulting reports.
- DeepResearchGym: Excels in metrics of depth, breadth, and support, attributed to systematic context pruning and targeted retrieval. Cross-sectional interference is suppressed due to the modular writing approach.
These findings indicate that dynamic planning and focused synthesis are key to reliable open-ended deep research.
5. Technical Implementation
The technical specifications central to WebWeaver include:
- Planner Agent Actions: "Search", "outline optimization", and "terminate", performed over a sequence of (thought $\tau_i$, action $a_i$, observation $o_i$) rounds.
- Writer Mechanisms: For each section, a "retrieve" action adds the relevant evidence to the context before the "write" operation begins.
- Memory Bank: Stores distilled representations of web-scale evidence. Lookups are citation-driven, maintaining grounding to underlying sources.
- Context Management: By segmenting writing into subsections and exclusively introducing necessary evidence to context, the system avoids overlong context windows that plague LLM inference.
- Optimization Operators: Explicit optimization formulas are not foregrounded; selecting the best evidence is instead implicit in the agents' reasoning.
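Citation-driven lookups also make grounding checkable. A written section can be scanned for citations that do not resolve to evidence retrieved for that section. The sketch below assumes a hypothetical `<citation>id_1, id_2</citation>` markup and a hypothetical helper `ungrounded_citations`. WebWeaver's real citation format and any such validation step may differ.

```python
import re

# Hypothetical citation markup (assumption): <citation>id_1, id_2</citation>
CITATION = re.compile(r"<citation>(.*?)</citation>")

def ungrounded_citations(section_text, retrieved_ids):
    """Return cited ids that were not retrieved from the memory bank for this section."""
    cited = set()
    for group in CITATION.findall(section_text):
        cited.update(i.strip() for i in group.split(",") if i.strip())
    return sorted(cited - set(retrieved_ids))

text = "Costs fell <citation>id_1</citation>; installs grew <citation>id_3, id_7</citation>."
print(ungrounded_citations(text, ["id_1", "id_3"]))  # ['id_7']
```

An empty result means every citation in the section points back to evidence the writer actually retrieved for it.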
6. Significance in AI-Assisted Deep Research
WebWeaver establishes the importance of dynamic, adaptive research workflows for high-quality output in contexts where static or unstructured generative agents fail. Its validated methodology — dual-agent design with interleaved planning and modular synthesis — sets a precedent for future open-ended research automation. Empirical evidence from major OEDR benchmarks demonstrates that mitigating long-context failures and hallucinations is paramount to robust, reliable, and granular content production.
This framework marks a key advancement toward human-centric, source-grounded research automation, with practical implications for scientific literature synthesis, consulting, and large-scale evidence-based reporting.