ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

Published 16 Sep 2025 in cs.CL | (2509.13313v2)

Abstract: LLM-based web agents demonstrate strong performance on knowledge-intensive tasks but are hindered by context window limitations in paradigms like ReAct. Complex queries involving multiple entities, intertwined relationships, and high uncertainty demand extensive search cycles that rapidly exhaust context budgets before reaching solutions. To overcome this challenge, we introduce ReSum, a novel paradigm that enables indefinite exploration through periodic context summarization. ReSum converts growing interaction histories into compact reasoning states, maintaining awareness of prior discoveries while bypassing context constraints. For paradigm adaptation, we propose ReSum-GRPO, integrating GRPO with segmented trajectory training and advantage broadcasting to familiarize agents with summary-conditioned reasoning. Extensive experiments on web agents across three benchmarks demonstrate that ReSum delivers an average absolute improvement of 4.5% over ReAct, with further gains of 8.2% following ReSum-GRPO training. Notably, with only 1K training samples, our WebResummer-30B (a ReSum-GRPO-trained version of WebSailor-30B) achieves 33.3% Pass@1 on BrowseComp-zh and 18.3% on BrowseComp-en, surpassing most open-source web agents.


Authors (16)

Summary

The paper introduces ReSum, a paradigm that periodically summarizes interaction histories to overcome context window limitations in LLM-based web search tasks.
It pairs a specialized summarizer, ReSumTool-30B, with the ReSum-GRPO training algorithm, and reports consistent improvements over the ReAct baseline on web-agent benchmarks.
The evaluation demonstrates that ReSum efficiently manages long-horizon search with minimal architectural changes, paving the way for enhanced agent autonomy.

ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

This essay examines the paper "ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization" (arXiv:2509.13313). ReSum is a paradigm that targets the context-window limits of long-horizon search tasks carried out by LLM-based web agents. Its goal is to enable indefinite exploration by periodically converting the interaction history into a compact reasoning state, thereby bypassing context constraints. The paradigm is evaluated on three benchmarks, where it delivers an average absolute improvement of 4.5% over ReAct.

Introduction and Motivation

LLM-based agents have become central to complex, knowledge-intensive tasks. In web settings they run iterative cycles of search, browsing, and synthesis. The main obstacle is the context window: agents that follow paradigms like ReAct append every thought, action, and observation to the context, so they exhaust the budget quickly and terminate before completing the task, as illustrated in Figure 1.

Figure 1: Comparison between the ReAct (Yao et al., 2023) and ReSum paradigms.

Complex queries that involve multiple entities, intertwined relationships, and high uncertainty require many search cycles and extensive evidence collection. ReSum addresses this by periodically summarizing the long interaction history so that exploration can continue without exceeding the context limit. The agent then reasons from a compressed, actionable state rather than from the accumulated raw observations. The technique requires only minimal changes to the existing ReAct framework, so it retains ReAct's efficiency and simplicity.

Methodology

ReSum Paradigm

ReSum operates through key phases:

Trajectory Initialization: The trajectory starts from the user query; each interaction round (thought, action, observation) is appended to the conversation history.
Context Summarization: When a summary trigger fires, for example when the history approaches the token limit, a summary tool compresses the history into a compact reasoning state, and the agent continues from that state instead of the full history.
Trajectory Termination: Although summarization permits exploration of arbitrary length in principle, practical deployments impose a tool-call budget. A trajectory ends when the agent produces a final answer, and it is marked as a failure if the budget is exceeded first.

Figure 2: Illustration of ReSum-GRPO.

ReSum relies on a structured summary tool that distills the key evidence gathered so far and the remaining information gaps, which lets the agent pursue long-horizon searches efficiently.
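The three phases above can be sketched as a single control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_resum`, `ReSumResult`, `count_tokens`, the `[summary]` marker, and the `policy`, `tool`, and `summarize` callables are all hypothetical names, the whitespace token count stands in for a real tokenizer, and resetting the history to the query plus the summary is one plausible reading of "compact reasoning state".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical policy signature: history -> (thought, action, final_answer or None)
Policy = Callable[[List[str]], Tuple[str, str, Optional[str]]]


@dataclass
class ReSumResult:
    answer: Optional[str]  # None means the tool-call budget ran out (failure)
    tool_calls: int
    num_summaries: int


def count_tokens(text: str) -> int:
    # Crude whitespace proxy; a real agent would use the model's tokenizer.
    return len(text.split())


def run_resum(
    query: str,
    policy: Policy,
    tool: Callable[[str], str],
    summarize: Callable[[List[str]], str],
    token_limit: int = 200,
    tool_budget: int = 20,
) -> ReSumResult:
    # Trajectory initialization: the history starts from the user query.
    history: List[str] = [query]
    tool_calls = 0
    num_summaries = 0

    while tool_calls < tool_budget:
        thought, action, final_answer = policy(history)
        if final_answer is not None:
            # Termination: the agent produced an answer.
            return ReSumResult(final_answer, tool_calls, num_summaries)

        observation = tool(action)
        tool_calls += 1
        history.append(
            f"thought: {thought}\naction: {action}\nobservation: {observation}"
        )

        # Context summarization: triggered when the history nears the limit.
        if sum(count_tokens(h) for h in history) >= token_limit:
            summary = summarize(history)
            # Assumption: continue from the query plus the compressed state.
            history = [f"{query}\n[summary] {summary}"]
            num_summaries += 1

    # Termination: budget exceeded before an answer was produced.
    return ReSumResult(None, tool_calls, num_summaries)
```

Because the summarizer is passed in as a plain callable, the same loop would work with either a generic LLM or a specialized summary model behind `summarize`.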

Summary Tool Specialization

ReSum can use a generic LLM as the summarizer. The authors also provide ReSumTool-30B, a summarizer specialized through training on pairs extracted from models like Qwen3-30B-A3B. In the reported evaluations, ReSumTool-30B outperforms larger models in summarization quality, a result credited to its task-specific training.

ReSum-GRPO Algorithm

To familiarize agents with summary-conditioned reasoning, the paper applies reinforcement learning (RL) through ReSum-GRPO. Each rollout is split into segments at the points where summarization occurs, and the trajectory-level advantage is broadcast to every segment so that all segments receive a learning signal. The approach is intended to let the agent adapt to the paradigm without overwriting its existing skills.
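The advantage-broadcasting idea can be illustrated with a short sketch. It assumes the usual GRPO group normalization, in which each rollout's reward is standardized against the other rollouts sampled for the same query; the function names, the use of the population standard deviation, and the `eps` stabilizer are choices made for this example rather than details taken from the paper.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standardize trajectory-level rewards within one group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_segments(
    rewards: Sequence[float], segments_per_rollout: Sequence[int]
) -> List[List[float]]:
    """Give every segment of a rollout that rollout's trajectory-level advantage.

    rewards[i] is the final reward of rollout i, and segments_per_rollout[i] is
    the number of segments produced by splitting rollout i at its summaries.
    """
    if len(rewards) != len(segments_per_rollout):
        raise ValueError("one segment count is required per rollout")
    advantages = group_advantages(rewards)
    return [[adv] * n for adv, n in zip(advantages, segments_per_rollout)]


if __name__ == "__main__":
    # Four rollouts for one query; the first was summarized twice (3 segments).
    per_segment = broadcast_to_segments([1.0, 0.0, 0.0, 1.0], [3, 1, 2, 1])
    for i, advs in enumerate(per_segment):
        print(f"rollout {i}: {advs}")
```

Only the final answer is rewarded, so without broadcasting the early segments of a long trajectory would carry no signal of their own; copying the trajectory-level advantage to each segment is what lets them be trained as separate episodes.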

Experimental Evaluation

Experiments show that ReSum outperforms ReAct by permitting longer exploration. In the training-free setting, ReSum delivers an average absolute improvement of 4.5% over ReAct across three benchmarks. ReSumTool-30B outperforms larger models as a summarizer and enables agents such as WebSailor-30B to approach the performance of proprietary models on various benchmarks.

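ReSum-GRPO training adds further gains of 8.2% by adapting agents to the paradigm, so that they handle complex queries whose long trajectories are interrupted by periodic summarization. With only 1K training samples, WebResummer-30B (a ReSum-GRPO-trained version of WebSailor-30B) reaches 33.3% Pass@1 on BrowseComp-zh and 18.3% on BrowseComp-en.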

Implications and Future Developments

ReSum is a promising step for LLM agents because it addresses context limitations with minimal architectural changes. Future work points toward agents that summarize their own context and decide for themselves when to invoke summarization, which could remove the reliance on external tools and preset rules.

Conclusion

ReSum is a methodical improvement over earlier paradigms for web-agent exploration: it maintains compact reasoning states and extends search beyond the limits of the context window. ReSum-GRPO trains agents to reason from summaries, and the work opens the way to further algorithmic and practical advances in LLM agent design.
