AgentFold: Proactive Context Folding
- AgentFold is a dynamic LLM-based agent paradigm that proactively manages extended interaction histories through learnable context folding.
- It employs granular condensation and deep consolidation to balance detail preservation with context compactness, addressing both context saturation and irreversible detail loss.
- Empirical results demonstrate that AgentFold achieves superior efficiency and scalability, outperforming larger agents on complex information-seeking tasks.
AgentFold is an LLM-based agent paradigm designed to improve the long-horizon reasoning capabilities of web agents by introducing proactive context management, inspired by human retrospective memory consolidation. It addresses fundamental limitations in prevailing agent methodologies, namely the context saturation of append-only (ReAct-style) reasoning and the catastrophic detail loss endemic to routine full-history summarization, by enabling explicit, learnable, and retroactive “folding” operations over the agent’s internal state. The approach achieves substantial empirical gains on information-seeking and general web-agent tasks, demonstrating competitive or superior performance relative to models that are orders of magnitude larger.
1. Motivation and Context Management Challenges
LLM-based web agents promise strong performance on complex information-seeking tasks, but delivering it is hardest in long-horizon settings, where efficient management of complex, extended interaction histories is essential. Traditional agent architectures face a fundamental trade-off:
- ReAct-based agents maintain a complete, append-only trajectory of all reasoning-action-observation steps. While this ensures no loss of fine-grained details, it results in uncontrolled linear context growth, with critical information becoming increasingly buried in noise as task length increases.
- Summarization-based agents (e.g., MEM1, MemAgent) periodically compress or summarize the entire trajectory to keep the context within model limits. This stepwise abstraction controls size but can irreversibly eliminate crucial details; over repeated summarization these losses compound, yielding a high probability of discarding information needed to solve long-range problems.
AgentFold is motivated by the need to resolve this trade-off by endowing agents with the ability to flexibly and proactively curate their working context, achieving both retention of key details and context compactness.
2. AgentFold Paradigm: Proactive and Human-Inspired Folding
AgentFold treats the internal workspace not as a passive context log but as an actively sculpted, dynamic cognitive workspace. The core innovation is the folding operation, explicitly learned and invoked at every step, inspired by the human process of retrospective consolidation—the selective abstraction, condensation, or pruning of memory only when appropriate.
Each folding operation can take one of two forms:
- Granular condensation: High-resolution summarization that condenses only the most recent step, retaining fine-grained detail.
- Deep consolidation: Multi-step abstraction compressing entire context subsequences (e.g., failed search sub-trajectories or completed sub-tasks) into concise blocks, freeing up capacity.
These operations allow the agent to maintain a multi-scale summary of past actions, selectively preserving, abstracting, or discarding information based on the strategic demands of the task and interaction history; a toy illustration of the two modes follows.
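The sketch below is a toy illustration, not the paper's implementation: it shows how the workspace's summary blocks might evolve under each folding mode. The block labels and contents are invented for the example.

```python
# Hypothetical state after step 4: three earlier summary blocks plus the
# latest step's full detail. A label s[i,j] denotes a block covering steps i..j.
blocks = [
    "s[1,1]: opened the target site's documentation page",
    "s[2,2]: keyword search returned no relevant hits",
    "s[3,3]: rephrased search also failed",
]
latest = "step 4: full reasoning + tool call + observation (found a candidate source)"

# Granular condensation: fold only the latest step into its own fine-grained
# block; every earlier block is preserved at full resolution.
granular = blocks + ["s[4,4]: found candidate source"]

# Deep consolidation: merge the failed sub-trajectory (steps 2-3) together
# with the latest step into one coarse block, freeing context capacity.
deep = [blocks[0], "s[2,4]: two failed searches; pivoted and found candidate source"]
```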
3. Architecture and Algorithms
The workspace context maintained by AgentFold at each step $t$ is

$C_t = (q, \mathcal{A}, M_t, L_t)$

where:
- $q$: Invariant user query anchoring the task
- $\mathcal{A}$: Action schema (set of available tools)
- $M_t$: Multi-scale state summaries, i.e., ordered blocks $s_{[i,j]}$, each summarizing steps $i$ through $j$ and possibly spanning fine-to-coarse granularity
- $L_t$: Latest interaction information, comprising the full details of the most recent reasoning, action, and observation

The agent’s response at turn $t$ is

$r_t = (\tau_t, f_t, e_t, a_t)$

where $\tau_t$ is the reasoning/thinking process, $f_t$ is the folding directive (expressed as a JSON object), $e_t$ is an explanation, and $a_t$ is the next action.
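For concreteness, here is a minimal sketch of this workspace layout in Python; the type and field names (`Workspace`, `Block`, `query`, `tools`, `blocks`, `latest`) are illustrative choices, not identifiers from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    """One multi-scale summary block s[i,j], covering steps i through j."""
    start: int
    end: int
    summary: str

@dataclass
class Workspace:
    """AgentFold-style context C_t = (q, A, M_t, L_t), with illustrative names."""
    query: str                                         # q: invariant user query
    tools: list[str]                                   # A: action schema
    blocks: list[Block] = field(default_factory=list)  # M_t: multi-scale summaries
    latest: str = ""                                   # L_t: latest step's full detail
```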
A folding directive of the form

$f_t = \{\texttt{"range"}: [i, j],\ \texttt{"summary"}: s_{[i,j]}\}$

instructs the agent to fuse all blocks (as well as the latest step) within $[i, j]$ and replace them with the new summary block $s_{[i,j]}$. This permits retroactive folding at variable scales:
- $i = j = t$ for granular condensation (only the latest step is condensed)
- $i < j$ for deep consolidation (a multi-step subsequence is compressed into one block)
The context is updated by:
- Splicing out the folded blocks and inserting $s_{[i,j]}$ in their place
- Setting $L_{t+1}$ to the full reasoning, action, and observation of the subsequent step
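A minimal sketch of this splice-and-replace update, reusing the hypothetical `Workspace` and `Block` types from the earlier sketch (the paper specifies the directive format, not this implementation):

```python
def apply_fold(ws: Workspace, i: int, j: int, summary: str) -> None:
    """Apply a folding directive f_t = {"range": [i, j], "summary": ...}.

    Fuses every block that falls inside steps i..j, together with the latest
    step (assumed here to be step j), into a single new block s[i,j]:
    i == j gives granular condensation, i < j gives deep consolidation.
    """
    kept = [b for b in ws.blocks if b.end < i or b.start > j]  # blocks outside the range
    ws.blocks = sorted(kept + [Block(i, j, summary)], key=lambda b: b.start)
    ws.latest = ""  # L_{t+1} is filled by the next reasoning/action/observation
```

For example, `apply_fold(ws, 2, 4, "two failed searches; pivoted to candidate source")` collapses the blocks covering steps 2 and 3, plus the latest step 4, into one coarse block.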
Mathematically, AgentFold mitigates the detail-loss risk inherent in repeated summarization. If each summarization event destroys a given detail with probability $p$, then over $T$ steps full-history summarization leaves that detail with survival probability $(1-p)^T$, which vanishes for any $p > 0$. Granular and strategic folding guard against this compounding degradation by allowing key details to survive indefinitely through selective condensation.
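To make the compounding concrete (illustrative numbers, not from the paper): even a modest per-summarization loss rate of $p = 0.05$ leaves a given detail with survival probability $(1 - 0.05)^{100} \approx 0.006$ after $T = 100$ rounds of full-history summarization, whereas a detail parked in an unfolded block is simply carried forward and never re-summarized.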
4. Comparison with Prior Context Management Approaches
| Agent Type | Context Growth | Abstraction Mechanism | Failure Mode |
|---|---|---|---|
| ReAct (Append-only) | Linear with steps | None (no abstraction) | Context saturates, critical signals buried |
| Summarization-based Agents | Constant size | Rigid full-history | Compounding loss of key details |
| AgentFold | Sub-linear (flexible) | Learnable, retroactive | – |
ReAct-based models preserve all history, guaranteeing fidelity but ultimately becoming resource-inefficient and noisy. Summarization-based models avoid unbounded growth but expose themselves to the cumulative risk of erasing essential information, especially as task lengths expand. AgentFold achieves both strong preservation and efficiency by allowing the agent to decide when and how to fold context, balancing detail with abstraction according to the problem’s structure.
5. Empirical Results and Scaling Behavior
AgentFold was implemented using a Qwen3-30B-A3B-Instruct LLM, trained exclusively via supervised fine-tuning (SFT) without continual pre-training or RL. Its performance was evaluated on several benchmarks:
- BrowseComp (EN, ZH): Focused on difficult, information-seeking web queries in English and Chinese, respectively
- WideSearch-en: Measuring broad web search capacity via Item-F1
- GAIA: General text-only agentic tasks
Selected results are as follows:
| Agent | BrowseComp (EN) | BrowseComp-ZH | WideSearch-en (Item-F1) | GAIA |
|---|---|---|---|---|
| AgentFold-30B | 36.2% | 47.3% | 62.1% | 67.0% |
| DeepSeek-V3.1-671B | 30.0% | 49.2% | — | 63.1% |
| GLM-4.5-355B | 26.4% | 37.5% | — | 66.0% |
| OpenAI-o4-mini | 28.3% | 44.3% | — | — |
| Claude-4-Sonnet | 14.7% | 22.5% | 62.0% | 68.3% |
AgentFold outperforms significantly larger open-source (DeepSeek-V3.1-671B, GLM-4.5-355B) and proprietary (OpenAI-o4-mini) agents on BrowseComp and WideSearch, with comparable or superior efficiency. After 100 interaction turns, AgentFold’s context occupies approximately 7k tokens, and it remains below 20k tokens even at 500 turns owing to deep folding, while ReAct-based agents accumulate hundreds of thousands of tokens. Resource-wise, AgentFold’s context at turn 100 is approximately 92% smaller than that of a comparable ReAct-style agent, a savings of nearly 7 GB of memory per inference instance.
Performance continues to scale smoothly as permitted turns increase (tested up to 256 tool calls and 500 turns), with competitors plateauing due to context or memory limits. AgentFold's dynamic folding allows it to compactly summarize unsuccessful or irrelevant search paths, maintaining clarity and adaptability.
Ablation analysis confirms that multi-scale folding is responsible for sub-linear context growth and the ability to scale to hundreds of steps. Removal or simplification of the folding mechanism results in loss of these properties.
6. Effectiveness and Theoretical Justification
AgentFold’s efficacy is based on several key properties:
- Granular condensation preserves essential details as long as required, avoiding premature abstraction.
- Deep consolidation and selective abstraction prevent context saturation, allowing for efficient use of attention and memory.
- Learned, task-adaptive folding policies enable the agent to dynamically select folding actions according to task demands, which is not possible for rigid, rule-based approaches.
- Retrospective consolidation analogous to human self-regulation permits the agent to manage its internal representation at multiple timescales, improving fidelity and clarity over extended horizons.
- Resource efficiency and scalability are direct outcomes of the above, enabling long-horizon operation with fixed context size and manageable computation.
Case studies demonstrate AgentFold’s capability to fold away dead ends, dynamically replan, and retain only relevant context, which is not achievable by prior agents lacking retroactive folding.
7. Implications and Future Directions
AgentFold constitutes a new paradigm for long-horizon LLM agents. Its approach to proactive, learnable folding has implications for any agentic task with onerous context management requirements. The architecture is compatible with further advances such as reinforcement learning optimization of folding policies, richer context partitioning schemes, and hierarchical meta-reasoning over context blocks.
A plausible implication is that proactive multi-scale folding could become a standard architectural element in LLM-based agents, particularly as benchmarks and practical tasks continue to expand in scope and horizon length. Furthermore, its effectiveness with standard supervised fine-tuning suggests wide accessibility and adaptability for future research and deployment.
AgentFold represents a substantial advancement over past ReAct-style and naive summarization-based agents, providing the first demonstration that highly efficient, scalable context management is possible even with comparatively modest model sizes.