
AgentFold: Proactive Context Folding

Updated 2 November 2025
  • AgentFold is a dynamic LLM-based agent paradigm that proactively manages extended interaction histories through learnable context folding.
  • It employs granular condensation and deep consolidation to balance detail preservation and context compactness, addressing saturation and loss.
  • Empirical results demonstrate that AgentFold achieves superior efficiency and scalability, outperforming larger agents on complex information-seeking tasks.

AgentFold is an LLM-based agent paradigm specifically designed to improve the long-horizon reasoning capabilities of web agents by introducing proactive context management based on principles inspired by human retrospective memory consolidation. It addresses fundamental limitations in prevailing agent methodologies—namely, the context saturation problem of append-only (ReAct-style) reasoning and the catastrophic detail loss endemic to routine, full-history summarization—by enabling explicit, learnable, and retroactive “folding” operations over the agent’s internal state. The AgentFold approach achieves substantial empirical gains in information-seeking and general web-agent tasks, demonstrating competitive or superior performance compared to models that are orders of magnitude larger.

1. Motivation and Context Management Challenges

LLM-based web agents promise strong performance on complex information-seeking tasks, but delivering on that promise is hardest in long-horizon settings, where efficient management of complex, extended interaction histories is essential. Traditional agent architectures face a fundamental trade-off:

  • ReAct-based agents maintain a complete, append-only trajectory of all reasoning-action-observation steps. While this ensures no loss of fine-grained details, it results in uncontrolled linear context growth, with critical information becoming increasingly buried in noise as task length increases.
  • Summarization-based agents (e.g., MEM1, MemAgent) periodically compress or summarize the entire trajectory to keep the context within model limits. This stepwise abstraction, while controlling size, can irreversibly eliminate crucial details; over repeated summarization, the risk compounds, making it highly likely that information needed to solve long-range problems is eventually lost.

AgentFold is motivated by the need to resolve this trade-off by endowing agents with the ability to flexibly and proactively curate their working context, achieving both retention of key details and context compactness.

2. AgentFold Paradigm: Proactive and Human-Inspired Folding

AgentFold treats the internal workspace not as a passive context log but as an actively sculpted, dynamic cognitive workspace. The core innovation is the folding operation, explicitly learned and invoked at every step, inspired by the human process of retrospective consolidation—the selective abstraction, condensation, or pruning of memory only when appropriate.

Each folding operation can take one of two forms:

  • Granular condensation: Summarization at high resolution, retaining fine-grained detail by only condensing the most recent step.
  • Deep consolidation: Multi-step abstraction compressing entire context subsequences (e.g., failed search sub-trajectories or completed sub-tasks) into concise blocks, freeing up capacity.

These operations allow the agent to maintain a multi-scale summary of past actions, selectively preserving, abstracting, or discarding information based on the strategic demands of the task and interaction history.

3. Architecture and Algorithms

The workspace context maintained by AgentFold at each step $t$ is

$$C_t = (Q, T, S_{t-2}, I_{t-1})$$

where:

  • $Q$: Invariant user query anchoring the task
  • $T$: Action schema (set of available tools)
  • $S_{t-2}$: Multi-scale state summaries, i.e., ordered blocks $S_t = (s_{x_1, y_1}, s_{x_2, y_2}, \ldots, s_{x_m, y_m})$, each block $s_{x, y}$ summarizing steps $x$ through $y$ and possibly spanning fine-to-coarse granularity
  • $I_{t-1}$: Latest interaction information, comprising full details of the most recent reasoning, action, and observation

The agent’s response at turn $t$ is

$$R_t = \text{AgentFold}(C_t; \theta) \rightarrow (th_t, f_t, e_t, a_t)$$

where $th_t$ is the reasoning/thinking process, $f_t$ is the folding directive (expressed as a JSON object), $e_t$ is an explanation, and $a_t$ is the next action.

A folding directive of the form

$f_t = \texttt{\{"range": } [k, t-1]\texttt{, "summary": } \sigma_t \texttt{\}}$

instructs the agent to fuse all blocks (as well as the latest step) within $[k, t-1]$ and replace them with a new summary block $s_{k,t-1} = \sigma_t$. This permits retroactive folding at variable scales:

  • $k = t-1$ for granular condensation
  • $k < t-1$ for deep consolidation

The context is updated by:

  • Splicing out the folded blocks and inserting $s_{k, t-1}$
  • Setting $I_t$ for the subsequent step
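As a concrete illustration, the splice-and-replace update can be sketched as follows. This is a hypothetical Python sketch: the block layout, function name, and example summaries are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of AgentFold's context-folding update. Summary
# blocks are (start, end, text) tuples; the folding directive f_t is a
# JSON object as described in the text.
import json

def apply_fold(blocks, directive_json, t):
    """Fuse all summary blocks (and the latest step) covering steps
    [k, t-1] into a single new summary block s_{k,t-1}.

    The content of the latest step is assumed to be captured by the
    model-written summary in the directive.
    """
    directive = json.loads(directive_json)
    k, end = directive["range"]
    assert end == t - 1, "a fold always extends through the latest step"
    # Splice out every block from step k onward...
    kept = [b for b in blocks if b[1] < k]
    # ...and replace the folded span with one new summary block.
    kept.append((k, t - 1, directive["summary"]))
    return kept

# Granular condensation: k = t-1 folds only the most recent step.
blocks = [(1, 1, "queried search engine"), (2, 4, "ruled out candidate A")]
blocks = apply_fold(blocks, '{"range": [5, 5], "summary": "opened page B"}', t=6)
# Deep consolidation: k < t-1 collapses a whole failed sub-trajectory.
blocks = apply_fold(blocks, '{"range": [2, 6], "summary": "search branch B failed"}', t=7)
```

After the second call, the three blocks covering steps 2 through 6 have been collapsed into a single coarse block, while the step-1 block survives at full granularity — the multi-scale behavior described above.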

Mathematically, AgentFold mitigates the detail-loss risk inherent in repeated summarization. If each summarization pass loses a given detail with probability $p$, then after $n$ passes of full-history summarization the detail survives with probability $(1-p)^n$; for $p = 0.01$ and $n = 100$, this is $0.99^{100} \approx 36.6\%$. Granular and strategic folding guard against this compounding degradation by allowing key details to survive indefinitely through selective condensation.
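The compounding-loss arithmetic is easy to verify directly:

```python
# Survival probability of a detail under repeated full-history
# summarization, with per-pass loss probability p over n passes.
p = 0.01
n = 100
survival = (1 - p) ** n
print(f"{survival:.3f}")  # ~0.366, i.e. about a 36.6% chance of survival
```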

4. Comparison with Prior Context Management Approaches

| Agent Type | Context Growth | Abstraction Mechanism | Failure Mode |
|---|---|---|---|
| ReAct (append-only) | Linear with steps | None (no abstraction) | Context saturates; critical signals buried |
| Summarization-based agents | Constant size | Rigid, full-history | Compounding loss of key details |
| AgentFold | Sub-linear (flexible) | Learnable, retroactive | — |

ReAct-based models preserve all history, guaranteeing fidelity but ultimately becoming resource-inefficient and noisy. Summarization-based models avoid unbounded growth but expose themselves to the cumulative risk of erasing essential information, especially as task lengths expand. AgentFold achieves both strong preservation and efficiency by allowing the agent to decide when and how to fold context, balancing detail with abstraction according to the problem’s structure.

5. Empirical Results and Scaling Behavior

AgentFold was implemented using a Qwen3-30B-A3B-Instruct LLM, trained exclusively via supervised fine-tuning (SFT) without continual pre-training or RL. Its performance was evaluated on several benchmarks:

  • BrowseComp (EN, ZH): Focused on difficult, information-seeking web queries in English and Chinese, respectively
  • WideSearch-en: Measuring broad web search capacity via Item-F1
  • GAIA: General text-only agentic tasks

Selected results are as follows:

| Agent | BrowseComp | BrowseComp-ZH | WideSearch | GAIA |
|---|---|---|---|---|
| AgentFold-30B | 36.2% | 47.3% | 62.1% | 67.0% |
| DeepSeek-V3.1-671B | 30.0% | 49.2% | 63.1% | — |
| GLM-4.5-355B | 26.4% | 37.5% | 66.0% | — |
| OpenAI-o4-mini | 28.3% | 44.3% | — | — |
| Claude-4-Sonnet | 14.7% | 22.5% | 62.0% | 68.3% |

AgentFold outperforms significantly larger open-source (DeepSeek-V3.1-671B, GLM-4.5-355B) and proprietary (OpenAI-o4-mini) agents on BrowseComp and WideSearch, with comparable or superior efficiency. After 100 interaction turns, AgentFold’s context size is approximately 7k tokens, remaining below 20k tokens even at 500 turns owing to deep folding, while ReAct-based agents accumulate hundreds of thousands of tokens. Resource-wise, AgentFold's context at turn 100 is approximately 92% smaller than such ReAct-style baselines, representing a savings of nearly 7 GB per inference instance.

Performance continues to scale smoothly as permitted turns increase (tested up to 256 tool calls and 500 turns), with competitors plateauing due to context or memory limits. AgentFold's dynamic folding allows it to compactly summarize unsuccessful or irrelevant search paths, maintaining clarity and adaptability.

Ablation analysis confirms that multi-scale folding is responsible for sub-linear context growth and the ability to scale to hundreds of steps. Removal or simplification of the folding mechanism results in loss of these properties.

6. Effectiveness and Theoretical Justification

AgentFold’s efficacy is based on several key properties:

  • Granular condensation preserves essential details as long as required, avoiding premature abstraction.
  • Deep consolidation and selective abstraction prevent context saturation, allowing for efficient use of attention and memory.
  • Learned, task-adaptive folding policies enable the agent to dynamically select folding actions according to task demands, which is not possible for rigid, rule-based approaches.
  • Retrospective consolidation analogous to human self-regulation permits the agent to manage its internal representation at multiple timescales, improving fidelity and clarity over extended horizons.
  • Resource efficiency and scalability are direct outcomes of the above, enabling long-horizon operation with fixed context size and manageable computation.

Case studies demonstrate AgentFold’s capability to fold away dead ends, dynamically replan, and retain only relevant context, which is not achievable by prior agents lacking retroactive folding.

7. Implications and Future Directions

AgentFold constitutes a new paradigm for long-horizon LLM agents. Its approach to proactive, learnable folding has implications for any agentic task with onerous context management requirements. The architecture is compatible with further advances such as reinforcement learning optimization of folding policies, richer context partitioning schemes, and hierarchical meta-reasoning over context blocks.

A plausible implication is that proactive multi-scale folding could become a standard architectural element in LLM-based agents, particularly as benchmarks and practical tasks continue to expand in scope and horizon length. Furthermore, its effectiveness with standard supervised fine-tuning suggests wide accessibility and adaptability for future research and deployment.

AgentFold represents a substantial advancement over past ReAct-style and naive summarization-based agents, providing the first demonstration that highly efficient, scalable context management is possible even with comparatively modest model sizes.
