DeepAgent: Autonomous Reasoning & Tool Use

Updated 27 October 2025
  • DeepAgent is a framework that autonomously integrates textual reasoning, dynamic tool discovery and execution, and scalable memory management to handle long-horizon tasks.
  • It employs an autonomous memory folding mechanism, dividing interaction history into episodic, working, and tool memories, ensuring efficient context preservation.
  • DeepAgent leverages reinforcement learning via ToolPO for optimized tool use, achieving superior performance on diverse real-world benchmarks.

DeepAgent refers to a class of agent frameworks and models designed to perform autonomous, high-fidelity reasoning, tool discovery, and execution in complex, real-world tasks, often characterized by long-horizon interactions and dynamic environmental contexts. The DeepAgent paradigm is defined by its unification of internal cognitive processes, scalable toolset integration, and dynamic memory management, overcoming the limitations of traditional agents that rely on rigid workflows or static policies. Recent incarnations of DeepAgent architectures emphasize end-to-end autonomy, reinforcement learning for tool-use optimization, memory folding for context compression, and robust performance on general and open-set tool-use benchmarks (Li et al., 24 Oct 2025).

1. Unified Reasoning and Tool Use

DeepAgent frameworks integrate autonomous textual reasoning (by large reasoning models, LRMs) with dynamic tool discovery and execution. Rather than following fixed sequential pipelines, DeepAgent autonomously determines when to search for tools using dense retrieval over tool documentation and when to execute them, all within a global, continually updating reasoning context. This approach enables DeepAgent to navigate complex task environments, adaptively select the optimal set of tools, and interleave internal planning with external action, always maintaining a global perspective on task progress.
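
To make the open-set retrieval step concrete, the sketch below shows dense retrieval over tool documentation using plain cosine similarity; the `embed` encoder and the dictionary of tool documentation are assumptions for illustration, not components specified in the paper.

```python
import numpy as np

# Minimal dense-retrieval sketch over tool documentation. `embed` is a
# placeholder for an arbitrary text encoder that returns a unit-normalized
# vector; it is an assumption, not a component named in the paper.

def retrieve_tools(query, tool_docs, embed, top_k=5):
    """Return the names of the top_k tools whose documentation best matches the query."""
    q = embed(query)                                            # (d,) query embedding
    names = list(tool_docs)                                     # tool_docs: {name: documentation}
    doc_vecs = np.stack([embed(tool_docs[n]) for n in names])   # (N, d) document embeddings
    scores = doc_vecs @ q                                       # cosine similarity for unit vectors
    top = np.argsort(-scores)[:top_k]
    return [names[i] for i in top]
```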

A typical DeepAgent operates under the policy:

$$a_t \sim \pi_\theta(\cdot \mid s_t, Q, I)$$

where $a_t$ is the action at time $t$, $s_t$ is the agent’s interaction history, $Q$ the query or question, and $I$ the current intention.
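
A hedged sketch of this decision loop is given below; `lrm_generate`, `execute_tool`, and the action taxonomy are illustrative placeholders rather than the paper's actual interfaces, and `retrieve_tools` refers to the retrieval sketch above.

```python
from dataclasses import dataclass, field

# Illustrative DeepAgent-style loop interleaving reasoning, tool search, and
# tool execution. `lrm_generate` and `execute_tool` are hypothetical
# placeholders; TOOL_DOCS and embed are assumed globals (see the retrieval
# sketch above). None of these names come from the original work.

@dataclass
class Action:
    kind: str                      # "think" | "search_tools" | "call_tool" | "final_answer"
    content: str = ""
    tool_name: str = ""
    arguments: dict = field(default_factory=dict)

def run_agent(query: str, instructions: str, max_steps: int = 50) -> str:
    history: list = []  # s_t: the evolving interaction history
    for _ in range(max_steps):
        # a_t ~ pi_theta(. | s_t, Q, I): the reasoning model emits the next action
        action: Action = lrm_generate(history=history, query=query, instructions=instructions)

        if action.kind == "search_tools":
            # Dense retrieval over tool documentation
            candidates = retrieve_tools(action.content, TOOL_DOCS, embed)
            history.append(("tool_search", action.content, candidates))
        elif action.kind == "call_tool":
            # Execute the selected tool and fold the observation back into s_t
            observation = execute_tool(action.tool_name, action.arguments)
            history.append(("tool_call", action.tool_name, observation))
        elif action.kind == "final_answer":
            return action.content
        else:  # plain reasoning step
            history.append(("thought", action.content))

    return "Reached max_steps without producing a final answer."
```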

2. Autonomous Memory Folding for Long-Horizon Contexts

One of the major challenges in real-world agentic environments is dealing with context length explosion—resulting from long-horizon interaction histories, repeated tool calls, and the necessity to track multiple task states. DeepAgent deploys an autonomous memory folding mechanism, which intelligently compresses the extensive interaction history into three distinct, brain-inspired components:

  • Episodic Memory ($M_E$): Encodes long-term milestones and significant decisions.
  • Working Memory ($M_W$): Buffer for short-term goals, active plans, and immediate obstacles.
  • Tool Memory ($M_T$): Tracks tool usage patterns, including invocation, outcomes, success, and errors.

At logical breakpoints in reasoning, the agent issues a memory fold command, whereby an auxiliary LLM compresses $s_t$ into $(M_E, M_W, M_T)$ via a structured JSON schema:

$$(M_E, M_W, M_T) = f_{compress}(s_t; \theta_{aux})$$

This approach preserves essential information, maintains token efficiency, and mitigates error accumulation.
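
A minimal sketch of such a fold step, assuming the auxiliary LLM is prompted to return a JSON object with three fields (the prompt wording, field names, and `aux_llm_complete` call are illustrative, not the paper's exact schema), is:

```python
import json

# Illustrative memory-fold sketch. `aux_llm_complete` is a hypothetical call
# to the auxiliary LLM; the JSON field names are assumptions.

FOLD_PROMPT = """Compress the interaction history below into a JSON object with
three fields:
  "episodic_memory": long-term milestones and key decisions,
  "working_memory":  current sub-goals, active plans, immediate obstacles,
  "tool_memory":     tools invoked so far, their outcomes, successes, and errors.
History:
{history}
"""

def fold_memory(history_text: str) -> dict:
    """(M_E, M_W, M_T) = f_compress(s_t; theta_aux) via a structured JSON schema."""
    raw = aux_llm_complete(FOLD_PROMPT.format(history=history_text))
    memory = json.loads(raw)
    # Keep only the three expected components; anything else is discarded
    return {
        "episodic_memory": memory.get("episodic_memory", ""),
        "working_memory": memory.get("working_memory", ""),
        "tool_memory": memory.get("tool_memory", ""),
    }
```

The folded memories then replace the raw interaction history in subsequent reasoning steps, keeping the context within budget.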

3. End-to-End Reinforcement Learning with ToolPO

DeepAgent’s autonomy in tool use is learned through the Tool Policy Optimization (ToolPO) framework—a reinforcement learning scheme optimized for stable and efficient general-purpose tool integration. ToolPO is characterized by:

  • LLM-Simulated APIs: During training, real-world APIs are replaced by a tool simulator, providing stable gradient signals and cost-effective credit assignment.
  • Fine-Grained Tool-Call Advantage Attribution: Credits are assigned to specific tool-invocation tokens in the agent’s output. The advantage function for each action token $y_i$ is:

$$A(y_i) = A_{succ} + M(y_i) \cdot A_{action}$$

where $A_{succ}$ is the global, group-relative advantage (the success reward benchmarked against a group of sampled trajectories), $A_{action}$ is the local, token-level advantage, and $M(y_i)$ is a mask that ensures only tool-related tokens receive the extra credit. The resulting objective follows the clipped surrogate structure familiar from Proximal Policy Optimization, which stabilizes RL training.
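
A hedged sketch of this per-token attribution, with illustrative reward and mask names and assuming a group of trajectories sampled for the same query, is:

```python
import torch

def toolpo_advantages(
    tool_call_mask: torch.Tensor,   # (T,) 1.0 for tokens inside a tool call, else 0.0
    traj_reward: float,             # final task-success reward of this trajectory
    group_rewards: torch.Tensor,    # (K,) rewards of the K trajectories sampled for the same query
    action_reward: float,           # local reward for the tool invocation itself
) -> torch.Tensor:
    """Sketch of A(y_i) = A_succ + M(y_i) * A_action; names are illustrative."""
    # Group-relative success advantage: this trajectory's reward standardized
    # against the group sampled for the same query
    a_succ = (traj_reward - group_rewards.mean()) / (group_rewards.std() + 1e-8)
    # Extra credit only on tool-invocation tokens, selected by the mask M(y_i)
    return a_succ + tool_call_mask * action_reward
```

These per-token advantages are then plugged into the clipped surrogate objective used for the policy update.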

4. Empirical Performance on Benchmarks

DeepAgent is empirically validated on a suite of eight benchmarks, divided into:

  • General Tool-Use Tasks: ToolBench (16K+ tools), API-Bank, TMDB, Spotify, ToolHop. DeepAgent demonstrates robust performance in both labeled-tool scenarios (tools provided explicitly) and open-set retrieval (agent must autonomously find applicable tools), with superior success rates to prior workflow-based agents such as ReAct, Plan-and-Solve, and CodeAct.
  • Downstream Applications: ALFWorld (embodied agent tasks), WebShop (multi-step online shopping), GAIA (complex, multi-modal, information-seeking), and HLE (graduate-level reasoning). DeepAgent’s autonomous problem decomposition, tool-use, and memory management allow it to outperform baseline frameworks, particularly in tasks requiring extended planning and continuous adaptation.

5. Technical Implementations and Mathematical Formalisms

Technical implementations emphasize:

  • Sequential decision-making over long trajectories: $a_t \sim \pi_\theta(\cdot \mid s_t, Q, I)$.
  • Structured compression for episodic, working, and tool memories: $(M_E, M_W, M_T) = f_{compress}(s_t; \theta_{aux})$.
  • RL policy objectives incorporating group-relative and action-specific advantages for efficient tool-use learning: $A(y_i) = A_{succ}(\tau_k) + M(y_i) \cdot A_{action}(\tau_k)$.

This enables DeepAgent to combine high-capacity reasoning with scalable action management, balancing information preservation with computational efficiency.
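
For completeness, the clipped surrogate mentioned above follows the standard PPO-style template; the form below is the generic objective with the ToolPO advantage substituted in, not necessarily the paper's exact formulation:

$$J(\theta) = \mathbb{E}\left[\sum_i \min\Big(r_i(\theta)\, A(y_i),\ \operatorname{clip}\big(r_i(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\, A(y_i)\Big)\right], \qquad r_i(\theta) = \frac{\pi_\theta(y_i \mid y_{<i}, s_t, Q, I)}{\pi_{\theta_{old}}(y_i \mid y_{<i}, s_t, Q, I)}$$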

6. Applications and Broader Implications

DeepAgent’s architecture is suitable for high-stakes, real-world applications where autonomous agents must coordinate multiple APIs, plan over long horizons, and handle unpredictable task environments:

  • Digital personal assistants orchestrating diverse online services.
  • Autonomous agents in web shopping, technical support, or research domains.
  • Embodied entities (robots, navigational systems) needing persistent, context-aware planning and action.

A plausible implication is that DeepAgent architectures—by integrating memory folding and end-to-end RL—effectively address multi-phase dynamic tasks previously unattainable with workflow-centric agents.

7. Future Directions

Research avenues identified include:

  • Scaling up heterogeneous toolsets and improving autonomous tool discovery.
  • Refining autonomous memory folding for optimal balance between compression and informativeness.
  • Enhancing reinforcement learning with richer advantage attribution and dynamic feedback signals.
  • Extending multimodal interaction support to develop truly general-purpose real-world assistants.

The DeepAgent framework forms the foundation for agents capable of holistic reasoning, dynamic adaptation, and high-level autonomy in complex environments, moving toward the goal of universally capable and scalable agentic intelligence (Li et al., 24 Oct 2025).

References

  • Li et al., 24 Oct 2025.