Youtu-Agent: Modular LLM Agent Framework
- Youtu-Agent is a modular framework designed to automate the generation, configuration, and evolution of LLM agents using a layered architecture.
- It integrates environment, tools, and agent layers through a YAML-based configuration that enables rapid deployment and systematic variant management.
- The framework employs dual automated agent-generation paradigms and hybrid policy optimization to achieve scalable performance and practical reinforcement learning improvements.
Youtu-Agent is a modular framework designed to automate the generation, configuration, and continuous evolution of LLM agents. Its architecture aims to simultaneously address two central obstacles in contemporary LLM agent frameworks: high configuration cost (stemming from manual tool integration and prompt engineering) and static capabilities (limited adaptability without resource-intensive fine-tuning). By providing a layered execution architecture, declarative configuration, dual automated agent-generation paradigms, and a hybrid policy optimization system, Youtu-Agent enables scalable, maintainable deployment and training of high-performing LLM-based agents across diverse environments and tasks (Shi et al., 31 Dec 2025).
1. Layered System Architecture
Youtu-Agent’s core is a three-layer execution stack:
- Environment Layer: Encapsulates execution contexts, including Playwright browser sessions, operating system shells, and E2B Python sandboxes. It provides APIs that expose both state (e.g., DOM HTML, filesystem status) and primitive actions (e.g., click, run, execute code).
- Tools Layer: Consists of atomic wrappers around environment APIs and supplies environment-independent utilities (e.g., math, text processing, date/time) as well as MCP (Model Context Protocol) adapters for external services. Tools are grouped into configurable toolkits, which can be selectively activated.
- Agent Layer: Operates as an LLM-driven planner embodying a perceive–reason–act loop. It incorporates a context manager for pruning and maintaining a concise working memory window, with strategies such as time-based or token-budget sliding windows.
All three layers are parameterized and interconnected via a structured YAML configuration schema, which defines agent instructions, execution context, context management policy, and enabled toolkits.
Layered Architecture Diagram (informal):
```text
[Environment] ↔ [Tools] ↔ [Agent + Context Manager]
        Configuration (YAML) drives all layers
```
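To make the layering concrete, the sketch below shows how a Tools-layer wrapper around an environment primitive could be grouped into a selectively activated toolkit. The `Tool`, `Toolkit`, and `run_shell` names are hypothetical stand-ins for exposition, not Youtu-Agent's actual API.

```python
# Illustrative sketch only: Tool, Toolkit, and run_shell are hypothetical names,
# not Youtu-Agent's actual API. They mirror the layering described above.
import subprocess
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Tool:
    """Atomic wrapper around an environment primitive or a pure utility."""
    name: str
    fn: Callable[..., str]

    def __call__(self, *args, **kwargs) -> str:
        return self.fn(*args, **kwargs)


@dataclass
class Toolkit:
    """Configurable group of tools; only activated tools are exposed to the agent."""
    name: str
    tools: Dict[str, Tool] = field(default_factory=dict)

    def activate(self, names: List[str]) -> Dict[str, Tool]:
        return {n: self.tools[n] for n in names if n in self.tools}


def run_shell(command: str) -> str:
    """Environment-layer primitive: execute a shell command and return stdout."""
    return subprocess.run(command, shell=True, capture_output=True, text=True).stdout


shell_kit = Toolkit("shell", {"run": Tool("run", run_shell)})
active = shell_kit.activate(["run"])  # selective activation, as in the YAML schema
print(active["run"]("echo hello"))
```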
2. Structured Configuration and Context Management
Youtu-Agent decouples agent specification and execution through a declarative YAML-based schema. This schema governs all aspects of agent behavior, including:
- Agent identity and task instructions
- Target execution environment and parameters
- Context management strategy (e.g., recency pruning, token count)
- Selection and activation of toolkits
A typical configuration enables rapid agent instantiation and systematic variant management. The context manager ensures that all relevant state and action history fit within the active LLM context window. Strategies can be tuned for time horizon or information density, with sliding window, time-based, or token-based approaches.
Example YAML Configuration:
```yaml
agent:
  name: research_agent
  instructions: "You are a helpful research assistant..."
env:
  name: e2b
  config: {}
context_manager:
  name: base
  config: {}
toolkits:
  search:
    activated_tools: ["search", "web_qa"]
  python_executor:
    activated_tools: ["execute_python_code"]
```
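As a sketch of how such a configuration might be consumed, the snippet below parses a trimmed config with PyYAML and applies a token-budget sliding window of the kind described for the context manager. The loader, the `prune_history` helper, and the `max_tokens` field are illustrative assumptions rather than the framework's actual interface.

```python
# Illustrative sketch, not Youtu-Agent's actual loader or context-manager API.
import yaml  # PyYAML

CONFIG_TEXT = """
agent:
  name: research_agent
context_manager:
  name: base
  config: {max_tokens: 512}   # assumed field for this example
toolkits:
  search:
    activated_tools: ["search", "web_qa"]
"""

cfg = yaml.safe_load(CONFIG_TEXT)


def prune_history(messages, max_tokens):
    """Token-budget sliding window: keep the most recent messages that still fit."""
    kept, budget = [], max_tokens
    for msg in reversed(messages):               # newest first
        cost = max(1, len(msg["content"]) // 4)  # crude token estimate
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))


history = [{"role": "user", "content": "observation " * n} for n in range(1, 50)]
window = prune_history(history, cfg["context_manager"]["config"]["max_tokens"])
print(cfg["agent"]["name"], f"{len(history)} messages -> {len(window)} kept")
```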
3. Automated Agent Generation Paradigms
Youtu-Agent provides two complementary mechanisms for constructing ready-to-run agent artifacts from natural language task descriptions: Workflow Mode and Meta-Agent Mode.
3.1 Workflow Mode
Intended for well-specified, routine automation, this deterministic pipeline proceeds through four stages:
- Intent Clarification & Decomposition
- Tool Retrieval & Ad-hoc Tool Synthesis
- Prompt Engineering
- Configuration Assembly
The output is a YAML configuration with generated or retrieved tool code and prompts, optimized for standard agent use cases such as data scraping or file processing.
Algorithmic Pseudocode:
```text
Algorithm 1: Workflow_Generate(description)
1: spec ← decompose(description)
2: T ← search_tools(spec)
3: if missing(T) then
4:     T ← T ∪ synthesize_tools(spec)
5: end if
6: prompt ← engineer_prompt(spec, T)
7: yaml ← assemble(env, T, prompt)
8: return yaml
```
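A toy, runnable rendition of this pipeline is sketched below; every helper (decomposition, registry lookup, synthesis, prompt assembly) is a stand-in stub rather than the framework's real component.

```python
# Toy end-to-end rendition of Algorithm 1; all helpers are illustrative stubs.
def workflow_generate(description: str) -> str:
    # Intent clarification & decomposition
    spec = {"task": description, "required": ["fetch_page", "extract_table"]}
    # Tool retrieval from a (stub) registry
    registry = {"fetch_page": "def fetch_page(url): ..."}
    tools = {name: registry[name] for name in spec["required"] if name in registry}
    # Ad-hoc synthesis of any missing tools
    for name in spec["required"]:
        if name not in tools:
            tools[name] = f"def {name}(*args): ...  # auto-generated"
    # Prompt engineering and configuration assembly
    prompt = f"You are an agent for: {description}. Tools: {list(tools)}"
    return f"agent:\n  instructions: {prompt!r}\ntoolkits: {list(tools)}"


print(workflow_generate("Scrape product tables from a website"))
```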
3.2 Meta-Agent Mode
Targeted at underspecified or complex requirements, Meta-Agent Mode employs an “Architect Agent” (LLM-based) orchestrating tool and agent construction through iterative, multi-turn interaction involving these four operations:
- `search_tool(query)`: Retrieve tools from the registry
- `create_tool(spec)`: Auto-generate Python tool code and associated tests
- `ask_user(question)`: Clarify ambiguous requirements through dialog
- `create_agent_config()`: Assemble the final YAML configuration
This process interleaves planning, user query resolution, tool assembly, and configuration until a valid, executable agent artifact emerges.
Algorithmic Pseudocode:
```text
Algorithm 2: MetaAgent_Generate(description)
 1: state ← {description}
 2: while not done do
 3:     a ← ArchitectAgent(state)
 4:     if a = ask_user then
 5:         resp ← ask_user(a.query)
 6:         state ← state ∪ {resp}
 7:     else if a = search_tool then
 8:         T ← search_tool(a.spec)
 9:         state ← state ∪ T
10:     else if a = create_tool then
11:         τ ← create_tool(a.spec)
12:         T ← T ∪ {τ}
13:     else if a = create_agent_config then
14:         yaml ← assemble(state, T)
15:         return yaml
16:     end if
17: end while
```
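The interaction loop can be made concrete with the runnable toy below; the four operations and the fixed step policy are stubs standing in for an LLM-driven Architect Agent, a tool registry, and real user dialog (all names are illustrative assumptions).

```python
# Runnable toy sketch of the Architect Agent loop in Algorithm 2; illustrative only.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    kind: str           # one of: ask_user, search_tool, create_tool, create_agent_config
    payload: str = ""


@dataclass
class ArchitectAgent:
    """Stub policy: clarify once, search once, create one tool, then emit the config."""
    step: int = 0
    plan: List[str] = field(default_factory=lambda: [
        "ask_user", "search_tool", "create_tool", "create_agent_config"])

    def __call__(self, state: List[str]) -> Action:
        kind = self.plan[min(self.step, len(self.plan) - 1)]
        self.step += 1
        return Action(kind, payload="csv_summarizer")


def meta_agent_generate(description: str) -> str:
    agent, state, tools = ArchitectAgent(), [description], []
    while True:
        action = agent(state)
        if action.kind == "ask_user":
            state.append("user: summarize every CSV in ./data")  # stand-in for a real dialog turn
        elif action.kind == "search_tool":
            state.append(f"registry hit: {action.payload}")
        elif action.kind == "create_tool":
            tools.append(action.payload)                         # generated code + tests in practice
        elif action.kind == "create_agent_config":
            return f"agent:\n  name: generated_agent\n  tools: {tools}"


print(meta_agent_generate("Summarize CSV files in a folder"))
```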
Empirically, the pipeline achieves high validity and executability rates: configuration validity is 100% (Workflow) and 98.75% (Meta-Agent); tool executability is 81.25% (Workflow) and 82.50% (Meta-Agent).
4. Hybrid Policy Optimization: Agent Practice and RL
4.1 Agent Practice Module (Training-Free GRPO)
This module implements a group-relative policy optimization (GRPO) paradigm without explicit parameter updates (“zero-gradient” or “textual LoRA”). The process involves:
- Multi-rollout sampling over a set of training samples
- LLM-based evaluation and ranking or pairwise comparison of the alternative trajectories $\tau_1, \dots, \tau_G$
- Distillation of a “textual advantage” $\Delta$, a semantic delta describing what worked
- Injection of this distilled feedback into the agent context during subsequent inference
The objective can be formally written as:

$$\Delta^{*} \;=\; \arg\max_{\Delta}\; \mathbb{E}_{\tau \,\sim\, \pi_{\mathrm{LLM}}(\cdot \,\mid\, c \oplus \Delta)}\big[\, s(\tau) \,\big]$$

where $s(\tau)$ is an evaluator's quality score and $c$ is the agent's working context; the distilled $\Delta^{*}$ encapsulates actionable experience for future agent executions.
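A conceptual sketch of one practice iteration is shown below. The `run_agent`, `score`, and `distill` callables stand in for agent rollouts, the LLM evaluator, and LLM-based distillation respectively; no parameters are updated, only the textual experience carried into later contexts.

```python
# Conceptual sketch of one Agent Practice (training-free GRPO) iteration.
from typing import Callable, List


def practice_step(
    task: str,
    run_agent: Callable[[str, str], str],              # (task, experience) -> trajectory text
    score: Callable[[str], float],                      # evaluator's quality score s(tau)
    distill: Callable[[List[str], List[float]], str],   # produces the textual advantage Delta
    experience: str = "",
    group_size: int = 4,
) -> str:
    """Sample a group of rollouts, score them, and distill what worked into text."""
    rollouts = [run_agent(task, experience) for _ in range(group_size)]
    scores = [score(r) for r in rollouts]
    delta = distill(rollouts, scores)                   # semantic delta describing what worked
    return (experience + "\n" + delta).strip()          # injected into future agent contexts


# Toy invocation with stand-in callables.
new_experience = practice_step(
    "solve AIME problem 3",
    run_agent=lambda t, e: f"trajectory for {t} (context: {e or 'none'})",
    score=lambda traj: float(len(traj) % 7),
    distill=lambda rollouts, scores: "Check edge cases before finalizing the answer.",
)
print(new_experience)
```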
4.2 Agent RL Module
A RESTful API integration enables fully scalable RL training with distributed rollout and update. The framework utilizes:
- Ray-based rollout collectors and hierarchical timeouts for concurrency and reliability
- Safeguards: invalid tool calls filtered, off-policy updates reduced, advantage bias corrected
- PPO-style updates:

$$\mathcal{L}^{\mathrm{CLIP}}(\theta) \;=\; \mathbb{E}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\;\; \operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\Big)\right]$$

where $r_t(\theta) = \pi_\theta(a_t \mid s_t)\,/\,\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ and $\hat{A}_t$ is the estimated advantage.
- Value loss for the critic:

$$\mathcal{L}^{V}(\phi) \;=\; \mathbb{E}_t\!\left[\big(V_\phi(s_t) - R_t\big)^2\right]$$

with entropy bonuses added to avoid policy collapse; a minimal PyTorch rendition of these losses is given after this list.
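For reference, the two losses above can be written compactly in PyTorch as follows; tensor shapes, the clipping epsilon, and the entropy coefficient are illustrative choices, not the framework's exact training code.

```python
# Minimal PyTorch rendition of the clipped PPO surrogate and critic value loss.
import torch


def ppo_losses(log_probs, old_log_probs, advantages, values, returns,
               clip_eps=0.2, entropy=None, ent_coef=0.01):
    ratio = torch.exp(log_probs - old_log_probs)                    # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()             # clipped surrogate
    value_loss = torch.nn.functional.mse_loss(values, returns)      # critic regression
    if entropy is not None:
        policy_loss = policy_loss - ent_coef * entropy.mean()       # entropy bonus
    return policy_loss, value_loss


# Toy invocation with random tensors.
T = 8
logp, old_logp = torch.randn(T), torch.randn(T)
adv, vals, rets = torch.randn(T), torch.randn(T), torch.randn(T)
pl, vl = ppo_losses(logp, old_logp, adv, vals, rets)
print(pl.item(), vl.item())
```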
This hybrid optimization enables both lightweight, rapid improvement (Practice) and large-scale, robust policy evolution (RL).
5. Empirical Evaluation
Youtu-Agent demonstrates state-of-the-art performance across diverse benchmarks:
5.1 Benchmark Results
| Benchmark | Score | Model/Approach |
|---|---|---|
| WebWalkerQA | 71.47% pass@1 | DeepSeek-V3 open model |
| GAIA (text-only) | 72.8% pass@1 | DeepSeek-V3 open model |
5.2 Automated Generation Metrics (AgentGen-80)
| Metric | Workflow | Meta-Agent |
|---|---|---|
| Config Validity (CV) | 100% | 98.75% |
| Tool Executability (TE) | 81.25% | 82.50% |
| Task Completion (TC) | 65.00% | 68.75% |
5.3 Agent Practice (AIME 2024/2025 Mean@32)
- Base ReAct: 80.0% / 67.9%
- TF-GRPO: 82.7% (+2.7 pp) / 73.3% (+5.4 pp)
- Learning cost ≈ \$18 versus \$10k–\$20k for standard RL
5.4 Agent RL (Qwen2.5-7B)
- 40% iteration-time speedup vs. Agent-Lightning v0.2.2
- AIME24/25 math/code: 0.10→0.45 (+35 pp), 0.09→0.31 (+22 pp)
- Search/QA: e.g., TriviaQA +17 pp, PopQA +19 pp, HotpotQA +17 pp
These empirical results indicate both the reliability of automated agent synthesis and the efficacy of hybrid optimization in boosting agent performance efficiently.
6. Formalism and Theoretical Foundations
- Policy Gradient (PPO) Update:

$$\theta \;\leftarrow\; \theta + \alpha\,\nabla_\theta\, \mathbb{E}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\;\; \operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\Big)\right]$$

with ratio clipping applied via $\operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)$.
- Group-Relative Advantage (TF-GRPO):

$$\hat{A}_i \;=\; \frac{r_i - \operatorname{mean}\big(\{r_j\}_{j=1}^{G}\big)}{\operatorname{std}\big(\{r_j\}_{j=1}^{G}\big)}$$

A small numeric example of this normalization follows the list.
- Architecture Visualization: The entire system is driven by a unified YAML schema, connecting the environment, tools, and agent/context manager layers.
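As a numeric illustration of the group-relative advantage, the snippet below normalizes one group's rewards against the group mean and standard deviation (NumPy, with made-up reward values).

```python
# Group-relative advantage: normalize rewards within a rollout group.
import numpy as np

group_rewards = np.array([0.2, 0.8, 0.5, 0.9])   # rewards of one rollout group (illustrative)
advantages = (group_rewards - group_rewards.mean()) / (group_rewards.std() + 1e-8)
print(advantages)  # positive = better than the group average, negative = worse
```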
7. Significance and Related Directions
Youtu-Agent’s modular, YAML-driven design and automated agent generation capabilities substantially lower the manual overhead in producing, deploying, and evolving LLM agents. Its dual modes of optimization—Agent Practice and scalable RL—enable both rapid feedback-driven improvement and full-scale end-to-end policy learning, compatible with open-weight models and commodity hardware.
In contrast with prior frameworks that focus on either static workflow simulation or manual tool integration, Youtu-Agent achieves high tool executability in automated synthesis (>81%), robust task completion, and significantly faster RL-based learning. The integration of textual LoRA via in-context learning provides an efficient mechanism for gradient-free agent optimization.
This suggests the continued evolution of agentic frameworks toward automated, adaptive, and context-driven architectures tightly coupled with LLM advances.
Related agent-centric frameworks such as Youtu-GraphRAG extend these principles into complex graph-structured retrieval and reasoning (Dong et al., 27 Aug 2025), while lightweight LLMs tailored for agentic use, such as Youtu-LLM (Lu et al., 31 Dec 2025), further reinforce the upward trajectory of native agentic intelligence within LLM ecosystems.