Youtu-Agent: Modular LLM Agent Framework

Updated 7 January 2026
  • Youtu-Agent is a modular framework designed to automate the generation, configuration, and evolution of LLM agents using a layered architecture.
  • It integrates environment, tools, and agent layers through a YAML-based configuration that enables rapid deployment and systematic variant management.
  • The framework employs dual automated agent-generation paradigms and hybrid policy optimization to achieve scalable performance and practical reinforcement learning improvements.

Youtu-Agent is a modular framework designed to automate the generation, configuration, and continuous evolution of LLM agents. Its architecture aims to simultaneously address two central obstacles in contemporary LLM agent frameworks: high configuration cost (stemming from manual tool integration and prompt engineering) and static capabilities (limited adaptability without resource-intensive fine-tuning). By providing a layered execution architecture, declarative configuration, dual automated agent-generation paradigms, and a hybrid policy optimization system, Youtu-Agent enables scalable, maintainable deployment and training of high-performing LLM-based agents across diverse environments and tasks (Shi et al., 31 Dec 2025).

1. Layered System Architecture

Youtu-Agent’s core is a three-layer execution stack:

  • Environment Layer: Encapsulates execution contexts, including Playwright browser sessions, operating system shells, and E2B Python sandboxes. It provides APIs that expose both state (e.g., DOM HTML, filesystem status) and primitive actions (e.g., click, run, execute code).
  • Tools Layer: Consists of atomic wrappers around environment APIs and supplies environment-independent utilities (e.g., math, text processing, date/time) as well as MCP (Model Context Protocol) adapters for external services. Tools are grouped into configurable toolkits, which can be selectively activated.
  • Agent Layer: Operates as an LLM-driven planner embodying a perceive–reason–act loop. It incorporates a context manager for pruning and maintaining a concise working memory window, with strategies such as time-based or token-budget sliding windows.

All three layers are parameterized and interconnected via a structured YAML configuration schema, which defines agent instructions, execution context, context management policy, and enabled toolkits.
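
To make the layering concrete, the sketch below composes the three layers in Python. The class and method names (Environment, Toolkit, Agent, observe, act) are illustrative assumptions for exposition, not Youtu-Agent's actual API:

# Hedged sketch of the three-layer stack; all names are assumptions, not the real API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Environment:
    """Environment layer: exposes state and primitive actions."""
    state: Dict[str, str] = field(default_factory=dict)

    def observe(self) -> Dict[str, str]:
        return self.state

    def act(self, action: str, **kwargs) -> str:
        # "click", "run", "execute_code", etc. would be dispatched here
        return f"executed {action} with {kwargs}"

@dataclass
class Toolkit:
    """Tools layer: atomic wrappers around environment APIs."""
    env: Environment
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def call(self, name: str, **kwargs) -> str:
        return self.tools[name](**kwargs)

@dataclass
class Agent:
    """Agent layer: perceive-reason-act loop over a bounded context window."""
    toolkit: Toolkit
    context: List[str] = field(default_factory=list)
    max_context_items: int = 20  # crude recency-based pruning

    def step(self, task: str) -> str:
        self.context.append(f"obs: {self.toolkit.env.observe()}")  # perceive
        self.context = self.context[-self.max_context_items:]
        tool_name, args = self.plan(task)                          # reason (LLM call in practice)
        result = self.toolkit.call(tool_name, **args)              # act
        self.context.append(f"result: {result}")
        return result

    def plan(self, task: str):
        # placeholder for the LLM-driven planner; picks the first available tool
        return next(iter(self.toolkit.tools)), {}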

Layered Architecture Diagram (informal):

[Environment] ↔ [Tools] ↔ [Agent + Context Manager]
Configuration (YAML) drives all layers

2. Structured Configuration and Context Management

Youtu-Agent decouples agent specification and execution through a declarative YAML-based schema. This schema governs all aspects of agent behavior, including:

  • Agent identity and task instructions
  • Target execution environment and parameters
  • Context management strategy (e.g., recency pruning, token count)
  • Selection and activation of toolkits

A typical configuration enables rapid agent instantiation and systematic variant management. The context manager ensures that all relevant state and action history fit within the active LLM context window. Strategies can be tuned for time horizon or information density, with sliding window, time-based, or token-based approaches.
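
As an illustration of the token-budget sliding-window strategy, the following sketch keeps only the most recent history entries whose combined token estimate fits the budget. The class name and the characters-per-token heuristic are assumptions made for exposition, not the framework's implementation:

# Hedged sketch of a token-budget sliding-window context manager.
from typing import List

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic; a real tokenizer would be used

class TokenBudgetContextManager:
    def __init__(self, max_tokens: int = 4096):
        self.max_tokens = max_tokens
        self.history: List[str] = []

    def add(self, message: str) -> None:
        self.history.append(message)

    def window(self) -> List[str]:
        """Return the most recent messages that fit within the token budget."""
        kept, used = [], 0
        for msg in reversed(self.history):
            cost = estimate_tokens(msg)
            if used + cost > self.max_tokens:
                break
            kept.append(msg)
            used += cost
        return list(reversed(kept))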

Example YAML Configuration:

agent:
  name: research_agent
  instructions: "You are a helpful research assistant..."
env:
  name: e2b
  config: {}
context_manager:
  name: base
  config: {}
toolkits:
  search:
    activated_tools: ["search","web_qa"]
  python_executor:
    activated_tools: ["execute_python_code"]

3. Automated Agent Generation Paradigms

Youtu-Agent provides two complementary mechanisms for constructing ready-to-run agent artifacts from natural language task descriptions: Workflow Mode and Meta-Agent Mode.

3.1 Workflow Mode

Intended for well-specified, routine automation, this deterministic pipeline proceeds through four stages:

  1. Intent Clarification & Decomposition
  2. Tool Retrieval & Ad-hoc Tool Synthesis
  3. Prompt Engineering
  4. Configuration Assembly

The output is a YAML configuration with generated or retrieved tool code and prompts, optimized for standard agent use cases such as data scraping or file processing.

Algorithmic Pseudocode:

\textbf{Algorithm 1: Workflow\_Generate}$(\mathrm{description})$
\begin{algorithmic}[1]
  \State $\mathrm{spec} \leftarrow \mathrm{decompose}(\mathrm{description})$
  \State $\mathcal{T} \leftarrow \mathrm{search\_tools}(\mathrm{spec})$
  \If{$\mathrm{missing}(\mathcal{T})$}
     \State $\mathcal{T} \leftarrow \mathcal{T} \cup \mathrm{synthesize\_tools}(\mathrm{spec})$
  \EndIf
  \State $\mathrm{prompt} \leftarrow \mathrm{engineer\_prompt}(\mathrm{spec}, \mathcal{T})$
  \State $\mathrm{yaml} \leftarrow \mathrm{assemble}(\mathrm{env}, \mathcal{T}, \mathrm{prompt})$
  \State \Return $\mathrm{yaml}$
\end{algorithmic}
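
For readers who prefer running code over pseudocode, a hedged Python rendition of the same pipeline is given below; decompose, search_tools, synthesize_tools, and engineer_prompt are stubs standing in for the LLM-backed components, and their signatures are assumptions:

# Executable sketch of Workflow Mode (Algorithm 1); all helpers are illustrative stubs.
from typing import Dict, List

def decompose(description: str) -> Dict[str, str]:
    return {"task": description}

def search_tools(spec: Dict[str, str]) -> List[str]:
    return []  # tool-registry lookup would happen here

def synthesize_tools(spec: Dict[str, str]) -> List[str]:
    return ["generated_tool"]  # ad-hoc tool code generation would happen here

def engineer_prompt(spec: Dict[str, str], tools: List[str]) -> str:
    return f"Solve: {spec['task']} using tools {tools}"

def workflow_generate(description: str) -> Dict[str, object]:
    spec = decompose(description)
    tools = search_tools(spec)
    if not tools:                             # missing tools -> synthesize ad hoc
        tools = tools + synthesize_tools(spec)
    prompt = engineer_prompt(spec, tools)
    return {                                  # assembled YAML-equivalent configuration
        "agent": {"instructions": prompt},
        "env": {"name": "e2b"},
        "toolkits": {"generated": {"activated_tools": tools}},
    }

print(workflow_generate("Scrape a table from a web page and save it as CSV"))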

3.2 Meta-Agent Mode

Targeted at underspecified or complex requirements, Meta-Agent Mode employs an “Architect Agent” (LLM-based) orchestrating tool and agent construction through iterative, multi-turn interaction involving these four operations:

  • search_tool(query): Retrieve tools from the registry
  • create_tool(spec): Auto-generate Python tool code and associated tests
  • ask_user(question): Clarify ambiguous requirements through dialog
  • create_agent_config(): Assemble final YAML configuration

This process interleaves planning, user query resolution, tool assembly, and configuration until a valid, executable agent artifact emerges.

Algorithmic Pseudocode:

\textbf{Algorithm 2: MetaAgent\_Generate}$(\mathrm{description})$
\begin{algorithmic}[1]
  \State $\mathrm{state} \leftarrow \{\mathrm{description}\}$;\; $\mathcal{T} \leftarrow \emptyset$
  \While{not done}
    \State $a \leftarrow \mathrm{ArchitectAgent}(\mathrm{state})$
    \If{$a = \texttt{ask\_user}$}
      \State $\mathrm{resp} \leftarrow \mathrm{ask\_user}(a.\mathrm{query})$
      \State $\mathrm{state} \leftarrow \mathrm{state} \cup \{\mathrm{resp}\}$
    \ElsIf{$a = \texttt{search\_tool}$}
      \State $\mathcal{T} \leftarrow \mathcal{T} \cup \mathrm{search\_tool}(a.\mathrm{spec})$
      \State $\mathrm{state} \leftarrow \mathrm{state} \cup \mathcal{T}$
    \ElsIf{$a = \texttt{create\_tool}$}
      \State $\tau \leftarrow \mathrm{create\_tool}(a.\mathrm{spec})$
      \State $\mathcal{T} \leftarrow \mathcal{T} \cup \{\tau\}$
    \ElsIf{$a = \texttt{create\_agent\_config}$}
      \State $\mathrm{yaml} \leftarrow \mathrm{assemble}(\mathrm{state}, \mathcal{T})$
      \State \Return $\mathrm{yaml}$
    \EndIf
  \EndWhile
\end{algorithmic}
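
The dispatch structure of this loop can be sketched in Python as follows; the architect callable and the helper stubs stand in for LLM-backed components, and all names are illustrative assumptions:

# Hedged sketch of the Meta-Agent dispatch loop (Algorithm 2).
from typing import Callable, Dict, List

def search_registry(spec: str) -> List[str]:
    return []                                  # tool-registry lookup stub

def create_tool(spec: str) -> str:
    return f"tool_for({spec})"                 # code generation + unit-test stub

def assemble_config(state: List[str], tools: List[str]) -> Dict[str, object]:
    return {"agent": {"instructions": state[0]}, "toolkits": {"generated": tools}}

def meta_agent_generate(description: str,
                        architect: Callable[[List[str]], Dict[str, str]],
                        max_turns: int = 20) -> Dict[str, object]:
    state, tools = [description], []
    for _ in range(max_turns):
        action = architect(state)              # one LLM decision per turn
        if action["op"] == "ask_user":
            state.append(input(action["query"] + " "))
        elif action["op"] == "search_tool":
            found = search_registry(action["spec"])
            tools.extend(found)
            state.append(f"found tools: {found}")
        elif action["op"] == "create_tool":
            tools.append(create_tool(action["spec"]))
        elif action["op"] == "create_agent_config":
            return assemble_config(state, tools)
    raise RuntimeError("no valid agent configuration produced within the turn budget")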

Empirically, the pipeline achieves high validity and executability rates: configuration validity is 100% (Workflow) and 98.75% (Meta-Agent); tool executability is 81.25% (Workflow) and 82.50% (Meta-Agent).

4. Hybrid Policy Optimization: Agent Practice and RL

4.1 Agent Practice Module (Training-Free GRPO)

This module implements a group-relative policy optimization (GRPO) paradigm without explicit parameter updates (“zero-gradient” or “textual LoRA”). The process involves:

  • Multi-rollout sampling over a set of training samples
  • LLM-based evaluation and ranking or pairwise comparison of alternative trajectories τ_i
  • Distillation of a “textual advantage,” a semantic delta Δ_text describing what worked
  • Injection of this distilled feedback into the agent context during subsequent inference

The group-relative advantage of each trajectory is formally written as:

A_i = Q(\tau_i) - \frac{1}{K}\sum_{j=1}^K Q(\tau_j),

where Q(τ) is an evaluator's quality score. Δ_text encapsulates actionable experience for future agent executions.
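
A minimal numeric sketch of the group-relative computation, using made-up evaluator scores for a group of K = 4 rollouts:

# Group-relative advantages: A_i = Q(tau_i) - mean_j Q(tau_j).
def group_relative_advantages(scores):
    baseline = sum(scores) / len(scores)
    return [q - baseline for q in scores]

scores = [0.9, 0.4, 0.7, 0.2]             # hypothetical judge scores for 4 rollouts
print(group_relative_advantages(scores))  # ~[0.35, -0.15, 0.15, -0.35], up to float rounding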

4.2 Agent RL Module

RESTful API integration enables fully scalable RL training with distributed rollouts and policy updates. The framework utilizes:

  • Ray-based rollout collectors and hierarchical timeouts for concurrency and reliability
  • Safeguards: invalid tool calls filtered, off-policy updates reduced, advantage bias corrected
  • PPO-style updates:

\mathcal{L}_{\mathrm{PG}}(\theta) = -\mathbb{E}_{t}\bigl[\min\bigl(r_t(\theta)\,\hat A_t,\;\mathrm{clip}(r_t(\theta),1-\epsilon,1+\epsilon)\,\hat A_t\bigr)\bigr],

where r_t(\theta)=\frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_\mathrm{old}}(a_t|s_t)} is the probability ratio and Â_t is the estimated advantage.

  • Value loss for the critic:

\mathcal{L}_V = \mathbb{E}_t\bigl[(V_\phi(s_t)-R_t)^2\bigr]

with entropy bonuses to avoid policy collapse.
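
A compact PyTorch sketch of these losses is shown below; tensor shapes, the 0.5 value-loss weight, and the entropy coefficient are illustrative assumptions rather than the framework's actual training code:

# Hedged sketch of the PPO-style clipped policy loss, value loss, and entropy bonus.
import torch

def ppo_losses(logp_new, logp_old, advantages, values, returns,
               clip_eps=0.2, entropy=None, ent_coef=0.01):
    ratio = torch.exp(logp_new - logp_old)                       # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()          # L_PG
    value_loss = ((values - returns) ** 2).mean()                # L_V
    total = policy_loss + 0.5 * value_loss
    if entropy is not None:
        total = total - ent_coef * entropy.mean()                # entropy bonus
    return total, policy_loss, value_loss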

This hybrid optimization enables both lightweight, rapid improvement (Practice) and large-scale, robust policy evolution (RL).

5. Empirical Evaluation

Youtu-Agent demonstrates state-of-the-art performance across diverse benchmarks:

5.1 Benchmark Results

Benchmark          Score            Model/Approach
WebWalkerQA        71.47% pass@1    DeepSeek-V3 (open model)
GAIA (text-only)   72.8% pass@1     DeepSeek-V3 (open model)

5.2 Automated Generation Metrics (AgentGen-80)

Metric                    Workflow   Meta-Agent
Config Validity (CV)      100%       98.75%
Tool Executability (TE)   81.25%     82.50%
Task Completion (TC)      65.00%     68.75%

5.3 Agent Practice (AIME 2024/2025 Mean@32)

  • Base ReAct: 80.0% / 67.9%
  • TF-GRPO: 82.7% (+2.7 pp) / 73.3% (+5.4 pp)
  • Learning cost ≈ $18, versus $10k–$20k for standard RL

5.4 Agent RL (Qwen2.5-7B)

  • 40% iteration-time speedup vs. Agent-Lightning v0.2.2
  • AIME24/25 math/code: 0.10→0.45 (+35 pp), 0.09→0.31 (+22 pp)
  • Search/QA: e.g., TriviaQA +17 pp, PopQA +19 pp, HotpotQA +17 pp

These empirical results indicate both the reliability of automated agent synthesis and the efficacy of hybrid optimization in boosting agent performance efficiently.

6. Formalism and Theoretical Foundations

  • Policy-Gradient Update (Agent RL):

\Delta\theta \propto \mathbb{E}_t\left[ \nabla_\theta \log\pi_\theta(a_t|s_t)\, \hat A_t \right]

with ratio clipping applied via r_t(θ).

  • Group-Relative Advantage (TF-GRPO):

A_i = Q(\tau_i) - \frac{1}{K}\sum_{j=1}^K Q(\tau_j), \quad \text{with distilled textual update } \Delta_\mathrm{text}

  • Architecture Visualization: The entire system is driven by a unified YAML schema, connecting the environment, tools, and agent/context manager layers.

Youtu-Agent’s modular, YAML-driven design and automated agent generation capabilities substantially lower the manual overhead in producing, deploying, and evolving LLM agents. Its dual modes of optimization—Agent Practice and scalable RL—enable both rapid feedback-driven improvement and full-scale end-to-end policy learning, compatible with open-weight models and commodity hardware.

In contrast with prior frameworks that focus on either static workflow simulation or manual tool integration, Youtu-Agent achieves high tool executability (>81%), robust task completion, and significantly accelerated RL-based learning. Its “textual LoRA” mechanism for in-context learning provides an efficient, gradient-free route to agent optimization.

This suggests the continued evolution of agentic frameworks toward automated, adaptive, and context-driven architectures tightly coupled with LLM advances.

Related agent-centric frameworks such as Youtu-GraphRAG extend these principles into complex graph-structured retrieval and reasoning (Dong et al., 27 Aug 2025), while lightweight LLMs tailored for agentic use, such as Youtu-LLM (Lu et al., 31 Dec 2025), further reinforce the upward trajectory of native agentic intelligence within LLM ecosystems.
