
Youtu-Agent: Modular LLM Agent Framework

Updated 5 January 2026
  • Youtu-Agent is a modular framework that automates the synthesis, configuration, and continual optimization of LLM-based agents to reduce manual engineering overhead.
  • It uses a layered, YAML-based configuration that separates the Environment, Tools, and Agent layers to improve composability and ease of deployment.
  • The framework integrates a hybrid optimization suite combining training-free in-context learning and reinforcement learning to achieve state-of-the-art performance on diverse benchmarks.

Youtu-Agent is a modular framework for the automated synthesis, configuration, and continual optimization of LLM-based agents. It addresses the persistent challenges in the agentic systems domain: prohibitive manual engineering overhead for configuration and tool integration, as well as brittle agent adaptability in the face of changing environments and requirements. Youtu-Agent employs a structured schema, enabling precise decoupling and reuse of execution environment, tool wrappers, and the agent planner. Two agent generation paradigms—Workflow and Meta-Agent—deliver automatic code, configuration, and prompt synthesis. A hybrid optimization suite combines in-context learning (“Agent Practice”) and distributed policy-gradient reinforcement learning (“Agent RL”). The framework achieves state-of-the-art results on tasks including web navigation (WebWalkerQA), general AI assistant benchmarks (GAIA), and mathematical/coding QA, all with open-weight models (Shi et al., 31 Dec 2025).

1. Layered Configuration and Execution System

Youtu-Agent’s architecture is organized into Environment, Tools, and Agent layers, each described through a human-readable YAML schema. This separation amplifies composability and automation in agent definition. The Environment Layer specifies the execution substrate (e.g., browser via Playwright, OS shell, sandboxed container), exposing state and primitive actions (HTML, filesystem operations, CLI commands). The Tools Layer bundles atomic operations into environment wrappers (DOM clicks, bash), utility functions (math, text), and MCP external services. All tools share a standardized interface for interchanging and synthesis.

The Agent Layer hosts the LLM-based planner operating in a perceive–reason–act (PRA) loop. Context Manager modules prune history (such as outdated HTML) to maintain task relevance and minimize token usage. The full configuration is orchestrated via YAML, allowing modular assembly and immediate deployment of new agent instances.

| Layer | Functionality | Example Schema Element |
| --- | --- | --- |
| Environment | Playwright, OS shell, sandbox | env: |
| Tools | DOM click, search, execute code | toolkits: |
| Agent (& Context Manager) | LLM planner, context pruning | agent:; context_manager: |
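
The layered schema above can be sketched as a plain configuration object. The top-level keys mirror the schema elements named above (env:, toolkits:, agent:, context_manager:); all nested field names are illustrative assumptions, not the framework's actual API:

```python
# Hypothetical sketch of Youtu-Agent's layered configuration as a Python
# dict. Only the four top-level keys come from the schema elements above;
# every nested field name is an illustrative assumption.
config = {
    "env": {
        "type": "browser",           # e.g. a Playwright-driven browser
        "sandbox": True,             # run inside a sandboxed container
    },
    "toolkits": [
        {"name": "dom", "actions": ["click", "type", "read_html"]},
        {"name": "shell", "actions": ["execute"]},
    ],
    "agent": {
        "model": "open-weight-llm",  # LLM planner in a perceive-reason-act loop
        "max_steps": 20,
    },
    "context_manager": {
        "prune": ["stale_html"],     # drop outdated HTML from history
        "max_tokens": 8192,
    },
}

def validate(cfg: dict) -> bool:
    """Minimal structural check: the three layers plus the context manager."""
    required = {"env", "toolkits", "agent", "context_manager"}
    return required.issubset(cfg)

print(validate(config))  # True when all four layers are present
```

Because each layer sits under its own key, a new agent instance can be assembled by swapping a single sub-tree (e.g., a different env: block) without touching the others.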

2. Automated Agent Generation Paradigms

Youtu-Agent provides two LLM-driven paradigms for agent synthesis based on the desired complexity and adaptability:

Workflow Mode: For structured, well-defined tasks, the agent creation follows a four-stage process: (1) user intent clarification and technical decomposition; (2) toolkit retrieval or on-demand LLM-powered Python tool synthesis (generating signatures, docstrings, and unit tests); (3) prompt engineering—constructing system prompts linking tool descriptions, usage exemplars, and reasoning protocols; and (4) assembly of the final YAML configuration.
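
The four stages can be sketched as a simple pipeline; the stage functions below are hypothetical stand-ins for the actual LLM calls, and only the four-stage structure is taken from the description above:

```python
# Illustrative sketch of Workflow Mode's four-stage pipeline.
# Each function is a toy stand-in for an LLM-powered stage.
def clarify_intent(task: str) -> dict:
    """Stage 1: decompose the user's task into technical requirements."""
    return {"task": task, "requirements": [f"handle: {task}"]}

def retrieve_or_synthesize_tools(spec: dict) -> list[dict]:
    """Stage 2: look up existing toolkits, or synthesize Python tools
    (signature + docstring + unit tests) on demand."""
    return [{"name": "search", "signature": "search(query: str) -> str"}]

def build_system_prompt(spec: dict, tools: list[dict]) -> str:
    """Stage 3: link tool descriptions, usage exemplars, and a reasoning protocol."""
    tool_lines = "\n".join(f"- {t['signature']}" for t in tools)
    return f"Task: {spec['task']}\nAvailable tools:\n{tool_lines}"

def assemble_config(spec: dict, tools: list[dict], prompt: str) -> dict:
    """Stage 4: emit the final agent configuration."""
    return {"agent": {"system_prompt": prompt}, "toolkits": tools}

spec = clarify_intent("answer questions by web search")
tools = retrieve_or_synthesize_tools(spec)
prompt = build_system_prompt(spec, tools)
config = assemble_config(spec, tools, prompt)
print(sorted(config))  # ['agent', 'toolkits']
```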

Meta-Agent Mode: For complex or ambiguous task specifications, an Architect Agent iteratively plans with dedicated tools (search_tool, create_tool, ask_user, create_agent_config). It alternates between clarifying user intent, retrieving or synthesizing tools, and spawning the agent configuration. On the AgentGen-80 benchmark, Workflow Mode achieved 100% configuration validity, 81.25% tool executability, and 65.00% end-to-end task completion; Meta-Agent Mode reached 98.75% configuration validity, 82.50% tool executability, and 68.75% task completion (Shi et al., 31 Dec 2025).
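
The Architect Agent's loop might be sketched as follows; the planner policy and tool behaviors are toy stand-ins, with only the four tool names taken from the description above:

```python
# Sketch of the Meta-Agent (Architect) loop. The dispatch table mirrors
# the four tools named above; the planner here is a trivial stand-in for
# the LLM that would actually choose the next action.
def architect_loop(task: str, max_iters: int = 4) -> dict:
    state = {"task": task, "tools": [], "config": None}

    def planner(s: dict) -> str:
        # Stand-in policy: clarify intent, then gather tools, then finalize.
        if "clarified" not in s:
            return "ask_user"
        if not s["tools"]:
            return "search_tool"
        return "create_agent_config"

    actions = {
        "ask_user": lambda s: s.update(clarified=True),
        "search_tool": lambda s: s["tools"].append({"name": "search"}),
        "create_tool": lambda s: s["tools"].append({"name": "custom"}),
        "create_agent_config": lambda s: s.update(
            config={"agent": {"task": s["task"]}, "toolkits": s["tools"]}
        ),
    }

    for _ in range(max_iters):
        actions[planner(state)](state)
        if state["config"] is not None:  # stop once a config is spawned
            break
    return state["config"]

cfg = architect_loop("summarize arXiv abstracts")
```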

3. Hybrid Policy Optimization: Agent Practice & Agent RL

To tackle static agent capabilities, Youtu-Agent applies two distinct, complementary optimization strategies:

Agent Practice (Training-Free GRPO): The agent accumulates experience in “contextual memory” without model parameter updates. On each task, k rollouts are generated, and an LLM evaluator distills semantic group advantage hints (Δ_j) by pairwise comparison, e.g., “focus on isolating variables early.” These textual experiences (acting as a “textual LoRA”) are prepended to agent prompts during subsequent inference for improved reasoning, yielding gains of up to +5.4 pp on challenging QA benchmarks (AIME25), with no gradient descent or fine-tuning.
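
A minimal sketch of the practice loop, assuming hypothetical rollout and evaluator functions (the real system uses an LLM for both); only the k-rollout, pairwise-comparison, and prompt-prepending structure follows the description above:

```python
import random

def rollout(task: str, hints: list[str], rng: random.Random) -> tuple[str, float]:
    """Run the agent once on the task; return (trace, score).
    Stand-in: a random score in place of a real evaluation."""
    return f"trace for {task}", rng.random()

def distill_hint(best: str, worst: str) -> str:
    """Stand-in for the LLM evaluator that compares rollouts pairwise
    and distills a textual advantage hint."""
    return "focus on isolating variables early"

def practice(task: str, memory: list[str], k: int = 4, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    group = [rollout(task, memory, rng) for _ in range(k)]
    group.sort(key=lambda pair: pair[1])                # rank the k rollouts
    hint = distill_hint(best=group[-1][0], worst=group[0][0])
    memory.append(hint)                                 # no parameter update
    return memory

memory: list[str] = []
practice("solve AIME-style problem", memory)
# Accumulated hints are prepended to the prompt at inference time.
prompt = "\n".join(memory) + "\nTask: solve AIME-style problem"
```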

Agent RL Module: End-to-end agent fine-tuning via policy gradient, maximizing J(θ) = E_{τ∼π_θ}[R(τ)] with unbiased gradient estimators. The infrastructure leverages RESTful service calls and Ray for distributed rollout collection on up to 128 GPUs. Hierarchical timeouts, anomaly filtering, and advantage bias correction suppress long-horizon entropy growth. Training achieves a 40% iteration speedup compared to Agent-Lightning, with substantial QA gains (+0.17 to +0.35 absolute on various datasets).
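
The underlying score-function estimator can be illustrated on a toy two-action bandit; this is a self-contained numerical sketch, not the distributed Ray-based Agent RL infrastructure:

```python
import math
import random

# Toy REINFORCE illustration of J(θ) = E_{τ∼π_θ}[R(τ)]: the gradient is
# estimated as the mean over rollouts of ∇_θ log π_θ(a) · R(a).
def softmax(theta: list[float]) -> list[float]:
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(theta, rewards, rng, n_rollouts=256, lr=0.5):
    probs = softmax(theta)
    grad = [0.0, 0.0]
    for _ in range(n_rollouts):
        a = 0 if rng.random() < probs[0] else 1     # sample an action
        r = rewards[a]
        for i in range(2):                          # ∇ log π(a) = 1[i=a] - π(i)
            grad[i] += ((1.0 if i == a else 0.0) - probs[i]) * r
    grad = [g / n_rollouts for g in grad]           # unbiased MC estimate
    return [t + lr * g for t, g in zip(theta, grad)]

rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(100):
    theta = reinforce_step(theta, rewards=[1.0, 0.0], rng=rng)
# After training, the policy should strongly prefer action 0 (reward 1.0).
print(softmax(theta)[0] > 0.9)
```

The full system applies the same estimator to whole agent trajectories, with the advantage bias correction and anomaly filtering mentioned above stabilizing long-horizon training.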

4. Benchmarks, Tool Synthesis, and Empirical Performance

Youtu-Agent demonstrates robust empirical results across diverse benchmarks:

  • WebWalkerQA (680 items): 71.47% pass@1 with open-weight models.
  • GAIA (text-only, 466 items): 72.8% pass@1.
  • Automated Tool Synthesis: Workflow Mode yields 81.25% tool executability; Meta-Agent 82.50%. End-to-end task rates are 65.00% and 68.75%, respectively.

Agent Practice module on AIME:

| Method | AIME24 (%) | AIME25 (%) |
| --- | --- | --- |
| ReAct (baseline) | 80.0 | 67.9 |
| + GRPO (w/ GT) | 82.7 | 73.3 |
| + GRPO (w/o GT) | 80.7 | 68.9 |

Agent RL module (Qwen2.5-7B) improves Math/Code pass rates (e.g., AIME24 from 0.10 to 0.45 [+0.35]). Multi-Hop QA benchmarks register improvements of 0.08–0.21 absolute on PopQA, HotpotQA, 2WikiMultiHop, and others (Shi et al., 31 Dec 2025).

5. Strengths, Scalability, and Limitations

Key strengths are modularity (clear isolation of environment, tools, and agent logic), automated generation (reducing manual steps in agent engineering), and hybrid optimization (allowing continuous evolution from fast, zero-update in-context adaptation to full RL training).

Scalability in tool and environment support is assured by schema-driven synthesis, minimal engineering for new platforms, and distributed RL infrastructure for large-scale training. The agent’s contextual memory and layered RL underpin adaptation to evolving tasks.

Limitations include a small rate (1–2%) of synthesis failures in Meta-Agent mode, suggesting a need for stronger schema validation or human-in-the-loop correction. Multimodal tool support (vision/audio) remains a future extension. The Agent Practice module’s effectiveness is currently maximized on small datasets, indicating curriculum learning and hierarchical experience distillation as prospective improvements.

6. Relationship with Vertically Integrated Agentic Paradigms

Youtu-Agent interfaces with advanced agentic frameworks such as Youtu-GraphRAG (Dong et al., 27 Aug 2025), which unifies graph-based schema extraction, hierarchical community detection, schema-grounded subquery decomposition, and multi-route retrieval for complex reasoning. Youtu-GraphRAG achieves token cost reductions up to 90.71% and improves QA accuracy by as much as 16.62% over state-of-the-art graph retrieval-augmented generation systems. Both frameworks foreground schema-enabled modularity and agentic iterative planning, with Youtu-GraphRAG primarily serving retrieval-augmented reasoning and Youtu-Agent specializing in automated agent generation and dynamic policy optimization.

7. Broader Implications and Future Directions

Youtu-Agent presents a paradigm for near-turnkey agentic ecosystems driven by structured schemas, automated LLM synthesis, and scalable optimization. A plausible implication is the feasibility of re-synthesizing agents as new tools or environments emerge with negligible manual intervention. Extending the modular approach to multimodal toolchains and validating agent synthesis by curriculum/experience distillation could further generalize agent evolution in non-stationary, complex domains. The combined innovation in schema-grounded automation and hybrid optimization positions Youtu-Agent as a foundational infrastructure for scalable LLM agent deployment and continuous productivity improvement (Shi et al., 31 Dec 2025).
