
AWorld: Orchestrating the Training Recipe for Agentic AI (2508.20404v1)

Published 28 Aug 2025 in cs.AI

Abstract: The learning from practice paradigm is crucial for developing capable Agentic AI systems, yet it is severely hampered by inefficient experience generation, a bottleneck especially pronounced in complex benchmarks like GAIA. To address this, we introduce AWorld, an open-source system engineered for large-scale agent-environment interaction. By distributing tasks across a cluster, AWorld accelerates experience collection by 14.6x compared to standard single-node, sequential execution. This critical speedup makes extensive reinforcement learning practical and scalable. Leveraging this capability, we trained a Qwen3-32B-based agent that significantly outperforms its base model, increasing its overall GAIA accuracy from 21.59% to 32.23%. On the benchmark's most challenging levels, our agent achieves a score of 16.33%, surpassing the performance of leading proprietary models. Our open-source system and resulting agent provide a practical blueprint for a complete agentic AI training pipeline, from efficient interaction to demonstrable model improvement.


Summary

  • The paper introduces AWorld, which achieves a 14.6x faster rollout generation via distributed processing to enhance agentic AI performance on GAIA.
  • It integrates modular components including agent construction, communication protocols, and state management to streamline scalable reinforcement learning.
  • Empirical results demonstrate substantial improvements in model pass rates, underscoring efficient experience generation as critical for robust agentic learning.

AWorld: Orchestrating the Training Recipe for Agentic AI

Introduction

The paper presents AWorld, an open-source framework designed to address the computational bottlenecks in agentic AI training, particularly those arising from inefficient experience generation in complex environments. The work is motivated by the limitations of current LLMs in solving real-world, multi-step reasoning tasks, as exemplified by the GAIA benchmark. AWorld integrates model selection, runtime construction, communication protocols, and training orchestration to enable scalable, efficient agent-environment interaction and reinforcement learning.

Figure 1: AWorld enables substantial performance gains on GAIA by accelerating experience generation 14.6x via distributed rollouts, making RL practical for large models.

Framework Architecture

AWorld is architected to support the full lifecycle of agentic AI development, from agent instantiation to distributed training. The framework is modular, comprising four principal components:

  • Agent Construction: Agents are instantiated with configurable logic, toolsets, and planning capabilities. The system supports prompt assembly, custom tool integration (e.g., browser automation, code execution), and flexible agent topologies for multi-agent collaboration.
  • Communication Protocols: A unified message-passing architecture enables robust communication between agents, tools, and environments. The Message API supports event-driven workflows, error handling, and extensibility for complex task coordination.
  • Runtime State Management: Distributed execution is managed via Kubernetes, allowing high-concurrency rollouts across sandboxed environments. State consistency is maintained through centralized trace servers and remote storage, supporting recovery and large-scale evaluation.
  • Training Orchestration: AWorld decouples rollout generation from model training, integrating with external RL frameworks (e.g., SWIFT, OpenRLHF) to enable scalable policy optimization.

Figure 2: The AWorld architecture supports both forward (experience generation) and backward (policy optimization) passes, enabling closed-loop agentic learning.

Figure 3: Message workflow in AWorld runtime, illustrating agent query handling and inter-component communication.
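The unified message-passing design described above can be sketched minimally. The class and method names here (`Message`, `MessageBus`, `subscribe`, `send`) are illustrative assumptions for this summary, not AWorld's actual Message API:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class Message:
    sender: str
    receiver: str
    payload: Dict[str, Any]

class MessageBus:
    """Toy event-driven router: components register a handler under a name,
    and messages are dispatched to the named receiver."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Message], Any]] = {}

    def subscribe(self, name: str, handler: Callable[[Message], Any]) -> None:
        self._handlers[name] = handler

    def send(self, msg: Message) -> Any:
        handler = self._handlers.get(msg.receiver)
        if handler is None:
            # Basic error handling: unknown receivers fail loudly.
            raise KeyError(f"no handler registered for {msg.receiver!r}")
        return handler(msg)

# An agent querying a (hypothetical) browser tool through the bus:
bus = MessageBus()
bus.subscribe("browser_tool", lambda m: {"status": "ok", "echo": m.payload})
reply = bus.send(Message("agent", "browser_tool", {"url": "https://example.com"}))
```

A real system adds asynchronous delivery, retries, and tracing, but the register-then-dispatch shape is the core of any such protocol layer.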

Distributed Rollout and Experience Generation

AWorld's core innovation lies in its distributed rollout engine, which orchestrates massive parallelization of agent-environment interactions. By leveraging Kubernetes, the system can concurrently execute thousands of long-horizon tasks, each encapsulated in an isolated pod. This design overcomes the resource contention and instability inherent in single-node setups, making large-scale experience generation tractable.

Figure 4: Massively parallel rollouts in AWorld, managed by Kubernetes, enable scalable data generation for RL.

Figure 5: Action-state rollout demonstration, with multiple pods running in parallel to maximize experience throughput.
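As a rough local analogue of this fan-out (threads standing in for cluster pods, and `run_rollout` a stand-in for a real agent-environment episode), parallel experience collection looks like:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_rollout(task_id: int) -> dict:
    # Stand-in for one pod-isolated, long-horizon agent-environment episode.
    time.sleep(0.01)  # simulate environment latency
    return {"task": task_id, "success": task_id % 2 == 0}

def collect_experience(task_ids, max_workers: int = 4) -> list:
    # Dispatch rollouts concurrently; the real system schedules Kubernetes
    # pods across a cluster rather than threads on one machine.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_rollout, task_ids))

trajectories = collect_experience(range(8))
```

Because episodes are dominated by environment latency rather than compute, throughput scales close to linearly with the number of concurrent workers until cluster resources saturate.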

Empirical Analysis: Rollout Scale and Agent Performance

The paper provides a systematic analysis of the relationship between rollout scale and agent performance on the GAIA benchmark. Experiments demonstrate a universal trend: increasing the number of rollouts per task leads to substantial improvements in pass rates for all evaluated models. For example, Claude-3.7-Sonnet's pass rate rises from 47.9% with a single rollout to 76.4% at 32 rollouts, while GPT-4o's more than doubles.

Figure 6: Pass rate as a function of rollout scale on GAIA; all models benefit from increased interaction attempts.

This result empirically validates the necessity of efficient, large-scale experience generation for agentic learning. The bottleneck in agent training is not model optimization but the ability to generate sufficient successful trajectories for RL.
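Pass rates of this kind are conventionally computed with the standard unbiased pass@k estimator (the specific percentages above are the paper's; this helper only shows how such a rate is derived from n rollouts containing c successes):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the probability that at least one of k sampled
    rollouts succeeds, given c successes among n total rollouts."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws without a success
    # 1 - C(n-c, k) / C(n, k), expanded as a product for numerical stability
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))
```

For instance, with 1 success in 4 rollouts, pass@1 is 0.25 but pass@4 is 1.0, which is exactly the scaling effect the experiments exploit.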

Efficiency Gains and Training Outcomes

AWorld achieves a 14.6x speedup in rollout generation compared to sequential single-node execution, reducing the total cycle time for rollout and training from 7839s to 669s. This efficiency gain makes extensive RL on complex benchmarks practical.
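As a quick sanity check on the reported timings: the 14.6x figure applies to experience collection itself (per the abstract), while the end-to-end cycle numbers imply a somewhat smaller speedup because training time is unchanged:

```python
rollout_speedup = 14.6                  # reported speedup for rollout generation alone
total_before, total_after = 7839, 669   # seconds, full rollout + training cycle

# End-to-end speedup implied by the reported cycle times (~11.7x); the gap
# to 14.6x is consistent with Amdahl's law, since only the rollout phase
# is accelerated by distribution.
cycle_speedup = total_before / total_after
```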

Using AWorld, the authors fine-tune Qwen3-32B via SFT (on 886 successful trajectories) followed by RL (using GRPO and rule-based rewards). The resulting agent, Qwen3-32B-AWorld, achieves a pass@1 of 32.23% on GAIA, a 10.6 percentage point improvement over the base model. On the most challenging Level 3 questions, it attains 16.33%, outperforming all compared models, including GPT-4o and DeepSeek-V3. Notably, the agent also generalizes to xbench-DeepSearch, improving from 12% to 32% without direct training, indicating robust skill acquisition rather than overfitting.
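GRPO's defining step is group-relative advantage estimation: the rewards of a group of rollouts for the same task are normalized against the group's own statistics, so no learned value function is needed. A minimal sketch (the exact normalization and rule-based reward details in the paper may differ):

```python
import statistics

def grpo_advantages(rewards: list) -> list:
    """Group-relative advantages: center each rollout's reward on the group
    mean and scale by the group standard deviation."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    if sigma == 0:
        # All rollouts scored identically: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]
```

With a rule-based binary reward, a group of [1, 0, 1, 0] yields advantages [1, -1, 1, -1]: successful trajectories are reinforced relative to failed attempts at the same task.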

Practical and Theoretical Implications

AWorld provides a scalable infrastructure for agentic AI, enabling efficient RL in environments with high resource demands and long-horizon tasks. The framework's modularity and integration capabilities position it as a practical solution for both single-agent and multi-agent systems. The empirical results challenge the notion that model size or architecture alone determines agentic performance; instead, rollout efficiency and experience diversity are critical.

Theoretically, the work underscores the importance of the "learning from practice" paradigm and the need for frameworks that optimize the entire agent-environment-training loop. The demonstrated generalization across benchmarks suggests that distributed RL with sufficient experience can yield agents with transferable reasoning capabilities.

Future Directions

The authors outline a roadmap for extending AWorld to support collective intelligence, expert societies of specialized agents, and autonomous self-improvement. Key research directions include multi-agent collaboration in heterogeneous environments, domain-specific agent specialization, and continuous, self-sustaining learning loops.

Conclusion

AWorld addresses the central bottleneck in agentic AI training by enabling efficient, distributed experience generation and seamless integration with RL frameworks. The framework's design and empirical validation establish rollout efficiency as the primary determinant of agentic performance on complex tasks. AWorld provides a practical blueprint for scalable agentic AI development and opens avenues for future research in collective and self-improving intelligence.
