- The paper presents Sample Factory, which leverages asynchronous PPO and double-buffered sampling to achieve up to 130,000 FPS on a single machine.
- It optimizes RL resource use by asynchronously coordinating environment simulation, model inference, and backpropagation without heavy distributed systems.
- The method supports multi-agent training and outperforms prior frameworks like IMPALA and SeedRL in throughput and scalability.
Asynchronous Reinforcement Learning in High-Throughput Environments: Sample Factory
The paper "Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning" presents a novel approach to optimizing the efficiency and resource utilization of reinforcement learning (RL) algorithms, eschewing the need for expansive distributed systems. This research addresses the substantial data hunger of traditional reinforcement learning methods by introducing a system that achieves throughput as high as 130,000 FPS on a single machine.
Methods and Architecture
Sample Factory is built on Asynchronous Proximal Policy Optimization (APPO) and uses parallelism to keep all available hardware busy. Three distinct computational workloads are identified: environment simulation, model inference, and backpropagation, each assigned to dedicated components that operate asynchronously. The architecture consists of rollout workers for environment simulation, policy workers for action generation, and learners for policy updates.
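To make this division of labor concrete, here is a minimal toy sketch of the three asynchronous components exchanging work over queues. It uses Python threads and trivial stand-ins for the environment, the policy, and the gradient update; the real system runs these components in separate processes, and none of the names below are Sample Factory's actual API.

```python
import queue
import random
import threading
import time

# Queues standing in for the FIFO queues that connect the components.
obs_queue = queue.Queue()      # rollout worker -> policy worker (observations)
action_queue = queue.Queue()   # policy worker  -> rollout worker (actions)
traj_queue = queue.Queue()     # rollout worker -> learner (finished trajectories)

ROLLOUT_LEN = 8
NUM_STEPS = 64

def rollout_worker(worker_id):
    """Steps a toy environment and ships fixed-length trajectories to the learner."""
    trajectory, obs = [], 0.0
    for _ in range(NUM_STEPS):
        obs_queue.put((worker_id, obs))        # request an action for this observation
        _, action = action_queue.get()         # block until the policy worker answers
        reward = random.random()
        obs += action                          # fake environment transition
        trajectory.append((obs, action, reward))
        if len(trajectory) == ROLLOUT_LEN:
            traj_queue.put(trajectory)         # hand the rollout to the learner
            trajectory = []

def policy_worker():
    """Turns observations into actions (stand-in for batched GPU inference)."""
    while True:
        worker_id, obs = obs_queue.get()
        if worker_id is None:                  # shutdown signal
            break
        action = 1.0 if obs < 10 else -1.0     # trivial "policy"
        action_queue.put((worker_id, action))

def learner(total_trajectories):
    """Consumes trajectories and pretends to update the policy."""
    for i in range(total_trajectories):
        batch = traj_queue.get()
        time.sleep(0.001)                      # stand-in for backpropagation
        print(f"learner: update {i} from a rollout of length {len(batch)}")

if __name__ == "__main__":
    workers = [
        threading.Thread(target=rollout_worker, args=(0,)),
        threading.Thread(target=policy_worker),
        threading.Thread(target=learner, args=(NUM_STEPS // ROLLOUT_LEN,)),
    ]
    for w in workers:
        w.start()
    workers[0].join()                          # wait for the rollout worker to finish
    obs_queue.put((None, None))                # then stop the policy worker
    for w in workers[1:]:
        w.join()
```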
Double-buffered sampling is a key technique that lets rollout workers minimize idle time by alternating between two groups of environments: while the policy workers compute actions for one group, the rollout worker steps the other. This keeps simulation running continuously and drives both CPU and GPU utilization close to their maximum.
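The following toy sketch illustrates the alternation between the two environment groups on a single rollout worker. `ToyEnv`, `request_actions`, and `wait_for_actions` are illustrative stand-ins, not Sample Factory's API; in the actual system the "request" is an asynchronous message to a policy worker, so stepping one group overlaps with GPU inference for the other.

```python
import random

class ToyEnv:
    """Trivial environment stand-in with a gym-like reset/step interface."""
    def reset(self):
        return 0.0
    def step(self, action):
        return random.random(), action, False, {}   # obs, reward, done, info

# Stand-ins for asynchronous communication with the policy worker: in the real
# system `request_actions` enqueues observations and returns immediately,
# while inference for the *other* group proceeds in parallel on the GPU.
_pending = {}
def request_actions(group_id, observations):
    _pending[group_id] = [1.0 for _ in observations]   # trivial "policy"
def wait_for_actions(group_id):
    return _pending.pop(group_id)

def double_buffered_rollout(envs, num_iters):
    half = len(envs) // 2
    groups = [envs[:half], envs[half:]]                # the two buffers
    observations = [[env.reset() for env in g] for g in groups]

    # Prime the pipeline: both groups request their first actions.
    for g in (0, 1):
        request_actions(g, observations[g])

    for step in range(num_iters):
        g = step % 2                                   # alternate between buffers
        actions = wait_for_actions(g)                  # group g's actions arrive
        # While group g is being stepped here, the policy worker can already
        # run inference for the other group, so CPU and GPU work overlap.
        results = [env.step(a) for env, a in zip(groups[g], actions)]
        observations[g] = [obs for obs, _, _, _ in results]
        request_actions(g, observations[g])            # hand group g back

double_buffered_rollout([ToyEnv() for _ in range(8)], num_iters=10)
```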
Communication efficiency is achieved through shared memory and FIFO queues, avoiding the overhead of data serialization. This design ensures fast and efficient transfer of experience between components, critical for sustaining high throughput.
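A common way to realize this pattern, and presumably close in spirit to what the paper describes, is to keep the large tensors in a shared buffer and pass only small slot indices through the queues. The sketch below uses Python's `multiprocessing.shared_memory` and NumPy as stand-ins; the buffer layout and names are illustrative, not taken from the Sample Factory codebase.

```python
import numpy as np
from multiprocessing import Process, Queue, shared_memory

OBS_SHAPE = (4, 84, 84)       # e.g. a stack of grayscale frames
NUM_SLOTS = 16                # number of observation slots in the shared buffer

def producer(shm_name, slot_queue):
    """Writes observations into shared memory and sends only the slot index."""
    shm = shared_memory.SharedMemory(name=shm_name)
    buf = np.ndarray((NUM_SLOTS, *OBS_SHAPE), dtype=np.float32, buffer=shm.buf)
    for slot in range(NUM_SLOTS):
        buf[slot] = np.random.rand(*OBS_SHAPE)    # "render" an observation in place
        slot_queue.put(slot)                      # tiny message: just an index
    slot_queue.put(None)                          # end-of-stream marker
    shm.close()

def consumer(shm_name, slot_queue):
    """Reads observations directly from shared memory, with no serialization."""
    shm = shared_memory.SharedMemory(name=shm_name)
    buf = np.ndarray((NUM_SLOTS, *OBS_SHAPE), dtype=np.float32, buffer=shm.buf)
    while (slot := slot_queue.get()) is not None:
        obs = buf[slot]                           # zero-copy view of the data
        print(f"consumer: slot {slot}, mean pixel {obs.mean():.3f}")
    shm.close()

if __name__ == "__main__":
    nbytes = NUM_SLOTS * int(np.prod(OBS_SHAPE)) * 4
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    q = Queue()
    procs = [Process(target=producer, args=(shm.name, q)),
             Process(target=consumer, args=(shm.name, q))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    shm.close()
    shm.unlink()
```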
Results and Implications
The efficacy of Sample Factory is demonstrated on three simulation platforms: Atari, VizDoom, and DeepMind Lab. The architecture outperforms existing systems such as IMPALA, SeedRL, and the PPO implementation in the rlpyt framework, achieving peak throughput close to theoretical limits.
Sample Factory extends beyond the single-policy, single-agent setting, supporting multi-agent environments and population-based training. This capability is demonstrated by training populations of agents for multiplayer Doom, where the learned agents significantly outperform the game's scripted opponents.
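As a rough illustration of what population-based training involves, the sketch below periodically replaces the weakest policies in a population with mutated copies of the strongest ones. The `Policy` class and the selection and mutation rules are simplified placeholders, not the exact scheme used in the paper.

```python
import random

class Policy:
    """Toy stand-in for a policy: some weights, hyperparameters, and a score."""
    def __init__(self, lr):
        self.weights = [random.random() for _ in range(4)]
        self.hyperparams = {"learning_rate": lr}
        self.score = random.random()          # would come from evaluation returns

    def load_from(self, other):
        self.weights = list(other.weights)    # copy the winner's weights

def pbt_step(population, mutation_scale=1.2, replace_fraction=0.3):
    """Replace the weakest policies with mutated copies of the strongest."""
    ranked = sorted(population, key=lambda p: p.score, reverse=True)
    n = max(1, int(len(ranked) * replace_fraction))
    for loser, winner in zip(ranked[-n:], ranked[:n]):
        loser.load_from(winner)
        for name, value in winner.hyperparams.items():
            factor = mutation_scale if random.random() < 0.5 else 1 / mutation_scale
            loser.hyperparams[name] = value * factor      # perturb hyperparameters

population = [Policy(lr=10 ** random.uniform(-4, -3)) for _ in range(8)]
pbt_step(population)
print([round(p.hyperparams["learning_rate"], 5) for p in population])
```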
Numerical Insights
The reported throughput figures underscore the system's efficiency: in certain configurations, Sample Factory reaches up to 146,551 environment frames per second on VizDoom and 41,781 FPS on DeepMind Lab, a substantial gain over conventional distributed methods.
Practical and Theoretical Implications
Practically, Sample Factory democratizes high-throughput RL, facilitating complex experiments on commodity hardware. The system allows researchers to conduct large-scale simulations without relying on costly distributed setups, expanding accessibility to state-of-the-art RL capabilities.
Theoretically, the paper emphasizes the importance of efficient experience collection and policy optimization. By minimizing policy lag (the staleness of collected experience relative to the policy being updated) and leveraging asynchronous techniques, the study contributes to ongoing discussions around sample efficiency in policy gradient methods.
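As an illustration of why policy lag matters, the sketch below computes a standard PPO clipped surrogate on slightly stale experience: the more the current policy has drifted from the behavior policy that collected the data, the more the importance ratio is clipped, which bounds the effect of off-policy samples. Variable names are illustrative, and this is not Sample Factory's exact loss code.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.1):
    """Clipped policy-gradient surrogate: large importance ratios caused by
    policy lag are clipped, limiting how much stale data can move the policy."""
    ratio = np.exp(logp_new - logp_old)                   # pi_current / pi_behavior
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))       # negated to minimize

# Example: the larger |logp_new - logp_old|, the more aggressively the
# corresponding sample's contribution is clipped.
logp_old = np.array([-1.0, -0.5, -2.0])   # log-probs under the behavior policy
logp_new = np.array([-0.8, -1.5, -1.9])   # log-probs under the current policy
adv = np.array([1.0, -0.5, 0.3])
print(ppo_clip_loss(logp_new, logp_old, adv))
```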
Future Directions
While Sample Factory significantly enhances throughput, room for further optimization remains. Future work could explore multi-GPU data-parallel learning and further reductions of policy lag in more complex environments.
In conclusion, Sample Factory represents a notable advancement in reinforcement learning systems, optimizing performance and accessibility. Its implications for both theoretical research and practical applications are profound, paving the way for more efficient and scalable RL methodologies.