
Pgx: Hardware-Accelerated Parallel Game Simulators for Reinforcement Learning (2303.17503v4)

Published 29 Mar 2023 in cs.AI and cs.LG

Abstract: We propose Pgx, a suite of board game reinforcement learning (RL) environments written in JAX and optimized for GPU/TPU accelerators. By leveraging JAX's auto-vectorization and parallelization over accelerators, Pgx can efficiently scale to thousands of simultaneous simulations over accelerators. In our experiments on a DGX-A100 workstation, we discovered that Pgx can simulate RL environments 10-100x faster than existing implementations available in Python. Pgx includes RL environments commonly used as benchmarks in RL research, such as backgammon, chess, shogi, and Go. Additionally, Pgx offers miniature game sets and baseline models to facilitate rapid research cycles. We demonstrate the efficient training of the Gumbel AlphaZero algorithm with Pgx environments. Overall, Pgx provides high-performance environment simulators for researchers to accelerate their RL experiments. Pgx is available at http://github.com/sotetsuk/pgx.

Authors (7)
  1. Sotetsu Koyamada (8 papers)
  2. Shinri Okano (1 paper)
  3. Soichiro Nishimori (6 papers)
  4. Yu Murata (1 paper)
  5. Keigo Habara (2 papers)
  6. Haruka Kita (2 papers)
  7. Shin Ishii (18 papers)
Citations (17)

Summary

Overview of "Pgx: Hardware-Accelerated Parallel Game Simulators for Reinforcement Learning"

The paper introduces Pgx, a highly optimized suite of board game environments for reinforcement learning (RL) research, implemented in JAX and engineered to run on GPU/TPU accelerators. The design exploits JAX's auto-vectorization and parallelization to run thousands of simulations concurrently on hardware accelerators. In the authors' experiments, this yields simulation speeds 10-100x faster than existing Python-based implementations.
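To illustrate the vectorization pattern the paper builds on, the toy sketch below (not Pgx's actual API; the counter "game" is invented for illustration) shows how `jax.vmap` and `jax.jit` turn a single-environment step function into a batched, accelerator-compiled one:

```python
import jax
import jax.numpy as jnp

# Toy single-environment step: a stand-in for a real game transition,
# used only to show the vectorization pattern.
def step(state, action):
    # Pretend the "game" is a counter that terminates at 10.
    new_state = state + action
    done = new_state >= 10
    return jnp.where(done, 0, new_state), done

# vmap lifts the scalar step to a batch of environments; jit compiles
# the batched function for the accelerator, so thousands of games
# advance in lockstep per call.
batched_step = jax.jit(jax.vmap(step))

states = jnp.zeros(4096, dtype=jnp.int32)
actions = jnp.ones(4096, dtype=jnp.int32)
states, dones = batched_step(states, actions)
```

The same transformation applies unchanged whether the batch holds 10 environments or 10,000, which is what lets a JAX-based simulator saturate an accelerator.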

Pgx provides a comprehensive set of board game RL environments, including well-established benchmarks such as backgammon, chess, shogi, and Go, all of which have been pivotal in research on RL and search algorithms in discrete state spaces. The paper also demonstrates training the Gumbel AlphaZero algorithm on Pgx environments, underscoring Pgx's ability to support efficient RL experiments.

Key Contributions and Results

  • Performance Benchmarking: The authors present a detailed performance analysis showing Pgx's superior simulation throughput compared to existing Python libraries such as PettingZoo and OpenSpiel. On a DGX-A100 workstation, Pgx achieved 10-100x higher throughput at large batch sizes.
  • Game Variety and Scalability: Pgx offers over 20 games, ranging from perfect-information games like chess to imperfect-information domains like bridge. The suite also includes miniature environments (e.g., miniature chess) that lower computational demands and enable faster research cycles.
  • Baseline Models and Training Validation: To support agent evaluation in multi-agent environments, Pgx includes baseline models, demonstrated through successful AlphaZero training across multiple games. These experiments validate the robustness and efficiency of Pgx's environments and show that training scales effectively when distributed across multiple GPUs.
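Throughput comparisons like the ones above follow a standard pattern for benchmarking JAX code: compile once with a warm-up call, then time many batched steps, synchronizing with `block_until_ready` so JAX's asynchronous dispatch does not skew the measurement. A minimal sketch with a stand-in step function (not Pgx's real environments, and the batch size is arbitrary):

```python
import time
import jax
import jax.numpy as jnp

def step(state, action):
    return state + action  # stand-in for a real game transition

batched_step = jax.jit(jax.vmap(step))

batch = jnp.zeros(8192, dtype=jnp.int32)
actions = jnp.ones_like(batch)

# Warm up: the first call triggers compilation and must not be timed.
batch = batched_step(batch, actions)
batch.block_until_ready()

n_iters = 100
t0 = time.perf_counter()
for _ in range(n_iters):
    batch = batched_step(batch, actions)
batch.block_until_ready()  # wait for all dispatched work to finish
elapsed = time.perf_counter() - t0

throughput = 8192 * n_iters / elapsed  # environment steps per second
```

Without the final `block_until_ready`, the loop would only measure dispatch time, not execution time, inflating the apparent throughput.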

Implications and Future Directions

The introduction of Pgx carries notable implications for the RL research community. First, it narrows the gap between the simulation speed of complex games and the throughput of policy optimization, enabling more efficient evaluation of RL algorithms. Second, because environments run directly on accelerators through JAX, Pgx eliminates the CPU-GPU data transfer bottleneck of conventional simulators, letting researchers fully utilize their computational resources.
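One way JAX-based simulators avoid the host round-trip is to fuse the entire rollout loop into a single compiled program, for example with `jax.lax.scan`, so intermediate states never leave the device. A toy sketch under that assumption (the `step` function and the constant-action "policy" below are invented stand-ins, not Pgx code):

```python
import jax
import jax.numpy as jnp

def step(state, action):
    return state + action  # stand-in for a real game transition

def rollout(init_states, num_steps):
    # scan keeps the whole loop inside one compiled program on the
    # accelerator: no per-step host round-trip for states or actions.
    def body(states, _):
        actions = jnp.ones_like(states)  # stand-in for a policy
        states = jax.vmap(step)(states, actions)
        return states, None
    final, _ = jax.lax.scan(body, init_states, None, length=num_steps)
    return final

# num_steps must be static so scan's length is known at compile time.
final = jax.jit(rollout, static_argnums=1)(jnp.zeros(1024, jnp.int32), 10)
```

Because the loop body is traced once and compiled, the per-step Python overhead of a conventional `for` loop over environments disappears entirely.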

Promising future directions for Pgx include expanding the variety of supported games (particularly imperfect-information games), providing stronger baseline models for more environments, and ensuring compatibility across diverse accelerator architectures such as TPUs. These enhancements would further broaden Pgx's applicability and accessibility for research in RL and game AI.

Overall, Pgx represents a significant advance in the availability and efficiency of environment simulators for RL research, enabling faster algorithmic development and providing the research community with an open-source, high-performance toolkit.