ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning (1605.02097v2)

Published 6 May 2016 in cs.LG, cs.AI, and cs.CV

Abstract: The recent advances in deep neural networks have led to effective vision-based reinforcement learning methods that have been employed to obtain human-level controllers in Atari 2600 games from pixel data. Atari 2600 games, however, do not resemble real-world tasks since they involve non-realistic 2D environments and the third-person perspective. Here, we propose a novel test-bed platform for reinforcement learning research from raw visual information which employs the first-person perspective in a semi-realistic 3D world. The software, called ViZDoom, is based on the classical first-person shooter video game, Doom. It allows developing bots that play the game using the screen buffer. ViZDoom is lightweight, fast, and highly customizable via a convenient mechanism of user scenarios. In the experimental part, we test the environment by trying to learn bots for two scenarios: a basic move-and-shoot task and a more complex maze-navigation problem. Using convolutional deep neural networks with Q-learning and experience replay, for both scenarios, we were able to train competent bots, which exhibit human-like behaviors. The results confirm the utility of ViZDoom as an AI research platform and imply that visual reinforcement learning in 3D realistic first-person perspective environments is feasible.

Authors (5)
  1. Michał Kempka (3 papers)
  2. Marek Wydmuch (14 papers)
  3. Grzegorz Runc (1 paper)
  4. Jakub Toczek (1 paper)
  5. Wojciech Jaśkowski (10 papers)
Citations (671)

Summary

  • The paper introduces a novel 3D environment for visual reinforcement learning that overcomes the limitations of 2D Atari benchmarks.
  • The paper details experiments using deep Q-learning for both move-and-shoot and complex maze navigation tasks, achieving efficient learning and human-like spatial reasoning.
  • The results imply that ViZDoom paves the way for advanced AI agents in robotics and interactive systems, with potential for multi-agent and audio-enhanced future research.

ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning

The paper presents ViZDoom, a software platform that uses the classic first-person shooter Doom as a test-bed for visual reinforcement learning (RL) research. The authors address a limitation of existing benchmarks such as Atari 2600 games, whose non-realistic 2D environments and third-person perspective do not resemble real-world tasks, particularly tasks that involve a first-person view of a 3D world.

Contributions

ViZDoom introduces a novel testing environment for RL, allowing the development of bots that interact with a semi-realistic 3D world from a first-person perspective. Key features include:

  • Semi-realistic 3D Interaction: Unlike Atari games, ViZDoom places the agent in a first-person, semi-realistic 3D world, which is a closer approximation to real-world tasks.
  • High Customizability: Users can define custom scenarios through the platform's user-scenario mechanism, altering maps, non-player characters, rewards, goals, and actions.
  • Performance Efficiency: The platform is lightweight and fast, reportedly reaching close to 7000 frames per second on hardware contemporary with the paper, which makes large-scale experiments practical.
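
To put the frame-rate figure in perspective, the following back-of-the-envelope sketch estimates how long the simulation alone would take for a given number of agent decisions. The function name `wall_clock_seconds` and the parameter `frames_per_action` are hypothetical names introduced for illustration, not part of ViZDoom, and the estimate ignores the time spent on learning updates, which in practice often dominates.

```python
def wall_clock_seconds(agent_steps: int, frames_per_action: int,
                       frames_per_second: float) -> float:
    """Estimate simulation time when each decision is held for several frames.

    Assumption: the simulator sustains `frames_per_second` throughout and
    nothing else (network updates, logging) costs any time.
    """
    if agent_steps < 0 or frames_per_action < 1 or frames_per_second <= 0:
        raise ValueError("steps must be >= 0, frames per action >= 1, fps > 0")
    total_frames = agent_steps * frames_per_action
    return total_frames / frames_per_second


if __name__ == "__main__":
    # One million decisions, each held for 4 frames, at roughly 7000 fps.
    secs = wall_clock_seconds(1_000_000, 4, 7000.0)
    print(f"{secs:.0f} s (~{secs / 60:.1f} min)")  # 571 s (~9.5 min)
```

Under these assumptions, a million agent decisions cost under ten minutes of simulation, which is why a fast environment matters for sample-hungry methods.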

Experiments

Two experiments were conducted to demonstrate ViZDoom's effectiveness as an AI research platform:

  1. Basic Task - Move and Shoot: Using deep Q-learning with convolutional neural networks, a bot was trained to pursue and shoot targets. Several frame-skip settings were evaluated; skipping between 4 and 10 frames led to faster, smoother learning and more competent bot behavior.
  2. Complex Maze Navigation: In the harder scenario, the bot had to navigate a 3D maze, collecting items and avoiding obstacles. It exhibited human-like navigation strategies, which suggests a degree of learned spatial reasoning, despite occasional inefficiencies.
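
The two ingredients named above, frame skipping and Q-learning with experience replay, can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' setup: a tabular Q-function is swapped in for the paper's convolutional network, the `Corridor` class is a hypothetical 1-D stand-in for a game and does not use the ViZDoom API, and every name and hyperparameter below is chosen for illustration only.

```python
import random


class Corridor:
    """Toy 1-D environment: start in cell 0, episode ends at the last cell."""

    def __init__(self, length=12):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 0 moves left, action 1 moves right; walls clamp the position.
        self.pos = max(0, min(self.length - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.length - 1
        # Small per-frame penalty, reward of 1 on reaching the goal.
        return self.pos, (1.0 if done else -0.01), done


def step_with_repeat(env, action, frames_per_action):
    """Frame skipping: hold one action for several frames, summing rewards."""
    state, total, done = env.pos, 0.0, False
    for _ in range(frames_per_action):
        state, reward, done = env.step(action)
        total += reward
        if done:
            break
    return state, total, done


def train(frames_per_action=4, episodes=300, gamma=0.99, alpha=0.1,
          epsilon=0.1, batch_size=32, capacity=10_000, seed=0):
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    replay, writes = [], 0                       # fixed-size ring buffer
    for _ in range(episodes):
        state, done, steps = env.reset(), False, 0
        while not done and steps < 200:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step_with_repeat(env, action, frames_per_action)
            transition = (state, action, reward, nxt, done)
            if len(replay) < capacity:
                replay.append(transition)
            else:
                replay[writes % capacity] = transition
            writes += 1
            state, steps = nxt, steps + 1
            # Experience replay: learn from a random minibatch of past transitions.
            for s, a, r, s2, d in rng.sample(replay, min(batch_size, len(replay))):
                target = r if d else r + gamma * max(q[s2])
                q[s][a] += alpha * (target - q[s][a])
    return q


if __name__ == "__main__":
    q_values = train()
    for s in (0, 4, 8):  # the only cells where a decision is made with 4-frame repeats
        print(s, [round(v, 2) for v in q_values[s]])
```

With four frames per action the agent only ever decides in cells 0, 4 and 8, so the learned table is meaningful only there. Longer repeats shorten the decision horizon, which is one plausible reason a moderate skip can speed up learning, but they also coarsen control; the toy problem does not reproduce the paper's measurements.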

Implications

The results imply that visual RL in realistic 3D environments is feasible, with ViZDoom providing a robust framework for such experimentation. The platform opens avenues for exploring complex behaviors and tactics in AI agents, with potential implications for robotics and interactive AI systems trained from raw visual inputs.

Theoretical and Practical Impact

The development of ViZDoom marks a notable advancement in creating more lifelike and complex RL environments, bridging the gap between abstract virtual simulations and tangible real-world applications. Future research could expand ViZDoom to include synchronous multiplayer modes and audio processing capabilities, enhancing its utility for broader AI challenges.

Future Directions

Further theoretical exploration could investigate adaptive learning rates and multi-agent interactions within ViZDoom. These enhancements could refine AI models trained to navigate and adapt within dynamic and uncertain environments, relevant to both autonomous systems and interactive game development.

By offering a controllable, visually rich environment, ViZDoom gives RL researchers a practical test-bed for developing algorithms intended to carry over to real-world applications.
