
ViZDoom Competitions: Playing Doom from Pixels (1809.03470v1)

Published 10 Sep 2018 in cs.AI, cs.CV, cs.LG, and stat.ML

Abstract: This paper presents the first two editions of Visual Doom AI Competition, held in 2016 and 2017. The challenge was to create bots that compete in a multi-player deathmatch in a first-person shooter (FPS) game, Doom. The bots had to make their decisions based solely on visual information, i.e., a raw screen buffer. To play well, the bots needed to understand their surroundings, navigate, explore, and handle the opponents at the same time. These aspects, together with the competitive multi-agent aspect of the game, make the competition a unique platform for evaluating the state of the art reinforcement learning algorithms. The paper discusses the rules, solutions, results, and statistics that give insight into the agents' behaviors. Best-performing agents are described in more detail. The results of the competition lead to the conclusion that, although reinforcement learning can produce capable Doom bots, they still are not yet able to successfully compete against humans in this game. The paper also revisits the ViZDoom environment, which is a flexible, easy to use, and efficient 3D platform for research for vision-based reinforcement learning, based on a well-recognized first-person perspective game Doom.

Citations (113)

Summary

  • The paper demonstrates how AI agents overcome raw pixel input challenges to advance vision-based reinforcement learning in complex 3D environments.
  • It details the integration of state-of-the-art RL algorithms like A3C, DQN, and DRQN with techniques such as curriculum learning and auxiliary signals.
  • It highlights the platform’s efficacy by comparing agent performance on known versus unknown maps, emphasizing strategic adaptability and resource management.

An Analysis of the ViZDoom Competitions: Evaluating Reinforcement Learning Through FPS Gaming

The paper describes the first two editions of the Visual Doom AI Competition (VDAIC), held in 2016 and 2017. The competitions aimed to advance vision-based reinforcement learning (RL) by using the FPS game Doom as a platform. The primary goal was for AI agents to play Doom using only raw pixel data as input, which demands sophisticated perception and decision-making capabilities.

Competition Overview

The challenges were structured as multi-player deathmatches where the submitted agents competed against one another. The task imposed rigorous constraints; particularly, agents could not access any game data aside from what was visible on the screen. This required them to navigate the game's 3D environment, strategize, explore, and combat opponents with only visual cues.

Overall, the competitions showcased a range of RL approaches built on algorithms such as Asynchronous Advantage Actor-Critic (A3C), Deep Q-Networks (DQN), Deep Recurrent Q-Networks (DRQN), and Direct Future Prediction (DFP). The entries combined standard RL methods with complementary training strategies, including curriculum learning and auxiliary signals to cope with sparse rewards.
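Curriculum learning, as used by several entries, amounts to unlocking progressively harder training scenarios as the agent improves. A minimal sketch of such a schedule (the stage names and episode thresholds below are hypothetical, not taken from any competition entry):

```python
# Illustrative curriculum: train on easier scenarios first, then unlock
# harder ones once enough episodes have been completed. Stage names and
# thresholds are made-up examples.
STAGES = [
    (0, "easy_weak_enemies"),
    (500, "medium_full_map"),
    (2000, "full_deathmatch"),
]

def current_stage(episode, stages=STAGES):
    """Return the hardest stage whose unlock threshold has been reached.

    `stages` is a list of (episodes_completed, stage_name) pairs sorted
    by threshold in ascending order.
    """
    stage = stages[0][1]
    for threshold, name in stages:
        if episode >= threshold:
            stage = name
    return stage
```

Auxiliary signals serve a similar purpose from the other direction: rather than easing the task, they densify the reward (e.g., small bonuses for picking up health or ammo) so the agent gets feedback long before its first frag.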

Results and Observations

Track 1 and Track 2: The competitions featured two distinct tracks. In Track 1, agents played on a map known in advance, while Track 2 pitted them against previously unseen maps, which raised the difficulty considerably because layouts could not be memorized. Track 1 produced a tight leaderboard, with agents such as Marvin and Arnold2 leading thanks to efficient resource management and precise enemy detection.

In contrast, Track 2 demanded strategic adaptability and robust navigation on unseen maps. Although IntelAct, a previous winner, performed well, Arnold4 showed superior accuracy and strategy and took first place in 2017.

Key Algorithms

The top-performing submissions harnessed state-of-the-art RL algorithms:

  • Marvin utilized A3C in conjunction with human demonstration data to pre-train its models.
  • Arnold2/Arnold4 employed a combination of DQN and DRQN with enhancements such as strafing and avoidance strategies.
  • YanShi integrated perception with high-level planning using a combination of RPN, SLAM, MCTS, and pre-trained models.
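DQN-style entries such as Arnold typically discretize the control space into a small set of button combinations and pick among them with an epsilon-greedy policy. A minimal sketch of that selection step (the action names, button encoding, and Q-values below are illustrative, not from any specific entry):

```python
import random

# Hypothetical discrete action set: each action maps to a vector of button
# states. Real entries used similar but larger encodings.
ACTIONS = {
    "attack":     [1, 0, 0, 0],
    "move_left":  [0, 1, 0, 0],
    "move_right": [0, 0, 1, 0],
    "turn_left":  [0, 0, 0, 1],
}

def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise act greedily on Q-values."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)

# With epsilon = 0 the choice is purely greedy:
q = {"attack": 0.1, "move_left": 0.7, "move_right": 0.3, "turn_left": 0.2}
buttons = ACTIONS[epsilon_greedy(q, epsilon=0.0)]  # -> [0, 1, 0, 0]
```

The recurrent variant (DRQN) changes how Q-values are computed — an LSTM carries state across frames to cope with partial observability — but the action-selection step stays the same.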

Insights and Implications

While the competition underscored notable strides in vision-based RL, it also highlighted the gap between AI and human performance in complex 3D environments. The inherent difficulties of perception from raw pixels persisted, particularly in tasks like vertical aiming and strategic navigation.

These findings imply that future research should focus on developing advanced perception modules, possibly integrating unsupervised learning and novel network architectures to handle dynamic 3D scenes more effectively.

Platform Evaluation

The ViZDoom platform itself, based on the ZDoom engine, proved to be a robust tool for RL research. Its ability to simulate a realistic gaming environment with flexible API support enables diverse experimentation. It complements other platforms like DeepMind Lab and Unity ML-Agents by providing a lightweight yet complex 3D scenario for training and evaluation.
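That flexibility extends to scenario definition: ViZDoom scenarios are configured through plain-text files. The fragment below is an illustrative configuration in ViZDoom's .cfg format (the key names follow ViZDoom's documented configuration format; the specific values are example choices, not the competition settings):

```ini
# Illustrative ViZDoom scenario configuration (example values)
doom_scenario_path = basic.wad
screen_resolution = RES_320X240
screen_format = CRCGCB
render_hud = false
available_buttons = { MOVE_LEFT MOVE_RIGHT ATTACK }
available_game_variables = { AMMO2 }
episode_timeout = 300
mode = PLAYER
```

Keeping the environment definition in a small declarative file like this is part of what makes the platform easy to script for large-scale RL experiments.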

Conclusion

The VDAIC competitions served as a significant milestone in the exploration of RL in FPS games, pushing the boundaries of what AI can achieve in visually demanding environments. Despite advancements, achieving human-level proficiency remains an open research question. The introduction of more complex tasks, such as completing original Doom levels, is likely to drive further innovation in the field. As AI systems continue to evolve, the lessons from ViZDoom competitions will undoubtedly inform future developments in autonomous agents capable of navigating 3D worlds from visual input alone.
