A Survey of Deep Reinforcement Learning in Video Games (1912.10944v2)

Published 23 Dec 2019 in cs.MA, cs.AI, and cs.LG

Abstract: Deep reinforcement learning (DRL) has achieved remarkable success since it was first proposed. Generally, DRL agents receive high-dimensional inputs at each step and select actions according to deep-neural-network-based policies. This learning mechanism updates the policy to maximize the return in an end-to-end fashion. In this paper, we survey the progress of DRL methods, including value-based, policy gradient, and model-based algorithms, and compare their main techniques and properties. DRL also plays an important role in game AI. We therefore review the achievements of DRL in various video games, including classical arcade games, first-person perspective games, and multi-agent real-time strategy games, from 2D to 3D and from single-agent to multi-agent. A large number of DRL-based video game AIs have achieved superhuman performance, yet challenges remain in this domain. We thus discuss key issues in applying DRL methods to this field, including exploration-exploitation, sample efficiency, generalization and transfer, multi-agent learning, imperfect information, and delayed sparse rewards, as well as some research directions.

Deep Reinforcement Learning in Video Gaming: An Analytical Overview

The paper, "A Survey of Deep Reinforcement Learning in Video Games," provides a comprehensive review of deep reinforcement learning (DRL) applications in video games, highlighting the significant strides DRL has made in bridging the gap between high-dimensional inputs and decision-making actions in dynamic environments. As deep reinforcement learning (DRL) integrates deep learning (DL) with reinforcement learning (RL), it empowers agents to learn effective strategies directly from high-dimensional sensory inputs, a capability leveraged demonstrably in video game environments which mimic real-world complexities.

DRL Methodologies and Their Applications

The paper categorizes DRL methodologies into value-based, policy gradient, and model-based algorithms. Value-based methods, such as the Deep Q-Network (DQN) and its derivatives, use neural networks to approximate value functions. Highlighted advancements include Double DQN, which mitigates the overestimation bias of the max operator; Dueling DQN, which separates state-value and action-advantage estimation; and distributional models such as C51 and QR-DQN, which capture the full distribution over returns rather than only its expectation.
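To make the overestimation fix concrete, here is a minimal PyTorch sketch of the Double DQN target computation. The names `online_net` and `target_net` are illustrative stand-ins for any Q-network and its periodically synced copy, and the batch tensors are assumed to be preshaped; this is a sketch of the technique, not code from the survey.

```python
import torch

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Compute Double DQN bootstrap targets for a batch of transitions.

    The online network selects the greedy next action; the target network
    evaluates it. Decoupling selection from evaluation reduces the
    overestimation bias of the original DQN max operator.
    """
    with torch.no_grad():
        # Action selection with the online network.
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # Action evaluation with the (periodically synced) target network.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # One-step TD target; terminal transitions bootstrap to zero.
        return rewards + gamma * (1.0 - dones) * next_q
```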

Policy gradient methods revolve around directly parameterizing policies instead of value functions. As outlined, architectures like Asynchronous Advantage Actor-Critic (A3C) and its successors (e.g., IMPALA) demonstrate enhanced scalability and efficiency in learning policies within complex environments. Trust region methods such as TRPO and PPO provide crucial improvements by constraining each policy update, which is key to handling the sensitivity of policy gradients to step size.
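As an illustration of why clipping stabilizes updates, the following is a minimal sketch of the PPO clipped surrogate loss. The log-probability and advantage tensors are assumed inputs, and `clip_eps=0.2` is a commonly used default rather than a value taken from the survey.

```python
import torch

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (returned as a loss to minimize).

    Clipping the probability ratio keeps each policy update inside an
    approximate trust region without the second-order machinery TRPO needs.
    """
    ratio = torch.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: elementwise minimum, negated for gradient descent.
    return -torch.min(unclipped, clipped).mean()
```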

Model-based DRL, though less emphasized in video gaming, is marked by innovative approaches like MuZero, which combines model-based planning with DRL without access to the environment's true dynamics, showing promising results in complex strategic games.
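The sketch below illustrates the core MuZero idea of rolling a learned model forward in latent space. The names `repr_net`, `dynamics_net`, and `prediction_net` are illustrative stand-ins for its three learned functions; the full algorithm additionally runs Monte Carlo tree search over such rollouts, which is omitted here.

```python
def latent_rollout(repr_net, dynamics_net, prediction_net, observation, actions):
    """Unroll a MuZero-style learned model entirely in latent space.

    repr_net maps a raw observation to a latent state; dynamics_net predicts
    the next latent state and immediate reward for an action; prediction_net
    outputs a policy prior and value estimate used by the planner. No
    environment simulator is consulted during planning.
    """
    state = repr_net(observation)                      # h: observation -> latent state
    trajectory = []
    for action in actions:
        policy_logits, value = prediction_net(state)   # f: latent -> (policy, value)
        state, reward = dynamics_net(state, action)    # g: (latent, action) -> (latent, reward)
        trajectory.append((policy_logits, value, reward))
    return trajectory
```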

Achievements and Challenges in Video Game Implementations

In applying these methods to video games, the paper underscores DRL's outstanding achievements across various genres and complexities—from mastering 2D arcade games to navigating 3D multiplayer strategy games. Atari games serve as a longstanding benchmark for testing DRL models, with algorithms achieving superhuman performance on multiple titles. Despite these successes, some games like Montezuma’s Revenge still pose challenges due to sparse rewards and exploration difficulties.

The exploration-exploitation dilemma highlights a critical research area for improving DRL's real-world applicability. Approaches involving parametric noise and count-based exploration refine this balance, but they often add computational cost or extra machinery to keep learning stable.
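As a concrete illustration of the count-based family, here is a minimal sketch of an intrinsic bonus that decays with visit counts. It assumes states can be hashed to discrete keys; in high-dimensional settings, exact counts are typically replaced by pseudo-counts derived from a density model.

```python
import math
from collections import defaultdict

class CountBonus:
    """Count-based exploration: augment the extrinsic reward with an
    intrinsic bonus that shrinks as a state is visited more often."""

    def __init__(self, beta=0.1):
        self.beta = beta                 # bonus scale (hypothetical default)
        self.counts = defaultdict(int)   # visit counts keyed by state hash

    def reward(self, state_key, extrinsic_reward):
        self.counts[state_key] += 1
        # Bonus proportional to 1/sqrt(N(s)): rarely visited states pay more,
        # nudging the agent toward under-explored regions.
        return extrinsic_reward + self.beta / math.sqrt(self.counts[state_key])
```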

Furthermore, the challenge of sample efficiency, critical in scenarios where data collection is costly, remains a barrier, particularly in real-time strategy (RTS) games and first-person shooters (FPS), whose dynamics more closely resemble the real world. Hierarchical reinforcement learning presents a promising direction: by decomposing goals into manageable sub-tasks, it enhances sample efficiency through structured learning, as sketched below.
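The following is a minimal sketch of the two-level pattern hierarchical RL uses, in the spirit of options/feudal approaches rather than any specific algorithm from the survey. The names `manager`, `worker`, and `select_subgoal` are hypothetical, and the classic Gym `reset`/`step` API is assumed.

```python
def hierarchical_episode(env, manager, worker, horizon=10):
    """Run one episode with a two-level hierarchy: a manager policy picks a
    sub-goal every `horizon` steps, and a goal-conditioned worker acts to
    reach it. Each level then faces a shorter credit-assignment window,
    which is the source of the hoped-for sample-efficiency gain.
    """
    obs, done, total_reward = env.reset(), False, 0.0
    while not done:
        subgoal = manager.select_subgoal(obs)      # high-level decision
        for _ in range(horizon):
            action = worker.act(obs, subgoal)      # low-level, goal-conditioned
            obs, reward, done, _ = env.step(action)
            total_reward += reward
            if done:
                break
    return total_reward
```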

Future Implications for AI Development

This survey articulates DRL's capacity to generalize learning across different game environments, a crucial aspect for transferring and scaling AI applications beyond gaming. Insights gained from mastering imperfect-information scenarios in games contribute to the development of AI systems that can interact and learn in uncertain real-world environments. Multi-agent learning, a field ripe for exploration as evidenced by DRL-driven strategies in StarCraft and MOBA games, lays the groundwork for collaborative AI systems.

As DRL continues to evolve, driven by insights from game AI, additional focus on overcoming technical challenges, such as delayed reward structures, and on designing robust frameworks for multi-agent cooperation will be critical. The paper frames these as directions for future research, indicative of DRL's broader impact on both theoretical AI understanding and practical applications, ultimately advancing the trajectory toward intelligent systems with human-like adaptability and learning capacity.

Authors (5)
  1. Kun Shao (29 papers)
  2. Zhentao Tang (6 papers)
  3. Yuanheng Zhu (17 papers)
  4. Nannan Li (35 papers)
  5. Dongbin Zhao (62 papers)
Citations (171)