Overview of DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning
The paper "DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning" addresses the challenge of building a strong AI for DouDizhu, a three-player card game in which two Peasants cooperate against a single Landlord. The game combines imperfect information, a huge state space, a large and variable set of legal moves, and a mix of competition and cooperation. The proposed system, DouZero, enhances classic Monte-Carlo methods with deep neural networks and self-play, demonstrating that strong performance is achievable without human-derived abstractions of the game.
Methodology
The authors employ Deep Monte-Carlo (DMC), which combines classic Monte-Carlo value estimation with deep neural networks: Q-values are learned by regressing the network's output toward the returns of complete self-play episodes, which makes exploration of DouDizhu's extensive action space tractable. States and actions are represented as card matrices, and an LSTM encodes the sequence of historical moves. The approach eschews search and extensive domain-specific knowledge, relying instead on the scalability of deep learning. A minimal sketch of the update follows.
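To make this concrete, below is a minimal PyTorch sketch of a Deep Monte-Carlo step, not the authors' code: rollouts pick epsilon-greedily among the legal moves, every (state, action) pair in a finished episode is labeled with that episode's final return, and the Q-network is regressed toward those Monte-Carlo targets with mean-squared error. `QNet`, `select_action`, `dmc_update`, and all tensor dimensions are hypothetical, and the LSTM over historical moves is omitted for brevity.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Hypothetical Q-network scoring one (state, action) pair per row.

    DouZero's real network also runs an LSTM over historical moves;
    that part is omitted to keep the sketch focused on the DMC update.
    """
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states, actions):
        return self.mlp(torch.cat([states, actions], dim=-1)).squeeze(-1)


def select_action(qnet, state, legal_actions, epsilon=0.01):
    """Epsilon-greedy choice restricted to the legal moves."""
    if torch.rand(()).item() < epsilon:
        return legal_actions[torch.randint(len(legal_actions), ()).item()]
    with torch.no_grad():
        q = qnet(state.expand(len(legal_actions), -1), torch.stack(legal_actions))
    return legal_actions[q.argmax().item()]


def dmc_update(qnet, optimizer, states, actions, episode_return):
    """Regress Q(s, a) toward the Monte-Carlo return of a finished episode.

    Every step of the episode gets the same target: the final payoff
    (win/loss or points), which in DouDizhu is only known at the end.
    """
    targets = torch.full((states.shape[0],), float(episode_return))
    loss = nn.functional.mse_loss(qnet(states, actions), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the target is the raw episode return rather than a bootstrapped estimate, DMC sidesteps the bias of temporal-difference targets; DouDizhu's episodic structure and naturally parallel rollouts help keep the higher variance of Monte-Carlo targets manageable.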
Key features of DouZero include:
- Action Representation: Both hands and moves are encoded as card matrices (one-hot encodings of the card count for each rank), letting the Q-network generalize across the game's roughly 27,472 possible moves, including infrequently seen ones; a sketch of this encoding follows the list.
- Neural Architecture: An LSTM encodes the sequence of historical moves; its output is concatenated with the state and action features and passed through fully connected layers that produce a Q-value for each legal action.
- Parallel Actors: Multiple actor processes, each maintaining local copies of the networks, generate self-play data in parallel and feed shared buffers from which a central learner updates the global networks, substantially accelerating training.
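As an illustration of the card-matrix idea referenced in the first bullet, here is a minimal sketch assuming one plausible reading of the encoding: a 4x15 matrix with fifteen columns for the ranks (3 through 2, plus the two jokers) and four rows one-hot encoding how many copies of each rank are present. Suits are ignored since they are irrelevant in DouDizhu. The rank ordering and helper names are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

# Assumed rank order: 3..A, then 2, then Black Joker and Red Joker.
RANKS = ['3', '4', '5', '6', '7', '8', '9', 'T', 'J', 'Q', 'K', 'A', '2', 'BJ', 'RJ']
RANK_INDEX = {r: i for i, r in enumerate(RANKS)}

def cards_to_matrix(cards):
    """Encode a multiset of cards as a 4x15 one-hot count matrix.

    Column j identifies a rank; row k-1 is set iff exactly k copies of
    that rank are present. Absent ranks leave their column all-zero.
    """
    matrix = np.zeros((4, 15), dtype=np.int8)
    for rank in set(cards):
        matrix[cards.count(rank) - 1, RANK_INDEX[rank]] = 1
    return matrix

# Example: a triple of 7s, a pair of Queens, and the Red Joker.
hand = ['7', '7', '7', 'Q', 'Q', 'RJ']
m = cards_to_matrix(hand)
assert m[2, RANK_INDEX['7']] == 1   # three 7s  -> row index 2
assert m[1, RANK_INDEX['Q']] == 1   # two Queens -> row index 1
assert m[0, RANK_INDEX['RJ']] == 1  # one joker  -> row index 0
```

Encoding legal moves with the same scheme as hands is what lets the Q-network score actions it has rarely, or never, generated during self-play.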
Numerical Results
DouZero was evaluated against a range of existing DouDizhu programs, including rule-based heuristics, supervised-learning agents, and DeltaDou, and exhibited superior performance on both winning percentage (WP) and average difference in points (ADP). Notably, it did so with modest computational resources: trained from scratch on a single multi-GPU server, DouZero surpassed DeltaDou, previously considered the strongest DouDizhu AI, within days, whereas DeltaDou required far longer training on top of supervised initialization.
The paper also reports results from the Botzone leaderboard, where DouZero ranked first among the 344 AI agents competing at the time. These results underscore DouZero's robustness across different evaluation settings.
Implications and Future Directions
The success of DouZero indicates that Monte-Carlo methods, when augmented with deep neural networks and large-scale self-play, can tackle complex imperfect-information games effectively. It also challenges the assumption that such games demand heavy human knowledge engineering or explicit search, pointing instead toward scalable learning systems that adapt to complex environments.
For future developments in AI, particularly in multi-agent and imperfect-information domains, the insights from DouZero suggest a few potential directions:
- Enhanced scalability and training efficiency by leveraging advanced hardware and distributed computing.
- Integration of the bidding phase directly into the reinforcement learning pipeline, so that bidding and cardplay are learned end to end.
- Exploration of hybrid models that combine deep learning with search-based methods at both training and test time to deepen strategic play.
Overall, DouZero represents a meaningful step toward AI systems that can thrive in complex, dynamic domains without extensive hand-built abstractions, offering a compelling baseline for future research in multi-agent reinforcement learning.