Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
The paper introduces QMIX, a method for deep multi-agent reinforcement learning (MARL) in which agents must execute decentralised policies but can be trained in a centralised fashion. QMIX is a value-based approach that factorises the joint action-value function under a monotonicity constraint, allowing decentralised policies to be extracted efficiently from centralised learning.
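Concretely (using the paper's notation, where τ denotes action-observation histories and u denotes actions), the factorisation only needs to guarantee that a global argmax over the joint value Q_tot agrees with the per-agent argmaxes; QMIX enforces this through a monotonicity constraint:

```latex
\operatorname*{argmax}_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) =
\begin{pmatrix}
  \operatorname*{argmax}_{u^{1}} Q_{1}(\tau^{1}, u^{1}) \\
  \vdots \\
  \operatorname*{argmax}_{u^{n}} Q_{n}(\tau^{n}, u^{n})
\end{pmatrix},
\qquad
\frac{\partial Q_{tot}}{\partial Q_{a}} \ge 0, \quad \forall a .
```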
Key Elements of QMIX
QMIX uses a mixing network to compute the joint action-value as a monotonic function of the individual agents' values; the weights of this network are produced by hypernetworks conditioned on the global state and are constrained to be non-negative. This monotonicity guarantees that the greedy joint action under the centralised value function coincides with each agent greedily maximising its own value, so decentralised policies can be extracted directly while retaining the benefits of centralised value-function learning.
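A minimal PyTorch sketch of such a monotonic mixing network is shown below. The two-layer structure, embedding size, and activation choices here are illustrative assumptions rather than the authors' exact configuration; the essential point is that the state-conditioned weights are passed through an absolute value, which keeps them non-negative and hence keeps Q_tot monotonic in every agent's Q-value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """Mixes per-agent Q-values into Q_tot, monotonic in each agent's Q.

    Hypernetworks conditioned on the global state produce the mixing
    weights; taking their absolute value keeps them non-negative, which
    guarantees dQ_tot/dQ_a >= 0 for every agent a.
    """

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: map the global state to mixing-network weights/biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        batch = agent_qs.size(0)
        # Non-negative first-layer weights via abs() -> monotonic mixing.
        w1 = torch.abs(self.hyper_w1(state)).view(batch, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(batch, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)  # (batch, 1, embed)
        w2 = torch.abs(self.hyper_w2(state)).view(batch, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(batch, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2  # (batch, 1, 1)
        return q_tot.view(batch, 1)

if __name__ == "__main__":
    mixer = MonotonicMixer(n_agents=3, state_dim=10)
    qs = torch.randn(4, 3)        # per-agent Q-values for a batch of 4 transitions
    s = torch.randn(4, 10)        # corresponding global states
    print(mixer(qs, s).shape)     # torch.Size([4, 1])
```

Taking the absolute value of the hypernetwork outputs is a simple way to satisfy the non-negativity constraint while still letting the mixing weights, and therefore the shape of the mixing function, depend on the global state.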
Experimental Setup and Evaluation
To assess QMIX, the authors introduce the StarCraft Multi-Agent Challenge (SMAC), a benchmark built on StarCraft II micromanagement scenarios of varying difficulty. The benchmark stresses partial observability, fine-grained coordination, and scalability to larger numbers of agents.
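For context, interacting with a SMAC scenario through the open-source smac package (github.com/oxwhirl/smac) looks roughly like the random-agent loop below; the map name and the package itself are assumptions about the released benchmark code, not part of the paper's text.

```python
import numpy as np
from smac.env import StarCraft2Env  # assumes the oxwhirl/smac package is installed

def run_random_episode(map_name: str = "3m") -> float:
    """Run one episode with uniformly random available actions; return its return."""
    env = StarCraft2Env(map_name=map_name)
    env_info = env.get_env_info()
    n_agents = env_info["n_agents"]

    env.reset()
    terminated = False
    episode_reward = 0.0
    while not terminated:
        # Each agent sees only its own partial observation (decentralised execution).
        obs = env.get_obs()
        # The global state is available for centralised training, never to the agents.
        state = env.get_state()
        actions = []
        for agent_id in range(n_agents):
            avail = env.get_avail_agent_actions(agent_id)
            avail_indices = np.nonzero(avail)[0]
            actions.append(int(np.random.choice(avail_indices)))
        reward, terminated, info = env.step(actions)
        episode_reward += reward
    env.close()
    return episode_reward
```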
The experimental results show that QMIX significantly outperforms existing MARL methods, including Independent Q-Learning (IQL), Value Decomposition Networks (VDN), and COMA, across multiple SMAC scenarios. Notably, QMIX's ability to represent a richer class of joint action-value functions than VDN's purely additive decomposition contributes substantially to this advantage.
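The representational gap is easiest to see from the factorisations themselves: VDN restricts the joint value to a sum of per-agent utilities, whereas QMIX allows any state-dependent mixing function that is monotone non-decreasing in each argument:

```latex
Q_{tot}^{\mathrm{VDN}}(\boldsymbol{\tau}, \mathbf{u}) = \sum_{a=1}^{n} Q_{a}(\tau^{a}, u^{a}),
\qquad
Q_{tot}^{\mathrm{QMIX}}(\boldsymbol{\tau}, \mathbf{u}) = f_{s}\bigl(Q_{1}(\tau^{1}, u^{1}), \ldots, Q_{n}(\tau^{n}, u^{n})\bigr).
```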
Implications and Future Directions
QMIX's approach yields significant improvements in learning decentralised policies without sacrificing the expressiveness or accuracy of the centralised action-value function. Because the mixing network conditions on the global state while each agent acts only on its own observations, the method scales to environments with many agents and non-trivial coordination and decision-making demands.
Future research may extend QMIX's architecture to environments with continuous action spaces and more demanding coordination requirements. Improved exploration strategies and richer, still-factorisable representations of non-linear value functions are further avenues for advancing MARL.
In conclusion, QMIX sets a substantial precedent for building multi-agent systems that combine efficient decentralised execution with robust centralised training, which is crucial for practical applications where execution must be decentralised but centralised training is feasible. The introduction of SMAC additionally establishes a benchmark for measuring progress in MARL methodology.