
Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning (2006.10800v2)

Published 18 Jun 2020 in cs.LG, cs.MA, and stat.ML

Abstract: QMIX is a popular $Q$-learning algorithm for cooperative MARL in the centralised training and decentralised execution paradigm. In order to enable easy decentralisation, QMIX restricts the joint action $Q$-values it can represent to be a monotonic mixing of each agent's utilities. However, this restriction prevents it from representing value functions in which an agent's ordering over its actions can depend on other agents' actions. To analyse this representational limitation, we first formalise the objective QMIX optimises, which allows us to view QMIX as an operator that first computes the $Q$-learning targets and then projects them into the space representable by QMIX. This projection returns a representable $Q$-value that minimises the unweighted squared error across all joint actions. We show in particular that this projection can fail to recover the optimal policy even with access to $Q^*$, which primarily stems from the equal weighting placed on each joint action. We rectify this by introducing a weighting into the projection, in order to place more importance on the better joint actions. We propose two weighting schemes and prove that they recover the correct maximal action for any joint action $Q$-values, and therefore for $Q^*$ as well. Based on our analysis and results in the tabular setting, we introduce two scalable versions of our algorithm, Centrally-Weighted (CW) QMIX and Optimistically-Weighted (OW) QMIX, and demonstrate improved performance on both predator-prey and challenging multi-agent StarCraft benchmark tasks.

An Analysis of Weighted QMIX for Deep Multi-Agent Reinforcement Learning

The paper introduces an enhanced version of the QMIX algorithm, termed Weighted QMIX, for cooperative Multi-Agent Reinforcement Learning (MARL) in settings where joint action selection must remain decentralisable. The proposed algorithm addresses a known limitation of QMIX: its inability to represent value functions in which an agent's best action depends on the simultaneous actions of other agents. Weighted QMIX demonstrates improved performance on both simple tabular problems and complex environments such as StarCraft II scenarios.

Context and Motivation

QMIX is a prevalent algorithm within the centralised training with decentralised execution framework, known for its success in environments with multiple interacting agents. It represents the joint action value function as a monotonic mixing of individual agent utilities. This monotonicity makes decentralised greedy action selection straightforward, because each agent can simply maximise its own utility, but it imposes a significant limitation: QMIX cannot represent value functions in which an agent's ordering over its own actions depends on the actions of other agents, which hinders coordination in tasks that require it.
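Concretely, QMIX learns a joint action value of the form below, where the mixing function is constrained to be monotonic in each agent's utility. This guarantees that the joint greedy action can be obtained by each agent greedily maximising its own utility, which is what makes decentralised execution possible (notation follows the paper: $\boldsymbol{\tau}$ is the joint action-observation history, $\mathbf{u}$ the joint action, $s$ the state):

$$Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) = f_{mix}\big(Q_1(\tau_1, u_1), \dots, Q_n(\tau_n, u_n); s\big), \qquad \frac{\partial Q_{tot}}{\partial Q_a} \geq 0 \quad \forall a.$$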

Proposed Enhancements

Weighted QMIX introduces a weighted projection onto QMIX's representable function space. The paper observes that the equal weighting implicit in QMIX's optimisation ascribes unwarranted importance to suboptimal joint actions. The core contribution is two weighting schemes, Centrally-Weighted QMIX (CW-QMIX) and Optimistically-Weighted QMIX (OW-QMIX), which place more weight on the better joint actions in the projection and thereby mitigate the distortion caused by QMIX's uniform weighting.
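Viewed as an operator, QMIX first computes the usual $Q$-learning targets and then projects them into its representable class $\mathcal{Q}^{mix}$ by minimising a squared error over all joint actions. Weighted QMIX changes only the projection step, replacing the uniform error with a weighted one, where $w(s, \mathbf{u})$ is the weighting function (setting $w \equiv 1$ recovers QMIX's projection):

$$\Pi_w Q := \operatorname*{argmin}_{q \in \mathcal{Q}^{mix}} \sum_{\mathbf{u} \in \mathbf{U}} w(s, \mathbf{u}) \big(Q(s, \mathbf{u}) - q(s, \mathbf{u})\big)^2.$$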

  1. Centrally-Weighted QMIX (CW-QMIX) relies on an additional, unrestricted centralised critic and down-weights joint actions whose targets fall below that critic's estimate of the value of the current greedy joint action, focusing learning on actions that appear at least as good as the greedy choice.
  2. Optimistically-Weighted QMIX (OW-QMIX) instead keeps full weight on joint actions whose targets exceed the current value estimate, down-weighting actions that are already over-estimated; this optimistic treatment encourages learning about potentially undervalued joint actions. Both weighting rules are sketched in code after this list.
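A minimal PyTorch-style sketch of how the two per-sample weights could be computed is given below. It assumes a TD target `y`, the current monotonic estimate `q_tot` for the chosen joint action, a boolean flag marking whether the chosen joint action is the current greedy one, and (for CW-QMIX) the unrestricted central critic's value of that greedy action. The constant `alpha` is the down-weighting hyperparameter; the names and signatures here are illustrative rather than the authors' implementation.

```python
import torch

def ow_weights(y, q_tot, alpha):
    """Optimistically-Weighted: keep full weight on joint actions whose
    target exceeds the current monotonic estimate (potentially
    under-estimated actions); down-weight everything else by alpha."""
    ones = torch.ones_like(y)
    return torch.where(y > q_tot, ones, alpha * ones)

def cw_weights(y, is_greedy, q_hat_star_greedy, alpha):
    """Centrally-Weighted: keep full weight on the estimated greedy joint
    action, and on joint actions whose target beats the central critic's
    value of that greedy action; down-weight everything else by alpha."""
    ones = torch.ones_like(y)
    keep = is_greedy | (y > q_hat_star_greedy)
    return torch.where(keep, ones, alpha * ones)
```

In the scalable deep versions, these weights multiply each sample's squared TD error for $Q_{tot}$, and an additional unrestricted centralised critic is trained alongside the monotonic network.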

In the tabular setting, the paper proves that both weighting schemes recover the correct maximal joint action for any joint action value function, and therefore for $Q^*$ as well, so the greedy decentralised policy obtained from the weighted projection is optimal.

Results and Implications

Empirically, Weighted QMIX delivers superior performance on environments such as Predator-Prey and several StarCraft Multi-Agent Challenge (SMAC) maps. Both CW-QMIX and OW-QMIX solve scenarios in which QMIX fails, particularly those that demand tight coordination or involve large joint action spaces to explore.

The ability of Weighted QMIX to outperform QMIX when the amount of exploration is increased (for example, a higher or more slowly annealed $\epsilon$ in $\epsilon$-greedy action selection) is particularly noteworthy: under uniform weighting, heavily explored poor joint actions distort QMIX's learned values, whereas the weighted projection remains robust. Such robustness to exploration settings is often lacking in conventional MARL methods.

Future Directions

The insights from Weighted QMIX suggest a promising avenue for future MARL research: dynamic importance weighting of joint actions. Further work could explore more refined weightings, or adaptive schemes that adjust to the environment's dynamics, to fully realise the potential of the weighted projection methodology.

Moreover, examining the interplay between the architectural capacity of the function approximators and the performance limitations observed would help clarify how the remaining shortcomings of the approach can be addressed. Ultimately, this line of research contributes to more robust, generalisable MARL solutions, advancing their practical applicability in real-world multi-agent systems such as autonomous driving and cooperative robotics.

Authors (4)
  1. Tabish Rashid (16 papers)
  2. Gregory Farquhar (21 papers)
  3. Bei Peng (34 papers)
  4. Shimon Whiteson (122 papers)
Citations (311)