POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning (2405.08036v4)
Abstract: Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning, with QMIX receiving significant attention. Many QMIX-based methods introduce monotonicity constraints between the joint action value and individual action values to achieve decentralized execution. However, such constraints limit the representation capacity of value factorization, restricting the joint action values it can represent and hindering the learning of the optimal policy. To address this challenge, we propose the Potentially Optimal Joint Actions Weighted QMIX (POWQMIX) algorithm, which recognizes the potentially optimal joint actions and assigns higher weights to the corresponding losses of these joint actions during training. We theoretically prove that with such a weighted training approach the optimal policy is guaranteed to be recovered. Experiments in matrix games, difficulty-enhanced predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms the state-of-the-art value-based multi-agent reinforcement learning methods.
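The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `weighted_td_loss`, the parameter `alpha`, and the choice of weight 1.0 versus `alpha` are all assumptions made for illustration. The abstract does not say how potentially optimal joint actions are recognized, so the sketch simply takes a precomputed boolean flag per sample.

```python
def weighted_td_loss(q_tot, targets, potentially_optimal, alpha=0.1):
    """Mean weighted squared error between joint action values and targets.

    Hypothetical sketch: samples flagged as potentially optimal get weight 1.0,
    all others get the smaller weight `alpha` (an assumed hyperparameter).
    """
    if not (len(q_tot) == len(targets) == len(potentially_optimal)):
        raise ValueError("all inputs must have the same length")
    if len(q_tot) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for q, y, flag in zip(q_tot, targets, potentially_optimal):
        weight = 1.0 if flag else alpha  # up-weight flagged joint actions
        total += weight * (q - y) ** 2
    return total / len(q_tot)


# Two samples with the same target; only the first is flagged.
loss = weighted_td_loss([1.0, 2.0], [0.0, 0.0], [True, False], alpha=0.5)
```

With `alpha` below 1, errors on unflagged joint actions contribute less to the loss, so a capacity-limited monotonic mixer would be pushed to fit the flagged joint actions first.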
Authors: Chang Huang, Junqiao Zhao, Shatong Zhu, Hongtu Zhou, Chen Ye, Tiantian Feng, Changjun Jiang