JointPPO: Diving Deeper into the Effectiveness of PPO in Multi-Agent Reinforcement Learning (2404.11831v2)
Abstract: While Centralized Training with Decentralized Execution (CTDE) has become the prevailing paradigm in Multi-Agent Reinforcement Learning (MARL), it may not be suitable for scenarios in which agents can fully communicate and share observations with each other. Fully centralized methods, also know as Centralized Training with Centralized Execution (CTCE) methods, can fully utilize observations of all the agents by treating the entire system as a single agent. However, traditional CTCE methods suffer from scalability issues due to the exponential growth of the joint action space. To address these challenges, in this paper we propose JointPPO, a CTCE method that uses Proximal Policy Optimization (PPO) to directly optimize the joint policy of the multi-agent system. JointPPO decomposes the joint policy into conditional probabilities, transforming the decision-making process into a sequence generation task. A Transformer-based joint policy network is constructed, trained with a PPO loss tailored for the joint policy. JointPPO effectively handles a large joint action space and extends PPO to multi-agent setting in a clear and concise manner. Extensive experiments on the StarCraft Multi-Agent Challenge (SMAC) testbed demonstrate the superiority of JointPPO over strong baselines. Ablation experiments and analyses are conducted to explores the factors influencing JointPPO's performance.
- Dimitri Bertsekas. Multiagent reinforcement learning: Rollout and policy iteration. IEEE/CAA Journal of Automatica Sinica, 8(2):249–272, 2021.
- Commander-soldiers reinforcement learning for cooperative multi-agent systems. In 2022 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE, 2022.
- Is independent learning all you need in the starcraft multi-agent challenge? arXiv preprint arXiv:2011.09533, 2020.
- Multi-agent systems: A survey. Ieee Access, 6:28573–28593, 2018.
- Multiagent reinforcement learning with heterogeneous graph attention network. IEEE Transactions on Neural Networks and Learning Systems, 2022.
- Counterfactual multi-agent policy gradients. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
- Cooperative multi-robot navigation in dynamic environment with deep reinforcement learning. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 448–454. IEEE, 2020.
- Planning and acting in partially observable stochastic domains. Artificial intelligence, 101(1-2):99–134, 1998.
- Approximately optimal approximate reinforcement learning. In Proceedings of the Nineteenth International Conference on Machine Learning, pages 267–274, 2002.
- Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190:82–94, 2016.
- Trust region policy optimisation in multi-agent reinforcement learning. arXiv preprint arXiv:2109.11251, 2021.
- Heterogeneous-agent mirror learning: A continuum of solutions to cooperative marl. arXiv preprint arXiv:2208.01682, 2022.
- A multiagent approach to q𝑞qitalic_q-learning for daily stock trading. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 37(6):864–877, 2007.
- Dealing with non-stationarity in marl via trust-region decomposition. arXiv preprint arXiv:2102.10616, 2021.
- Ace: Cooperative multi-agent q-learning with bidirectional action-dependency. In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 8536–8544, 2023.
- Coach-player multi-agent reinforcement learning for dynamic team composition. In International Conference on Machine Learning, pages 6860–6870. PMLR, 2021.
- Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in neural information processing systems, 30, 2017.
- Maven: Multi-agent variational exploration. Advances in neural information processing systems, 32, 2019.
- Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE transactions on cybernetics, 50(9):3826–3839, 2020.
- Optimal and approximate q-value functions for decentralized pomdps. Journal of Artificial Intelligence Research, 32:289–353, 2008.
- Weighted qmix: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning. Advances in neural information processing systems, 33:10199–10210, 2020.
- Monotonic value function factorisation for deep multi-agent reinforcement learning. The Journal of Machine Learning Research, 21(1):7234–7284, 2020.
- Gcs: graph-based coordination strategy for multi-agent reinforcement learning. arXiv preprint arXiv:2201.06257, 2022.
- Learning multi-agent action coordination via electing first-move agent. In Proceedings of the International Conference on Automated Planning and Scheduling, volume 32, pages 624–628, 2022.
- The starcraft multi-agent challenge. arXiv preprint arXiv:1902.04043, 2019.
- Mastering atari, go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020.
- Trust region policy optimization. In International conference on machine learning, pages 1889–1897. PMLR, 2015.
- High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015.
- Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In International conference on machine learning, pages 5887–5896. PMLR, 2019.
- Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296, 2017.
- Multiagent cooperation and competition with deep reinforcement learning. PloS one, 12(4):e0172395, 2017.
- Attention is all you need. Advances in neural information processing systems, 30, 2017.
- Qplex: Duplex dueling multi-agent q-learning. arXiv preprint arXiv:2008.01062, 2020.
- Rode: Learning roles to decompose multi-agent tasks. arXiv preprint arXiv:2010.01523, 2020.
- Hierarchical attention master–slave for heterogeneous multi-agent reinforcement learning. Neural Networks, 162:359–368, 2023.
- Order matters: Agent-by-agent policy optimization. arXiv preprint arXiv:2302.06205, 2023.
- Multi-agent reinforcement learning is a sequence modeling problem. Advances in Neural Information Processing Systems, 35:16509–16521, 2022.
- Multi-agent deep reinforcement learning for urban traffic light control in vehicular networks. IEEE Transactions on Vehicular Technology, 69(8):8243–8256, 2020.
- Daydreamer: World models for physical robot learning. In Conference on Robot Learning, pages 2226–2240. PMLR, 2023.
- The surprising effectiveness of ppo in cooperative multi-agent games. Advances in Neural Information Processing Systems, 35:24611–24624, 2022.
- A leader-following paradigm based deep reinforcement learning method for multi-agent cooperation games. Neural Networks, 156:1–12, 2022.
- Chenxing Liu (1 paper)
- Guizhong Liu (22 papers)