TVDO: Tchebycheff Value-Decomposition Optimization for Multi-Agent Reinforcement Learning (2306.13979v1)
Abstract: In cooperative multi-agent reinforcement learning (MARL), centralized training with decentralized execution (CTDE) has recently become the customary paradigm owing to practical demands. However, its main dilemma is the inconsistency between jointly trained policies and individually optimized actions. In this work, we propose a novel value-based multi-objective learning approach, named Tchebycheff value decomposition optimization (TVDO), to overcome this dilemma. In particular, a nonlinear Tchebycheff aggregation method is designed to transform the MARL task into its multi-objective optimization counterpart by tightly constraining the upper bound of the individual action-value bias. We theoretically prove that TVDO satisfies the necessary and sufficient condition of individual-global-max (IGM) with no extra limitations, which guarantees consistency between the global and individual optimal action-value functions. Empirically, in the climb and penalty games, we verify that TVDO factorizes the global value into individual values precisely while guaranteeing policy consistency. Furthermore, we evaluate TVDO on challenging StarCraft II micromanagement tasks, and extensive experiments demonstrate that TVDO achieves more competitive performance than several state-of-the-art MARL methods.
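The IGM condition mentioned in the abstract can be illustrated on a small two-agent matrix game. The sketch below is not TVDO itself: it only checks whether the greedy joint action of a joint value table matches the tuple of each agent's greedy individual action, and it shows a generic weighted Tchebycheff aggregation (the largest weighted absolute deviation). The payoff table is assumed to be the commonly used climb game matrix; the individual utilities and all function names (`igm_holds`, `tchebycheff`, and so on) are hypothetical and chosen for illustration only.

```python
from itertools import product

# Assumed climb game payoffs (rows: agent 1's action, columns: agent 2's action).
CLIMB = [
    [11.0, -30.0, 0.0],
    [-30.0, 7.0, 6.0],
    [0.0, 0.0, 5.0],
]


def greedy_joint_action(q_joint):
    """Return the (row, column) pair that maximizes a two-agent joint value table."""
    n_rows, n_cols = len(q_joint), len(q_joint[0])
    return max(product(range(n_rows), range(n_cols)),
               key=lambda a: q_joint[a[0]][a[1]])


def greedy_individual_actions(q_individuals):
    """Return each agent's greedy action from its own utility vector."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_individuals)


def igm_holds(q_joint, q_individuals):
    """IGM check: joint greedy action equals the tuple of individual greedy actions."""
    return greedy_joint_action(q_joint) == greedy_individual_actions(q_individuals)


def tchebycheff(biases, weights):
    """Generic weighted Tchebycheff aggregation: max_i w_i * |b_i|."""
    return max(w * abs(b) for w, b in zip(weights, biases))


# Hypothetical individual utilities that are consistent with the joint table:
# each agent scores an action by the best payoff it can lead to.
q_max = [[max(row) for row in CLIMB], [max(col) for col in zip(*CLIMB)]]

# Hypothetical utilities that are inconsistent: each agent averages over the
# other agent's actions, so the -30 penalties push both away from action 0.
q_mean = [[sum(row) / 3 for row in CLIMB], [sum(col) / 3 for col in zip(*CLIMB)]]

print(igm_holds(CLIMB, q_max))    # True: both greedy choices give (0, 0)
print(igm_holds(CLIMB, q_mean))   # False: individual greedy choices give (2, 2)
print(tchebycheff([0.5, -2.0, 1.0], [1.0, 1.0, 1.0]))  # 2.0
```

With the averaged utilities, both agents settle on the safe action and miss the optimal joint action, which is the kind of inconsistency between joint and individual optima that the abstract describes. The Tchebycheff function penalizes only the worst per-agent deviation, which is why bounding it bounds every individual term.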
- Xiaoliang Hu
- Pengcheng Guo
- Chuanwei Zhou
- Tong Zhang
- Zhen Cui