MARL-LNS: Cooperative Multi-agent Reinforcement Learning via Large Neighborhoods Search (2404.03101v1)
Abstract: Cooperative multi-agent reinforcement learning (MARL) has become an increasingly important research topic over the last half-decade because of its great potential for real-world applications. Because of the curse of dimensionality, the popular "centralized training decentralized execution" framework requires a long training time and still cannot converge efficiently. In this paper, we propose a general training framework, MARL-LNS, that addresses these issues algorithmically by training on alternating subsets of agents, using existing deep MARL algorithms as low-level trainers and introducing no additional trainable parameters. Based on this framework, we provide three algorithm variants: random large neighborhood search (RLNS), batch large neighborhood search (BLNS), and adaptive large neighborhood search (ALNS), which alternate the subsets of agents differently. We test our algorithms on both the StarCraft Multi-Agent Challenge and Google Research Football, showing that they can automatically reduce training time by at least 10% while reaching the same final skill level as the original algorithm.
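To make the idea of "training on alternating subsets of agents" concrete, the following is a minimal sketch, not the authors' implementation. The abstract only states that the three variants alternate the subsets differently, so every function name, the `update_subset` callback standing in for the low-level MARL trainer, and in particular the adaptive size rule are assumptions made for illustration.

```python
import random
from typing import Callable, List


def random_subset(n_agents: int, size: int, rng: random.Random) -> List[int]:
    """RLNS-style choice (assumed): draw `size` distinct agents uniformly."""
    return sorted(rng.sample(range(n_agents), size))


def batch_subsets(n_agents: int, size: int, rng: random.Random) -> List[List[int]]:
    """BLNS-style choice (assumed): shuffle the agents once, then walk
    through the shuffled order in consecutive batches of `size`."""
    order = list(range(n_agents))
    rng.shuffle(order)
    return [sorted(order[i:i + size]) for i in range(0, n_agents, size)]


def adapt_size(size: int, n_agents: int, improved: bool) -> int:
    """ALNS-style rule (hypothetical): keep the neighborhood size while
    training improves, otherwise grow it by one agent, capped at n_agents."""
    return size if improved else min(size + 1, n_agents)


def train_lns(n_agents: int, size: int, n_rounds: int,
              update_subset: Callable[[List[int]], None],
              seed: int = 0) -> List[List[int]]:
    """Outer loop: each round picks a subset and asks the low-level trainer
    (the hypothetical `update_subset` callback) to train only those agents."""
    rng = random.Random(seed)
    visited = []
    for _ in range(n_rounds):
        subset = random_subset(n_agents, size, rng)
        update_subset(subset)
        visited.append(subset)
    return visited


if __name__ == "__main__":
    calls: List[List[int]] = []
    train_lns(n_agents=5, size=3, n_rounds=4, update_subset=calls.append)
    print(calls)  # four subsets of three agent indices each
```

In this sketch the outer loop adds no trainable parameters of its own; it only decides which agents the existing trainer touches in each round, which is the property the abstract emphasizes.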
- Weizhe Chen
- Sven Koenig
- Bistra Dilkina