
Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding (2403.07559v2)

Published 12 Mar 2024 in cs.MA, cs.AI, cs.LG, and cs.RO

Abstract: Multi-Agent Reinforcement Learning (MARL) based Multi-Agent Path Finding (MAPF) has recently gained attention due to its efficiency and scalability. Several MARL-MAPF methods use communication to enrich the information each agent can perceive. However, existing works still struggle in structured environments with high obstacle density and large numbers of agents. To further improve the performance of communication-based MARL-MAPF solvers, we propose a new method, Ensembling Prioritized Hybrid Policies (EPH). We first propose a selective communication block to gather richer information for better agent coordination within multi-agent environments and train the model with a Q-learning-based algorithm. We further introduce three advanced inference strategies aimed at bolstering performance during the execution phase. First, we hybridize the neural policy with single-agent expert guidance for navigating conflict-free zones. Second, we propose Q-value-based methods for prioritized resolution of conflicts as well as deadlock situations. Finally, we introduce a robust ensemble method that can efficiently collect the best out of multiple possible solutions. We empirically evaluate EPH in complex multi-agent environments and demonstrate competitive performance against state-of-the-art neural methods for MAPF. We open-source our code at https://github.com/ai4co/eph-mapf.
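
To make the abstract's second inference strategy more concrete, the sketch below shows one plausible form of Q-value-based prioritized conflict resolution: when several agents propose stepping into the same cell, the agent whose greedy action has the highest Q-value keeps its move and the others wait in place. This is an illustrative assumption based only on the abstract, not the authors' released implementation; the function and variable names (resolve_conflicts, q_values, proposed_cells, current_cells) are hypothetical.

```python
def resolve_conflicts(q_values, proposed_cells, current_cells):
    """Greedy Q-value-based conflict resolution (illustrative sketch).

    q_values:       dict agent_id -> list of Q-values, one per action
    proposed_cells: dict agent_id -> cell the agent's greedy action leads to
    current_cells:  dict agent_id -> cell the agent currently occupies

    Agents proposing the same target cell are prioritized by the Q-value
    of their chosen (greedy) action; lower-priority agents stay in place.
    """
    # Group agents by the cell they intend to move into.
    by_cell = {}
    for agent, cell in proposed_cells.items():
        by_cell.setdefault(cell, []).append(agent)

    resolved = dict(proposed_cells)
    for cell, agents in by_cell.items():
        if len(agents) <= 1:
            continue  # no vertex conflict on this cell
        # Highest greedy Q-value wins the contested cell...
        agents.sort(key=lambda a: max(q_values[a]), reverse=True)
        # ...and the remaining agents wait at their current cells.
        for loser in agents[1:]:
            resolved[loser] = current_cells[loser]
    return resolved


if __name__ == "__main__":
    q = {0: [0.1, 0.9, 0.2], 1: [0.3, 0.4, 0.8], 2: [0.5, 0.2, 0.1]}
    proposed = {0: (3, 4), 1: (3, 4), 2: (2, 2)}  # agents 0 and 1 collide
    current = {0: (3, 3), 1: (2, 4), 2: (2, 3)}
    print(resolve_conflicts(q, proposed, current))
    # Agent 0 keeps the contested cell (0.9 > 0.8); agent 1 waits at (2, 4).
```

A full solver would also have to handle swap (edge) conflicts and re-check any new collisions introduced when agents fall back to waiting; the sketch omits those details.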

Authors (3)
  1. Huijie Tang (3 papers)
  2. Federico Berto (19 papers)
  3. Jinkyoo Park (75 papers)
Citations (2)
