HiMAP: Learning Heuristics-Informed Policies for Large-Scale Multi-Agent Pathfinding (2402.15546v1)
Abstract: Large-scale multi-agent pathfinding (MAPF) presents significant challenges in several areas. As systems grow in complexity, with many autonomous agents operating simultaneously, efficient and collision-free coordination becomes paramount. Traditional algorithms often scale poorly, especially in intricate scenarios. Reinforcement learning (RL) has shown potential for MAPF, but it also struggles with scalability: it demands intricate implementation and lengthy training, and it often converges unstably, which limits its practical application. In this paper, we introduce Heuristics-Informed Multi-Agent Pathfinding (HiMAP), a novel scalable approach that employs imitation learning with heuristic guidance in a decentralized manner. We train on small-scale instances using a heuristic policy as a teacher that maps each agent's observation to an action probability distribution. During pathfinding, we adopt several inference techniques to improve performance. With a simple training scheme and implementation, HiMAP achieves competitive success rates and scalability among imitation-learning-only MAPF methods, showing the potential of imitation-learning-only MAPF equipped with inference techniques.
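The imitation-learning setup described in the abstract (a policy that turns one agent's observation into an action probability distribution, trained to match a heuristic teacher) can be illustrated with a minimal behavior-cloning sketch. This is not the authors' implementation: the linear policy, the five-action set, the three-element toy observation, and the names `policy` and `imitation_step` are all assumptions made for illustration, and the abstract specifies neither the network architecture nor the observation encoding.

```python
import math

# Assumed action set for a grid world; the abstract does not list the actions.
ACTIONS = ["stay", "up", "down", "left", "right"]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy(weights, obs):
    """Hypothetical linear policy: one weight row per action.

    Returns a probability distribution over ACTIONS for a single agent's
    observation vector.
    """
    logits = [sum(w * o for w, o in zip(row, obs)) for row in weights]
    return softmax(logits)

def imitation_step(weights, obs, teacher_action, lr=0.5):
    """One gradient step on the cross-entropy loss -log p(teacher_action | obs).

    Updates `weights` in place and returns the loss measured before the update.
    """
    probs = policy(weights, obs)
    loss = -math.log(probs[teacher_action])
    for a, row in enumerate(weights):
        # Gradient of cross-entropy w.r.t. logit a is p_a - 1[a == teacher].
        grad_logit = probs[a] - (1.0 if a == teacher_action else 0.0)
        for i in range(len(row)):
            row[i] -= lr * grad_logit * obs[i]
    return loss

# Toy usage: the (assumed) heuristic teacher says "right" for this observation.
weights = [[0.0, 0.0, 0.0] for _ in ACTIONS]
obs = [1.0, 0.0, 1.0]
teacher = ACTIONS.index("right")
losses = [imitation_step(weights, obs, teacher) for _ in range(50)]
```

After the loop, `losses` decreases and `policy(weights, obs)` places most of its probability mass on the teacher's action. Because the trained policy only needs each agent's own observation, it can be evaluated independently per agent at inference time, which is what makes a decentralized rollout possible.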