Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search (2312.16767v2)
Abstract: Anytime multi-agent path finding (MAPF) is a promising approach to scalable path optimization in large-scale multi-agent systems. State-of-the-art anytime MAPF is based on Large Neighborhood Search (LNS), where a fast initial solution is iteratively optimized by destroying and repairing a fixed number of parts, i.e., the neighborhood, of the solution, using randomized destroy heuristics and prioritized planning. Despite their recent success in various MAPF instances, current LNS-based approaches lack exploration and flexibility due to greedy optimization with a fixed neighborhood size, which can lead to low-quality solutions in general. So far, these limitations have been addressed with extensive prior effort in tuning or offline machine learning beyond actual planning. In this paper, we focus on online learning in LNS and propose Bandit-based Adaptive LArge Neighborhood search Combined with Exploration (BALANCE). BALANCE uses a bi-level multi-armed bandit scheme to adapt the selection of destroy heuristics and neighborhood sizes on the fly during search. We evaluate BALANCE on multiple maps from the MAPF benchmark set and empirically demonstrate cost improvements of at least 50% compared to state-of-the-art anytime MAPF in large-scale scenarios. We find that Thompson Sampling performs particularly well compared to alternative multi-armed bandit algorithms.
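To make the bi-level idea concrete, the following is a minimal Python sketch of one way such a scheme could be organized: a top-level bandit chooses a destroy heuristic, and a per-heuristic bottom-level bandit chooses a neighborhood size. It is an illustration under stated assumptions, not the authors' implementation. In particular, the Beta-Bernoulli form of Thompson Sampling, the binary "cost improved" reward, the class names `BetaBandit` and `BiLevelBandit`, the heuristic labels, the size set, and the success probabilities in `simulate` are all hypothetical choices made for this example.

```python
import random


class BetaBandit:
    """Thompson Sampling over a finite arm set with Beta(1, 1) priors (assumed reward model)."""

    def __init__(self, arms, rng):
        self.arms = list(arms)
        self.rng = rng
        self.wins = {a: 1.0 for a in self.arms}    # Beta alpha parameter per arm
        self.losses = {a: 1.0 for a in self.arms}  # Beta beta parameter per arm

    def select(self):
        # Draw one posterior sample per arm and play the arm with the largest sample.
        samples = {a: self.rng.betavariate(self.wins[a], self.losses[a])
                   for a in self.arms}
        return max(samples, key=samples.get)

    def update(self, arm, success):
        if success:
            self.wins[arm] += 1.0
        else:
            self.losses[arm] += 1.0


class BiLevelBandit:
    """Top level picks a destroy heuristic; bottom level picks a neighborhood size for it."""

    def __init__(self, heuristics, sizes, seed=0):
        rng = random.Random(seed)
        self.top = BetaBandit(heuristics, rng)
        self.bottom = {h: BetaBandit(sizes, rng) for h in heuristics}

    def select(self):
        heuristic = self.top.select()
        size = self.bottom[heuristic].select()
        return heuristic, size

    def update(self, heuristic, size, improved):
        # Both levels receive the same binary feedback (an assumption of this sketch).
        self.top.update(heuristic, improved)
        self.bottom[heuristic].update(size, improved)


def simulate(steps=2000, seed=0):
    """Toy stand-in for the LNS loop: destroy/repair is replaced by a coin flip."""
    heuristics = ["random", "agent", "map"]  # placeholder labels
    sizes = [2, 4, 8, 16]                    # placeholder neighborhood sizes
    env = random.Random(seed + 1)
    bandit = BiLevelBandit(heuristics, sizes, seed)
    counts = {}
    for _ in range(steps):
        h, s = bandit.select()
        # Hypothetical environment: one (heuristic, size) pair improves cost more often.
        p_improve = 0.6 if (h, s) == ("agent", 8) else 0.1
        improved = env.random() < p_improve
        bandit.update(h, s, improved)
        counts[(h, s)] = counts.get((h, s), 0) + 1
    return counts


if __name__ == "__main__":
    for arm, n in sorted(simulate().items(), key=lambda kv: -kv[1]):
        print(arm, n)
```

In an actual LNS loop, the coin flip in `simulate` would be replaced by destroying the selected neighborhood, repairing it with prioritized planning, and reporting whether the solution cost decreased. A real-valued reward such as the size of the cost improvement would call for a different posterior model than the Beta distribution assumed here.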