Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand (2211.14983v2)
Abstract: We derive a learning framework to generate routing/pickup policies for a fleet of autonomous vehicles tasked with servicing stochastically appearing requests on a city map. We focus on policies that 1) give rise to coordination amongst the vehicles, thereby reducing wait times for servicing requests, 2) are non-myopic and account a priori for potential future requests, and 3) can adapt to changes in the underlying demand distribution. Specifically, we are interested in policies that are adaptive to fluctuations in actual demand conditions in urban environments, such as on-peak vs. off-peak hours. We achieve this through a combination of (i) an online play algorithm that improves the performance of an offline-trained policy, and (ii) an offline approximation scheme that allows for adapting to changes in the underlying demand model. In particular, we achieve adaptivity of our learned policy to different demand distributions by quantifying a region of validity using the q-valid radius of a Wasserstein ambiguity set. We propose a mechanism for switching away from the originally trained offline approximation when the current demand falls outside the original validity region; in this case, we use an offline architecture trained on a historical demand model that is closer to the current demand in terms of Wasserstein distance. We learn routing and pickup policies over real taxicab requests in San Francisco with high variability between on-peak and off-peak hours, demonstrating the ability of our method to adapt to real fluctuations in demand distributions. Our numerical results demonstrate that our method outperforms alternative rollout-based reinforcement learning schemes, as well as other classical methods from operations research.
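To make the switching mechanism described above concrete, here is a minimal sketch in Python of the policy-selection step. It is an illustration under stated assumptions, not the paper's implementation: demand is reduced to 1-D sample values so that `scipy.stats.wasserstein_distance` applies (spatial demand over a city map would require a full optimal-transport solver), and the container `HistoricalModel` with fields `demand_samples`, `policy`, and `q_valid_radius`, as well as the function `select_policy`, are hypothetical names introduced here.

```python
import numpy as np
from dataclasses import dataclass
from scipy.stats import wasserstein_distance  # 1-D optimal transport distance

@dataclass
class HistoricalModel:
    """A historical demand model paired with its offline-trained policy.

    Hypothetical container for illustration; not the paper's API.
    """
    name: str
    demand_samples: np.ndarray  # samples from the historical demand (1-D proxy)
    policy: object              # offline-trained approximation for this demand
    q_valid_radius: float       # radius of the Wasserstein ambiguity set

def select_policy(current_demand: np.ndarray,
                  active: HistoricalModel,
                  candidates: list[HistoricalModel]) -> HistoricalModel:
    """Keep the active offline policy while the current demand stays inside
    its q-valid Wasserstein ball; otherwise switch to the historical model
    whose training demand is closest in Wasserstein distance."""
    if wasserstein_distance(current_demand,
                            active.demand_samples) <= active.q_valid_radius:
        return active  # current demand is still inside the region of validity
    # Outside the validity region: pick the nearest historical demand model.
    return min(candidates,
               key=lambda m: wasserstein_distance(current_demand,
                                                  m.demand_samples))
```

Note the design choice this sketch mirrors: the switch is triggered only when the distance from the current demand to the active model's training demand exceeds the model's q-valid radius, so the originally trained offline approximation is reused for as long as it remains within its quantified region of validity.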