Multi-Agent Inverse Reinforcement Learning in Real World Unstructured Pedestrian Crowds (2405.16439v3)
Abstract: Social robot navigation in crowded public spaces such as university campuses, restaurants, grocery stores, and hospitals is an increasingly important area of research. One of the core strategies for achieving this goal is to understand humans' intent, i.e., the underlying psychological factors that govern their motion, by learning their reward functions, typically via inverse reinforcement learning (IRL). Despite significant progress in IRL, learning the reward functions of multiple agents simultaneously in dense, unstructured pedestrian crowds has remained intractable due to the tightly coupled social interactions that occur in these scenarios, e.g., passing, intersecting, swerving, and weaving. In this paper, we present a new multi-agent maximum entropy inverse reinforcement learning algorithm for real-world unstructured pedestrian crowds. Key to our approach is a simple but effective mathematical trick, which we call the tractability-rationality trade-off, that achieves tractability at the cost of a slight reduction in accuracy. We compare our approach to classical single-agent MaxEnt IRL as well as state-of-the-art trajectory prediction methods on several datasets, including ETH, UCY, SCAND, JRDB, and a new dataset, called Speedway, collected at a busy intersection on a university campus and focused on dense, complex agent interactions. Our key findings show that, on the dense Speedway dataset, our approach ranks 1st among the top 7 baselines with a more than 2x improvement over single-agent IRL, and is competitive with state-of-the-art large transformer-based encoder-decoder models on sparser datasets such as ETH/UCY (ranking 3rd among the top 7 baselines).
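For readers unfamiliar with the single-agent baseline the abstract contrasts against, here is a minimal sketch of classical MaxEnt IRL (Ziebart et al., cited below): a backward soft value iteration alternating with a forward visitation-frequency pass, ascending the feature-matching gradient. The 5x5 gridworld, one-hot features, hand-made demonstration, and learning rate are illustrative assumptions; this is not the paper's multi-agent algorithm or its tractability-rationality trick.

```python
# A minimal sketch of classical single-agent MaxEnt IRL (Ziebart et al., 2008),
# the baseline the paper compares against -- NOT the paper's multi-agent method.
# The gridworld, features, demo path, and learning rate are assumptions.
import numpy as np

grid = 5
n_states, n_actions = grid * grid, 4               # states: cells; actions: U/D/L/R

def transition_matrix():
    """Deterministic gridworld dynamics: T[a, s] = successor state."""
    T = np.zeros((n_actions, n_states), dtype=int)
    for s in range(n_states):
        r, c = divmod(s, grid)
        for a, (nr, nc) in enumerate([(r-1, c), (r+1, c), (r, c-1), (r, c+1)]):
            T[a, s] = np.clip(nr, 0, grid-1) * grid + np.clip(nc, 0, grid-1)
    return T

T = transition_matrix()
features = np.eye(n_states)                        # one-hot state features (assumed)

# One hand-crafted expert demonstration from corner (0,0) to corner (4,4).
demos = [[0, 1, 2, 7, 12, 17, 22, 23, 24]]
horizon = len(demos[0])
f_expert = sum(features[s] for traj in demos for s in traj) / len(demos)

def soft_value_iteration(reward):
    """Backward pass: time-indexed soft-optimal policy under a state reward."""
    V = np.zeros(n_states)
    policy = np.empty((horizon, n_states, n_actions))
    for t in reversed(range(horizon)):
        Q = reward[None, :] + V[T]                 # Q[a, s] = r(s) + V(T(s, a))
        V = np.logaddexp.reduce(Q, axis=0)         # soft max over actions
        policy[t] = np.exp(Q - V).T                # pi_t(a | s)
    return policy

def expected_svf(policy, start_dist):
    """Forward pass: expected state visitation frequencies under the policy."""
    D, svf = start_dist.copy(), start_dist.copy()
    for t in range(horizon - 1):
        D_next = np.zeros(n_states)
        for a in range(n_actions):
            np.add.at(D_next, T[a], D * policy[t, :, a])
        D = D_next
        svf += D
    return svf

start = np.zeros(n_states)
start[0] = 1.0
theta = np.zeros(n_states)                         # reward weights, r(s) = theta . f(s)
for _ in range(200):                               # gradient ascent on the MaxEnt
    policy = soft_value_iteration(features @ theta)             # log-likelihood
    grad = f_expert - features.T @ expected_svf(policy, start)  # E_demo[f] - E_pi[f]
    theta += 0.05 * grad

print("highest-reward states:", np.argsort(theta)[-5:])  # should lie on the demo path
```

The gradient E_demo[f] - E_pi[f] is the quantity that becomes hard to estimate jointly over many tightly interacting agents, which is the intractability the abstract says the paper's trade-off trick addresses.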
- R. Chandra, V. Zinage, E. Bakolas, P. Stone, and J. Biswas, “Deadlock-free, safe, and decentralized multi-robot navigation in social mini-games via discrete-time control barrier functions,” 2024.
- R. Chandra, M. Wang, M. Schwager, and D. Manocha, “Game-theoretic planning for autonomous driving among risk-aware human drivers,” in 2022 International Conference on Robotics and Automation (ICRA), pp. 2876–2883, 2022.
- R. Chandra and D. Manocha, “GamePlan: Game-theoretic multi-agent planning with human drivers at intersections, roundabouts, and merging,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2676–2683, 2022.
- N. Suriyarachchi, R. Chandra, J. S. Baras, and D. Manocha, “GameOpt: Optimal real-time multi-agent planning and control at dynamic intersections,” in 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), pp. 2599–2606, IEEE, 2022. doi: 10.1109/ITSC55140.2022.9921968.
- A. Mavrogiannis, R. Chandra, and D. Manocha, “B-GAP: Behavior-rich simulation and navigation for autonomous driving,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4718–4725, 2022.
- H. Karnan, A. Nair, X. Xiao, G. Warnell, S. Pirk, A. Toshev, J. Hart, J. Biswas, and P. Stone, “Socially compliant navigation dataset (SCAND): A large-scale dataset of demonstrations for social navigation,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 11807–11814, 2022.
- A. H. Raj, Z. Hu, H. Karnan, R. Chandra, A. Payandeh, L. Mao, P. Stone, J. Biswas, and X. Xiao, “Targeted learning: A hybrid approach to social robot navigation,” 2023.
- Z. Sprague, R. Chandra, J. Holtz, and J. Biswas, “SocialGym 2.0: Simulator for multi-agent social robot navigation in shared human spaces,” arXiv preprint arXiv:2303.05584, 2023.
- R. Chandra, R. Maligi, A. Anantula, and J. Biswas, “SocialMAPF: Optimal and efficient multi-agent path finding with strategic agents for social navigation,” IEEE Robotics and Automation Letters, 2023.
- R. Chandra, R. Menon, Z. Sprague, A. Anantula, and J. Biswas, “Decentralized social navigation with non-cooperative robots via bi-level optimization,” arXiv preprint arXiv:2306.08815, 2023.
- R. Chandra, V. Zinage, E. Bakolas, J. Biswas, and P. Stone, “Decentralized multi-robot social navigation in constrained environments via game-theoretic control barrier functions,” arXiv preprint arXiv:2308.10966, 2023.
- S. Poddar, C. Mavrogiannis, and S. S. Srinivasa, “From crowd motion prediction to robot navigation in crowds,” 2023.
- A. Francis, C. Pérez-D’Arpino, C. Li, F. Xia, A. Alahi, R. Alami, A. Bera, A. Biswas, J. Biswas, R. Chandra, H.-T. L. Chiang, M. Everett, S. Ha, J. Hart, J. P. How, H. Karnan, T.-W. E. Lee, L. J. Manso, R. Mirsky, S. Pirk, P. T. Singamaneni, P. Stone, A. V. Taylor, P. Trautman, N. Tsoi, M. Vázquez, X. Xiao, P. Xu, N. Yokoyama, A. Toshev, and R. Martín-Martín, “Principles and guidelines for evaluating social robot navigation algorithms,” 2023.
- F. Torabi, G. Warnell, and P. Stone, “Recent advances in imitation learning from observation,” arXiv preprint arXiv:1905.13566, 2019.
- B. Zheng, S. Verma, J. Zhou, I. W. Tsang, and F. Chen, “Imitation learning: Progress, taxonomies and challenges,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–16, 2022.
- B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, “Maximum entropy inverse reinforcement learning,” in AAAI, vol. 8, pp. 1433–1438, Chicago, IL, USA, 2008.
- M. Bain and C. Sammut, “A framework for behavioural cloning,” in Machine Intelligence 15, 1995.
- S. Ross, G. J. Gordon, and J. A. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” 2011.
- S. Daftry, J. A. Bagnell, and M. Hebert, “Learning transferable policies for monocular reactive mav control,” 2016.
- P. Sermanet, C. Lynch, Y. Chebotar, J. Hsu, E. Jang, and S. Levine, “Time-contrastive networks: Self-supervised learning from video,” 2018.
- N. Ratliff, D. Bradley, J. A. Bagnell, and J. Chestnutt, “Boosting structured prediction for imitation learning,” in Proceedings of the 19th International Conference on Neural Information Processing Systems, NIPS’06, (Cambridge, MA, USA), pp. 1153–1160, MIT Press, 2006.
- C. L. Baker, R. Saxe, and J. Tenenbaum, “Action understanding as inverse planning,” Cognition, vol. 113, pp. 329–349, 2009.
- R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
- J. Ho and S. Ermon, “Generative adversarial imitation learning,” in Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 4572–4580, 2016.
- I. Kostrikov, O. Nachum, and J. Tompson, “Imitation learning via off-policy distribution matching,” 2019.
- R. Dadashi, L. Hussenot, M. Geist, and O. Pietquin, “Primal Wasserstein imitation learning,” arXiv preprint arXiv:2006.04678, 2020.
- P. Abbeel and A. Y. Ng, “Apprenticeship learning via inverse reinforcement learning,” in Proceedings of the Twenty-First International Conference on Machine Learning, p. 1, 2004.
- A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, “Social LSTM: Human trajectory prediction in crowded spaces,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 961–971, 2016.
- S. Levine and V. Koltun, “Continuous inverse optimal control with locally optimal examples,” arXiv preprint arXiv:1206.4617, 2012.
- C. Finn, S. Levine, and P. Abbeel, “Guided cost learning: Deep inverse optimal control via policy optimization,” in Proceedings of the International Conference on Machine Learning, pp. 49–58, 2016.
- S. Albrecht et al., “Imitating human reaching motions using physically inspired optimization principles,” in Proceedings of the 11th IEEE-RAS International Conference on Humanoid Robots, pp. 602–607, 2011.
- K. Mombaur, A. T. Truong, and J.-P. Laumond, “From human to humanoid locomotion–an inverse optimal control approach,” Autonomous Robots, vol. 28, no. 3, pp. 369–383, 2010.
- L. Yu, J. Song, and S. Ermon, “Multi-agent adversarial inverse reinforcement learning,” in International Conference on Machine Learning, pp. 7194–7201, PMLR, 2019.
- J. Song, H. Ren, D. Sadigh, and S. Ermon, “Multi-agent generative adversarial imitation learning,” Advances in Neural Information Processing Systems, vol. 31, 2018.
- W. Schwarting, A. Pierson, J. Alonso-Mora, S. Karaman, and D. Rus, “Social behavior for autonomous vehicles,” Proceedings of the National Academy of Sciences, vol. 116, no. 50, pp. 24972–24978, 2019.
- S. Le Cleac’h, M. Schwager, and Z. Manchester, “LucidGames: Online unscented inverse dynamic games for adaptive trajectory prediction and planning,” IEEE Robotics and Automation Letters, vol. 6, pp. 5485–5492, 2021.
- S. Rothfuß, J. Inga, F. Köpf, M. Flad, and S. Hohmann, “Inverse optimal control for identification in non-cooperative differential games,” in IFAC-PapersOnLine, vol. 50, pp. 14909–14915, 2017.
- F. Köpf, J. Inga, S. Rothfuß, M. Flad, and S. Hohmann, “Inverse reinforcement learning for identification in linear-quadratic dynamic games,” in IFAC-PapersOnLine, vol. 50, pp. 14902–14908, 2017.
- N. Mehr, M. Wang, M. Bhatt, and M. Schwager, “Maximum-entropy multi-agent dynamic games: Forward and inverse solutions,” IEEE Transactions on Robotics, 2023.
- F. Torabi, G. Warnell, and P. Stone, “Behavioral cloning from observation,” arXiv preprint arXiv:1805.01954, 2018.
- K. Mangalam, H. Girase, S. Agarwal, K.-H. Lee, E. Adeli, J. Malik, and A. Gaidon, “It is not the journey but the destination: Endpoint conditioned trajectory prediction,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II, pp. 759–776, Springer, 2020.
- D. Gonon and A. Billard, “Inverse reinforcement learning of pedestrian–robot coordination,” IEEE Robotics and Automation Letters, 2023.
- S. J. Wright, “Coordinate descent algorithms,” Mathematical Programming, vol. 151, no. 1, pp. 3–34, 2015.
- C. Schöller, V. Aravantinos, F. Lay, and A. Knoll, “What the constant velocity model can teach us about pedestrian motion prediction,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1696–1703, 2020.
- F. Giuliari, I. Hasan, M. Cristani, and F. Galasso, “Transformer networks for trajectory forecasting,” in 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10335–10342, IEEE, 2021.
- X. Xiao, Z. Xu, Z. Wang, Y. Song, G. Warnell, P. Stone, T. Zhang, S. Ravi, G. Wang, H. Karnan, J. Biswas, N. Mohammad, L. Bramblett, R. Peddi, N. Bezzo, Z. Xie, and P. Dames, “Autonomous ground navigation in highly constrained spaces: Lessons learned from the BARN Challenge at ICRA 2022,” 2022.
- R. Martín-Martín, M. Patel, H. Rezatofighi, A. Shenoi, J. Gwak, E. Frankel, A. Sadeghian, and S. Savarese, “JRDB: A dataset and benchmark of egocentric robot visual perception of humans in built environments,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 6, pp. 6748–6765, 2021.
- S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y. Chai, B. Sapp, C. R. Qi, Y. Zhou, et al., “Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9710–9719, 2021.
- W. Zhan, L. Sun, D. Wang, H. Shi, A. Clausse, M. Naumann, J. Kummerle, H. Konigshof, C. Stiller, A. de La Fortelle, et al., “INTERACTION dataset: An international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps,” arXiv preprint arXiv:1910.03088, 2019.