VLM-Social-Nav: Socially Aware Robot Navigation through Scoring using Vision-Language Models (2404.00210v3)
Abstract: We propose VLM-Social-Nav, a novel Vision-Language Model (VLM) based navigation approach to compute a robot's motion in human-centered environments. Our goal is to make real-time decisions on robot actions that are socially compliant with human expectations. We utilize a perception model to detect important social entities and prompt a VLM to generate guidance for socially compliant robot behavior. VLM-Social-Nav uses a VLM-based scoring module that computes a cost term steering the underlying planner toward socially appropriate and effective robot actions. Our overall approach reduces reliance on large training datasets and enhances adaptability in decision-making; in practice, it yields improved socially compliant navigation in human-shared environments. We demonstrate and evaluate our system in four different real-world social navigation scenarios with a Turtlebot robot, observing at least a 27.38% improvement in the average success rate and a 19.05% reduction in the average collision rate across the four scenarios. Our user study shows that VLM-Social-Nav generates the most socially compliant navigation behavior.
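To make the scoring idea concrete, the sketch below shows one plausible way a VLM-derived social cost could be folded into a DWA-style candidate selection loop. This is a minimal, hypothetical illustration under stated assumptions, not the authors' released implementation: `query_vlm_for_guidance`, the `Candidate` type, and the cost weights `alpha`/`beta`/`gamma` are all names invented here for illustration.

```python
# Minimal sketch of VLM-based scoring on top of a DWA-style planner.
# Hypothetical illustration only: query_vlm_for_guidance, the cost
# weights, and the Candidate type are assumptions, not the paper's API.
import math
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Candidate:
    v: float  # linear velocity (m/s)
    w: float  # angular velocity (rad/s)


def query_vlm_for_guidance(image, detections) -> tuple[float, float]:
    """Stand-in for prompting a VLM (e.g., GPT-4V) with the camera image
    and detected social entities, then parsing its textual guidance into
    a reference action (v_ref, w_ref). Stubbed with a fixed suggestion."""
    return 0.3, 0.2


def social_cost(c: Candidate, v_ref: float, w_ref: float) -> float:
    """Penalize deviation from the VLM-suggested reference action."""
    return math.hypot(c.v - v_ref, c.w - w_ref)


def select_action(
    candidates: Iterable[Candidate],
    image,
    detections,
    goal_cost: Callable[[Candidate], float],
    obstacle_cost: Callable[[Candidate], float],
    alpha: float = 1.0,   # weight on progress toward the goal
    beta: float = 1.0,    # weight on obstacle proximity
    gamma: float = 0.5,   # weight on the VLM-based social cost term
) -> Candidate:
    """Pick the candidate action minimizing a weighted sum of costs."""
    v_ref, w_ref = query_vlm_for_guidance(image, detections)
    return min(
        candidates,
        key=lambda c: alpha * goal_cost(c)
        + beta * obstacle_cost(c)
        + gamma * social_cost(c, v_ref, w_ref),
    )
```

In this arrangement the VLM contributes only a scoring term: the underlying planner still generates the candidate actions, so kinematic feasibility and collision avoidance remain its responsibility.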
Authors: Daeun Song, Jing Liang, Amirreza Payandeh, Xuesu Xiao, Dinesh Manocha, Amir Hossain Raj