OceanChat: Piloting Autonomous Underwater Vehicles in Natural Language (2309.16052v1)

Published 27 Sep 2023 in cs.RO

Abstract: Amid growing research on fusing LLMs and robotics, we aim to enable Autonomous Underwater Vehicles (AUVs) to interact with humans seamlessly and intuitively. We propose OceanChat, a system that leverages a closed-loop, LLM-guided task and motion planning framework to tackle AUV missions in the wild. An LLM translates an abstract human command into a high-level goal, and a task planner grounds that goal into a task sequence subject to logical constraints. A motion planner then maps the task sequence into an executable motion plan, incorporating the real-time Lagrangian data streams received by the AUV. Because the underwater environment is highly dynamic and only partially known, we develop an event-triggered replanning scheme to improve the system's robustness to uncertainty. We also build HoloEco, a simulation platform that generates photo-realistic simulations for a wide range of AUV applications. Experimental evaluation verifies that the proposed system improves both success rate and computation time. Project website: https://sites.google.com/view/oceanchat
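The abstract outlines a three-stage closed loop: an LLM maps a natural-language command to a high-level goal, a task planner grounds that goal into a constrained task sequence, and a motion planner turns each task into waypoints using live Lagrangian (ocean-current) data, with an event trigger forcing a replan when observations drift from the assumptions of the current plan. The minimal Python sketch below illustrates that control flow only; every name in it (llm_to_goal, plan_tasks, plan_motion, replan_triggered, and the auv interface) is a hypothetical illustration of the described pipeline, not the actual OceanChat API.

```python
# Minimal sketch of the closed-loop pipeline described in the abstract.
# All names, signatures, and the trigger condition are hypothetical
# illustrations of the described architecture, not the OceanChat API.
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str                       # e.g. "descend", "survey", "sample"
    params: dict = field(default_factory=dict)


def llm_to_goal(command: str) -> str:
    """LLM stage: map a natural-language command (e.g. 'survey the
    reef and return') to a high-level symbolic goal."""
    raise NotImplementedError


def plan_tasks(goal: str) -> list[Task]:
    """Task-planning stage: ground the goal into an ordered task
    sequence that satisfies logical (precondition/effect) constraints."""
    raise NotImplementedError


def plan_motion(task: Task, ocean_state) -> list[tuple[float, float, float]]:
    """Motion-planning stage: map one task to 3D waypoints, folding in
    the real-time Lagrangian (current-field) data in ocean_state."""
    raise NotImplementedError


def replan_triggered(ocean_state, waypoints) -> bool:
    """Event trigger: fire when the observed flow field deviates from
    the field assumed at planning time by more than a threshold."""
    raise NotImplementedError


def run_mission(command: str, auv) -> None:
    """Closed loop: plan once per task, then replan only on events."""
    goal = llm_to_goal(command)
    for task in plan_tasks(goal):
        waypoints = plan_motion(task, auv.sense())
        while not auv.task_done(task):
            auv.track(waypoints)               # follow the current plan
            state = auv.sense()                # fresh Lagrangian data stream
            if replan_triggered(state, waypoints):
                waypoints = plan_motion(task, state)  # event-triggered replan
```

Replanning only when an event fires, rather than at a fixed rate, keeps computation low while still reacting to the dynamic flow field, which is consistent with the abstract's claimed gains in computation time.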
