Large Language Models for Robotics: Opportunities, Challenges, and Perspectives (2401.04334v1)

Published 9 Jan 2024 in cs.RO and cs.AI

Abstract: LLMs have undergone significant expansion and have been increasingly integrated across various domains. Notably, in the realm of robot task planning, LLMs harness their advanced reasoning and language comprehension capabilities to formulate precise and efficient action plans based on natural language instructions. However, for embodied tasks, where robots interact with complex environments, text-only LLMs often face challenges due to a lack of compatibility with robotic visual perception. This study provides a comprehensive overview of the emerging integration of LLMs and multimodal LLMs into various robotic tasks. Additionally, we propose a framework that utilizes multimodal GPT-4V to enhance embodied task planning through the combination of natural language instructions and robot visual perceptions. Our results, based on diverse datasets, indicate that GPT-4V effectively enhances robot performance in embodied tasks. This extensive survey and evaluation of LLMs and multimodal LLMs across a variety of robotic tasks enriches the understanding of LLM-centric embodied intelligence and provides forward-looking insights toward bridging the gap in Human-Robot-Environment interaction.
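
The abstract describes a framework that feeds a multimodal model (GPT-4V) both a natural-language instruction and the robot's visual perception to produce an embodied task plan. As a rough illustration of that idea only, and not the paper's actual implementation, the sketch below shows one way to prompt a GPT-4V-class vision model with an instruction plus a camera frame and ask it for a step-by-step action plan; it assumes the OpenAI Python client, and the model name, prompt wording, and action vocabulary are illustrative placeholders.

```python
# Minimal sketch (not the paper's code): query a GPT-4V-style multimodal model
# with a task instruction and a robot camera image, and get back an action plan.
# Assumes the OpenAI Python client (v1.x); model name and prompt are illustrative.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def plan_from_instruction_and_image(instruction: str, image_path: str) -> str:
    """Ask a multimodal LLM for a numbered, executable action plan."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder for any GPT-4V-class vision model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": (f"Instruction: {instruction}\n"
                          "Using the attached view of the robot's workspace, "
                          "list a numbered sequence of primitive actions "
                          "(e.g. move_to, grasp, place) that completes the task.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# Example usage (hypothetical files and task):
# print(plan_from_instruction_and_image("put the apple in the bowl", "frame.jpg"))
```

In the paper's setting the returned plan would then be grounded against the robot's skill library and executed, with the visual input helping the model avoid actions that are infeasible in the observed scene.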

Authors (20)
  1. Jiaqi Wang (218 papers)
  2. Zihao Wu (100 papers)
  3. Yiwei Li (107 papers)
  4. Hanqi Jiang (27 papers)
  5. Peng Shu (34 papers)
  6. Enze Shi (13 papers)
  7. Huawen Hu (6 papers)
  8. Chong Ma (28 papers)
  9. Yiheng Liu (24 papers)
  10. Xuhui Wang (22 papers)
  11. Yincheng Yao (1 paper)
  12. Xuan Liu (94 papers)
  13. Huaqin Zhao (16 papers)
  14. Zhengliang Liu (91 papers)
  15. Haixing Dai (39 papers)
  16. Lin Zhao (227 papers)
  17. Bao Ge (17 papers)
  18. Xiang Li (1002 papers)
  19. Tianming Liu (161 papers)
  20. Shu Zhang (286 papers)
Citations (42)