Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds (2305.17590v2)
Abstract: Task planning systems help robots use human knowledge about actions to complete long-horizon tasks. Most have been developed for "closed worlds," assuming the robot is provided with complete world knowledge. The real world, however, is generally open, and robots frequently encounter unforeseen situations that can break the planner's completeness. Could recent advances in pre-trained large language models (LLMs) enable classical planning systems to deal with such novel situations? This paper introduces COWP, a novel framework for open-world task planning and situation handling. COWP dynamically augments the robot's action knowledge, including the preconditions and effects of actions, with task-oriented commonsense knowledge. It embraces the openness of LLMs while remaining grounded in specific domains via action knowledge. For systematic evaluation, we collected a dataset of 1,085 execution-time situations, each corresponding to a state in which a robot is potentially unable to complete a task using a solution that normally works. Experimental results show that our approach outperforms competitive baselines from the literature in the success rate of service tasks. We have also demonstrated COWP on a mobile manipulator. Supplementary materials are available at: https://cowplanning.github.io/
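The augmentation idea described in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the names `Action`, `ask_llm`, and `augment_action` are hypothetical, and a canned lookup table stands in for a real LLM call.

```python
# Illustrative sketch (not COWP itself): using an "LLM" oracle to augment
# a symbolic action's preconditions when an execution-time situation would
# break the plan that normally works.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    preconditions: set = field(default_factory=set)
    effects: set = field(default_factory=set)

def ask_llm(prompt: str) -> str:
    # Hypothetical stub: canned commonsense answers keyed by prompt.
    # A real system would query a pre-trained language model here.
    canned = {
        "Can a robot pour coffee into a broken cup? Answer yes or no.": "no",
    }
    return canned.get(prompt, "yes")

def augment_action(action: Action, situation: str) -> Action:
    """If the oracle judges the situation breaks the action, add the
    negated situation as an extra precondition of the action."""
    prompt = f"Can a robot {action.name} {situation}? Answer yes or no."
    if ask_llm(prompt) == "no":
        action.preconditions.add(f"not ({situation})")
    return action

# The commonsense answer "no" adds a new precondition to the action,
# so a classical planner would now avoid plans that violate it.
pour = Action("pour coffee", preconditions={"holding(cup)"})
pour = augment_action(pour, "into a broken cup")
print(pour.preconditions)
```

The design point being illustrated: the LLM's open-world knowledge is injected only through the action's symbolic preconditions and effects, so the downstream classical planner stays unchanged and remains grounded in its domain.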
[2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. 
[2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Nau DS, Au TC, Ilghami O, Kuter U, Murdock JW, Wu D, et al. SHOP2: An HTN planning system. Journal of artificial intelligence research. 2003;20:379–404. Helmert [2006] Helmert M. The fast downward planning system. Journal of Artificial Intelligence Research. 2006;26:191–246. Hanheide et al. [2017] Hanheide M, Göbelbecker M, Horn GS, Pronobis A, Sjöö K, Aydemir A, et al. Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence. 2017;247:119–150. Jiang et al. [2019] Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. 
Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. 
Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. 
[2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Helmert M. The fast downward planning system. Journal of Artificial Intelligence Research. 2006;26:191–246. Hanheide et al. [2017] Hanheide M, Göbelbecker M, Horn GS, Pronobis A, Sjöö K, Aydemir A, et al. Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence. 2017;247:119–150. Jiang et al. [2019] Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. 
Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. 
Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 
2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Hanheide M, Göbelbecker M, Horn GS, Pronobis A, Sjöö K, Aydemir A, et al. Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence. 2017;247:119–150. Jiang et al. [2019] Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 
317–333.
Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901.
Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022.
OpenAI [2023] OpenAI. ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/.
Google [2023] Google. Bard FAQ. Accessed: 2023-04-07. Available from: https://bard.google.com/faq.
Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023.
Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An Introduction to the Planning Domain Definition Language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large language models. arXiv preprint arXiv:2302.05128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
Ghallab et al. [1998] Ghallab M, Howe A, Knoblock C, McDermott D, Ram A, Veloso M, et al. PDDL — The Planning Domain Definition Language. Technical Report; 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. Available from: https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. 
[2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. 
[2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. 
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. 
Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. 
[2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. 
Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. 
[2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. 
Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. 
Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. 2023;. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023;. McDermott et al. [1998] McDermott D, Ghallab M, Howe A, Knoblock C, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Tech. Rep.; 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al.
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023;.
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. 
Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. 
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: GPT-4 Technical Report. Liu et al. 
[2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. 
[2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. 
Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. 
arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. 
arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. 
[2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. 
In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. 
Do as i can, not as i say: Grounding language in robotic affordances.
In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022. OpenAI [2023] OpenAI. ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google. Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p.
8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023. OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G.
Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023. Aeronautiques et al.
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report. 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M.
MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. 
Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. 
arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Hoffmann J. FF: The fast-forward planning system. AI magazine. 2001;22(3):57–57. Nau et al. [2003] Nau DS, Au TC, Ilghami O, Kuter U, Murdock JW, Wu D, et al. SHOP2: An HTN planning system. Journal of artificial intelligence research. 2003;20:379–404. Helmert [2006] Helmert M. The fast downward planning system. Journal of Artificial Intelligence Research. 2006;26:191–246. Hanheide et al. [2017] Hanheide M, Göbelbecker M, Horn GS, Pronobis A, Sjöö K, Aydemir A, et al. Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence. 2017;247:119–150. Jiang et al. [2019] Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. 
Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. 
[2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Nau DS, Au TC, Ilghami O, Kuter U, Murdock JW, Wu D, et al. SHOP2: An HTN planning system. Journal of artificial intelligence research. 2003;20:379–404. Helmert [2006] Helmert M. The fast downward planning system. Journal of Artificial Intelligence Research. 2006;26:191–246. Hanheide et al. [2017] Hanheide M, Göbelbecker M, Horn GS, Pronobis A, Sjöö K, Aydemir A, et al. Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence. 2017;247:119–150. Jiang et al. [2019] Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. 
Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. 
Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 
2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Helmert M. The fast downward planning system. Journal of Artificial Intelligence Research. 2006;26:191–246. Hanheide et al. [2017] Hanheide M, Göbelbecker M, Horn GS, Pronobis A, Sjöö K, Aydemir A, et al. Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence. 2017;247:119–150. Jiang et al. [2019] Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. 
[2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. 
[2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. 
[2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023;. McDermott et al. [1998] McDermott D, Ghallab M, Howe A, Knoblock C, Ram A, Veloso M, et al. PDDL—The Planning Domain Definition Language. Technical Report. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. 
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. Available from: https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023;.
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. 
Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. 
OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. 
[2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. 
[2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. 
Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. 
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. 
[2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. 
In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. 
Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. 
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
OpenAI [2023] OpenAI.: GPT-4 Technical Report. arXiv preprint arXiv:2303.08774. 2023.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
McDermott et al. [1998] McDermott D, Ghallab M, Howe A, Knoblock C, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Tech. Rep.; 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning.
In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
[2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. 
Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 
2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. 
Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. 
[2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL — The Planning Domain Definition Language. Technical Report. 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as I can, not as I say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022. OpenAI [2023] OpenAI.: ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed: 2023-04-07. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y.
Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Hoffmann J. FF: The fast-forward planning system. AI magazine. 2001;22(3):57–57. Nau et al. [2003] Nau DS, Au TC, Ilghami O, Kuter U, Murdock JW, Wu D, et al. SHOP2: An HTN planning system. Journal of artificial intelligence research. 2003;20:379–404. Helmert [2006] Helmert M. The fast downward planning system. Journal of Artificial Intelligence Research. 2006;26:191–246. Hanheide et al. [2017] Hanheide M, Göbelbecker M, Horn GS, Pronobis A, Sjöö K, Aydemir A, et al. Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence. 2017;247:119–150. Jiang et al. [2019] Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. 
[2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 
2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. 
Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Nau DS, Au TC, Ilghami O, Kuter U, Murdock JW, Wu D, et al. SHOP2: An HTN planning system. Journal of artificial intelligence research. 2003;20:379–404. Helmert [2006] Helmert M. The fast downward planning system. Journal of Artificial Intelligence Research. 2006;26:191–246. Hanheide et al. [2017] Hanheide M, Göbelbecker M, Horn GS, Pronobis A, Sjöö K, Aydemir A, et al. Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence. 2017;247:119–150. Jiang et al. [2019] Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. 
In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. 
Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Helmert M. The fast downward planning system. 
Journal of Artificial Intelligence Research. 2006;26:191–246. Hanheide et al. [2017] Hanheide M, Göbelbecker M, Horn GS, Pronobis A, Sjöö K, Aydemir A, et al. Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence. 2017;247:119–150. Jiang et al. [2019] Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. 
Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901.
Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022.
OpenAI [2023] OpenAI. ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/.
Google [2023] Google. Bard FAQ. Accessed: 2023-04-07. Available from: https://bard.google.com/faq.
Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023.
Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An Introduction to the Planning Domain Definition Language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
OpenAI [2023] OpenAI. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774. 2023.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical report; 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. Available from: https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo; 2022.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model. arXiv preprint arXiv:2304.15010. 2023.
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. 
Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. 
OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. 
[2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. 
[2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. 
Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. 
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. 
[2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. 
OpenAI [2023] OpenAI: Models – OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 – YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750.
Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333.
Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901.
Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022.
OpenAI [2023] OpenAI: ChatGPT. Accessed: 2023-02-08. https://openai.com/blog/chatgpt/.
Google [2023] Google: Bard FAQ. Accessed: 2023-04-07. https://bard.google.com/faq.
Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023.
Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
Jiang et al. [2019] Jiang YQ, Zhang SQ, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
OpenAI [2023] OpenAI: GPT-4 Technical Report. 2023.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large language models. arXiv preprint arXiv:2302.05128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
Ghallab et al. [1998] Ghallab M, Howe A, Knoblock C, McDermott D, Ram A, Veloso M, et al. PDDL – The Planning Domain Definition Language. Technical Report. 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning.
In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
[2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. 
Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. 
Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. 
Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. 
OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. 
[2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018;.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022;.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023;.
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. 
Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. 
arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. 
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. 
Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as I can, not as I say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog.
In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. 
Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Nau DS, Au TC, Ilghami O, Kuter U, Murdock JW, Wu D, et al. 
SHOP2: An HTN planning system. Journal of artificial intelligence research. 2003;20:379–404. Helmert [2006] Helmert M. The fast downward planning system. Journal of Artificial Intelligence Research. 2006;26:191–246. Hanheide et al. [2017] Hanheide M, Göbelbecker M, Horn GS, Pronobis A, Sjöö K, Aydemir A, et al. Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence. 2017;247:119–150. Jiang et al. [2019] Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. 
[2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. 
[2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. 
[2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. 
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Helmert M. The fast downward planning system. Journal of Artificial Intelligence Research. 2006;26:191–246. Hanheide et al. [2017] Hanheide M, Göbelbecker M, Horn GS, Pronobis A, Sjöö K, Aydemir A, et al. Robot task planning and explanation in open and uncertain worlds. 
Artificial Intelligence. 2017;247:119–150. Jiang et al. [2019] Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. 
[2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. 
[2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Hanheide M, Göbelbecker M, Horn GS, Pronobis A, Sjöö K, Aydemir A, et al. Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence. 2017;247:119–150. Jiang et al. [2019] Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 
353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. 
Springer; 2022. p. 871–925.
Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023.
Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
Jiang et al. [2019] Jiang Y, Zhang S, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL—The Planning Domain Definition Language. Tech. Rep.; 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. 
Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq.
Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. 
Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. 
Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. 
arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. 
[2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. 
[2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. 
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. 
In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. 
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. 
[2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. 
An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023.
Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An Introduction to the Planning Domain Definition Language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
Jiang et al. [2019] Jiang Y, Zhang S, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
Ghallab et al. [1998] Ghallab M, Howe A, Knoblock C, McDermott D, Ram A, Veloso M, et al. PDDL—The Planning Domain Definition Language. Technical Report; 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. 
[2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. 
Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. 
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. 
Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. 
Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. 
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. 
[2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. 
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. 
Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. 
Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. 
[2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. 
[2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Hanheide M, Göbelbecker M, Horn GS, Pronobis A, Sjöö K, Aydemir A, et al. Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence. 2017;247:119–150. Jiang et al. [2019] Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 
353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. 
Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. 
[2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. 
[2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jiang Y, Walker N, Hart J, Stone P. Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 29; 2019. p. 725–733. Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. 
Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. 
Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 
2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369. Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. 
on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. 
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. 
[2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 
2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. 
Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. 
Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 
2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. 
[2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. 
Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. 
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. 
[2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. 
OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). 
arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023;. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022;. OpenAI [2023] OpenAI.: ChatGPT.
Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed: 2023-04-07. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022;.
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. 
Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. 
Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. 
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. 
[2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. 
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. 
Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. 
Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. 
[2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. 
[2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 
2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. 
[2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Do as I can, not as I say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed: 2023-04-07. Available from: https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang YQ, Zhang SQ, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large language models. arXiv preprint arXiv:2302.05128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. Available from: https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023;.
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. 
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed: 2023-04-07. Available from: https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. 
[2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report; 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. Available from: https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. 
Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. 
OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. 
[2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. 
[2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. 
Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. 
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023. OpenAI [2023] OpenAI.: GPT-4 Technical Report. arXiv preprint arXiv:2303.08774. 2023. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022. West et al.
[2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023. Ghallab et al. [1998] Ghallab M, Howe A, Knoblock C, McDermott D, Ram A, Veloso M, et al. PDDL: The Planning Domain Definition Language. Technical Report, Yale Center for Computational Vision and Control; 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning.
In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. 
Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. 
Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. 
Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. 
Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. 
Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. 
OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. 
[2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: From general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. 
Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. 
Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. 
Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Do As I Can, Not As I Say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318.
Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252.
Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750.
Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333.
Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901.
Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022.
OpenAI [2023] OpenAI. ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/.
Google [2023] Google. Bard FAQ. Accessed: 2023-04-07. Available from: https://bard.google.com/faq.
Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023.
Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL – The Planning Domain Definition Language. Technical Report. 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
OpenAI [2023] OpenAI. Models – OpenAI API. Accessed: 2023-07-10. Available from: https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 – YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. 
arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. 
Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. 
Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. 
Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. 
Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. 
In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. 
Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. 
Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750.
Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333.
Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901.
Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022;.
OpenAI [2023] OpenAI.: ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/.
Google [2023] Google.: Bard FAQ. Accessed: 2023-04-07. https://bard.google.com/faq.
Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023;.
Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
Haslum et al.
[2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022;.
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023;.
OpenAI [2023] OpenAI.: GPT-4 Technical Report. 2023;.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023;.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018;.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021;.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. 
Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. 
Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. 
Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. 
OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. 
Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. 
[2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. 
[2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. 
[2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
OpenAI [2023] OpenAI: ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/.
Google [2023] Google: Bard FAQ. Accessed: 2023-04-07. Available from: https://bard.google.com/faq.
Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023.
Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
OpenAI [2023] OpenAI: GPT-4 Technical Report. arXiv preprint arXiv:2303.08774. 2023.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. In: Advances in Neural Information Processing Systems; 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL—The Planning Domain Definition Language. Technical Report. 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
OpenAI [2023] OpenAI: Models - OpenAI API. Accessed: 2023-07-10. Available from: https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022.
Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 
2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. 
Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. 
Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. 
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. 
arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 
2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. 
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 
2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
Do as I can, not as I say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models.
arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI. ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google. Bard FAQ. Accessed: 2023-04-07. Available from: https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S.
Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023;. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies;
2022. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Ghallab et al. [1998] Ghallab M, Howe A, Knoblock C, McDermott D, Ram A, Veloso M, et al. PDDL—The Planning Domain Definition Language. Technical Report; 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. Available from: https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J.
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
[2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. 
Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. 
Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. 
Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. 
arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022. Springer; 2022. p. 355–373. Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. 
arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. 
Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. 
Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. 
[2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as I can, not as I say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318.
Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252.
Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750.
Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333.
Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901.
Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022.
OpenAI [2023] OpenAI. ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/.
Google [2023] Google. Bard FAQ. Accessed: 2023-04-07. https://bard.google.com/faq.
Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023.
Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. 
OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. 
Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. 
[2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. 
[2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. 
[2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. 
[2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. 
In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. 
Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. 
Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. 
[2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. 
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. 
p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. 
Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. 
[2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. 
In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. 
arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. 
[2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. 
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. 
Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. 
Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. 
ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Chernova et al. [2020] Chernova S, Chu V, Daruna A, Garrison H, Hahn M, Khante P, et al. Situated Bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research. Springer; 2020. p. 353–369.
- Kant et al. [2022] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, et al. Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision – ECCV 2022. Springer; 2022. p. 355–373.
- Huang et al. [2022] Huang W, Abbeel P, Pathak D, Mordatch I. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. In: Thirty-ninth International Conference on Machine Learning; 2022.
- Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do As I Can, Not As I Say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318.
- Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252.
- Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750.
- Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333.
- Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901.
- Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022.
- OpenAI [2023] OpenAI. ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/.
- Google [2023] Google. Bard FAQ. Accessed: 2023-04-07. https://bard.google.com/faq.
- Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
- Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
- Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023.
- Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
- Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
- Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An Introduction to the Planning Domain Definition Language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
- Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
- Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
- Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
- Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
- OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023.
- Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
- Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
- Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
- Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
- Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
- Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
- West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
- Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
- Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
- Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
- Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
- Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
- Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL – The Planning Domain Definition Language. Technical Report. 1998.
- Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
- Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
- Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
- OpenAI [2023] OpenAI. Models – OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview.
- Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
- Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
- Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 – YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022.
- Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
- Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
- Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
- Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. 
Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. 
https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. 
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. 
[2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. 
OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). 
arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. 
Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. 
[2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. 
In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023;. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed: 2023-04-07. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022;.
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. 
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. 
[2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. 
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. 
Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. 
Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. 
[2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. 
[2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 
2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. 
[2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. 
arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. In: Conference on Robot Learning; 2023. p. 287–318.
- Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252.
- Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750.
- Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333.
- Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901.
- Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022.
- OpenAI. ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/.
- Google. Bard FAQ. Accessed: 2023-04-07. https://bard.google.com/faq.
- Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
- Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
- Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023.
- Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
- Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023.
- Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
- Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
- Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
- Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
- Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
- OpenAI. GPT-4 Technical Report. 2023.
- Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
- Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
- Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
- Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
- Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
- Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
- West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
- Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
- Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
- Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
- Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
- Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
- Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL: The Planning Domain Definition Language. Tech. Rep. 1998.
- Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
- Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
- Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
- OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview.
- Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
- Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
- Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022.
- Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
- Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
- Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
- Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. 
Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. 
[2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. 
[2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. 
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. 
Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. 
[2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. 
Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. 
[2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. 
Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022. OpenAI [2023]. ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023]. Bard FAQ. Accessed: 2023-04-07. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang YQ, Zhang SQ, Khandelwal P, Stone P. 
Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023. OpenAI [2023]. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774. 2023. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. 
arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. 
arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. 
[2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. 
In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large language models. arXiv preprint arXiv:2302.05128. 2023. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical report; 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo; 2022. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. Thirty-ninth International Conference on Machine Learning. 2022;. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. 
Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. 
Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023. Brohan et al. [2023] Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, Ho D, et al. Do as I can, not as I say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318. Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022. OpenAI [2023] OpenAI.: ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed: 2023-04-07.
https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang YQ, Zhang SQ, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S.
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023. OpenAI [2023] OpenAI.: GPT-4 Technical Report. 2023. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. 
arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. 
[2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. 
[2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. 
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. 
In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. 
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. 
[2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. 
An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. 
In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. 
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. 
[2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. 
Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. 
[2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. 
Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. 
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. 
Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. 
Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
Ghallab et al. [1998] Ghallab M, Howe A, Knoblock C, McDermott D, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report; 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo; 2022.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al.
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. 
[2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. 
arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Do As I Can, Not As I Say: Grounding language in robotic affordances. In: Conference on Robot Learning; 2023. p. 287–318.
- Perera et al. [2015] Perera V, Soetens R, Kollar T, Samadi M, Sun Y, Nardi D, et al. Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252.
- Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750.
- Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333.
- Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901.
- Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022.
- OpenAI [2023] OpenAI. ChatGPT. Accessed: 2023-02-08. https://openai.com/blog/chatgpt/.
- Google [2023] Google. Bard FAQ. Accessed: 2023-04-07. https://bard.google.com/faq.
- Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
- Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
- Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023.
- Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
- Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
- Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
- Jiang et al. [2019] Jiang Y, Zhang S, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
- Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
- Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
- Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
- OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023.
- Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
- Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
- Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
- Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
- Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
- Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
- West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
- Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
- Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
- Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large language models. arXiv preprint arXiv:2302.05128. 2023.
- Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
- Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
- Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL—The Planning Domain Definition Language. Technical report; 1998.
- Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
- Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
- Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
- OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview.
- Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
- Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
- Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022.
- Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
- Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
- Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
- Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
[2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. 
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. 
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. 
arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. 
arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. 
[2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. 
Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023;.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023;.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022;.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023;.
McDermott et al. [1998] Ghallab M, Howe A, Knoblock C, McDermott D, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report CVC TR-98-003, Yale Center for Computational Vision and Control; 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018;.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022;.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023;.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023;.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023;.
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023;.
OpenAI [2023] OpenAI. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774. 2023;.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023;.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018;.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021;.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;.
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. 
arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. 
[2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. 
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. 
Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. 
Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. 
ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Learning task knowledge from dialog and web access. Robotics. 2015;4(2):223–252. Amiri et al. [2019] Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022;. OpenAI [2023] OpenAI: ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google: Bard FAQ. Accessed: 2023-04-07. Available from: https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023;. OpenAI [2023] OpenAI: GPT-4 Technical Report. 2023;. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report; 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI: Models - OpenAI API. Accessed: 2023-07-10. Available from: https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al.
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Amiri S, Bajracharya S, Goktolgal C, Thomason J, Zhang S. Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750. Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. 
Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 
2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. 
Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. 
Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. 
Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. 
Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. 
[2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al.
[2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 
2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. 
Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. 
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. 
Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 
2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. 
OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 
2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023;. OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023;. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;. West et al.
[2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023;.
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. 
arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 
2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA realtime instance segmentation. Zenodo; 2022.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
OpenAI [2023] OpenAI: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
- Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2019. p. 744–750.
Tucker et al. [2020] Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333.
Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901.
Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022.
OpenAI [2023] OpenAI: ChatGPT. Accessed: 2023-02-08. https://openai.com/blog/chatgpt/.
Google [2023] Google: Bard FAQ. Accessed: 2023-04-07. https://bard.google.com/faq.
Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding text generation with grounded models for robot control. arXiv preprint arXiv:2303.00855. 2023.
Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An Introduction to the Planning Domain Definition Language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
Jiang et al. [2019] Jiang YQ, Zhang SQ, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
OpenAI [2023] OpenAI: GPT-4 Technical Report. 2023.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report; 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Tucker M, Aksaray D, Paul R, Stein GJ, Roy N. 
Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333. Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. 
Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. 
Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–1901. Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. 
OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. 
Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. 
[2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. 
[2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023. Song et al.
[2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023. Ghallab et al. [1998] Ghallab M, Howe A, Knoblock C, McDermott D, Ram A, Veloso M, et al. PDDL—The Planning Domain Definition Language. Technical Report. 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. Available from: https://platform.openai.com/docs/models/overview.
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. 
arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 
2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. 
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 
2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research. Springer; 2020. p. 317–333.
- Brown et al. [2020] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901.
- Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022.
- OpenAI [2023] OpenAI. ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/.
- Google [2023] Google. Bard FAQ. Accessed: 2023-04-07. https://bard.google.com/faq.
- Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
- Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
- Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023.
- Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
- Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
- Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An Introduction to the Planning Domain Definition Language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
- Jiang et al. [2019] Jiang Y, Zhang S, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
- Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
- Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
- Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
- OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023.
- Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
- Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
- Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
- Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
- Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
- Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
- West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
- Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
- Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
- Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
- Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
- Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
- Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL—The Planning Domain Definition Language. Technical Report; 1998.
- Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
- Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
- Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
- OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview.
- Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
- Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
- Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo; 2022.
- Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
- Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
- Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
- Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
[2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. 
[2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. 
Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. 
arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. 
arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. 
[2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. 
In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL — The Planning Domain Definition Language. Technical Report. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;.
- Brown et al. [2020] Brown TB, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877–1901.
- Zhang et al. [2022] Zhang S, Roller S, Goyal N, Artetxe M, Chen M, Chen S, et al. OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068. 2022.
- OpenAI [2023] OpenAI. ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/.
- Google [2023] Google. Bard FAQ. Accessed: 2023-04-07. Available from: https://bard.google.com/faq.
- Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925.
- Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103.
- Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023.
- Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
- Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
- Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An Introduction to the Planning Domain Definition Language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
- Jiang et al. [2019] Jiang Y, Zhang S, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
- Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
- Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
- Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
- OpenAI [2023] OpenAI. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774. 2023.
- Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
- Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
- Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
- Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
- Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
- Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
- West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
- Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
- Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
- Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
- Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
- Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
- Ghallab et al. [1998] Ghallab M, Howe A, Knoblock C, McDermott D, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report. 1998.
- Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
- Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
- Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
- OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. Available from: https://platform.openai.com/docs/models/overview.
- Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
- Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
- Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022.
- Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
- Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
- Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
- Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. 
Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. 
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: GPT-4 Technical Report. Liu et al. 
[2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. 
[2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. 
Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. 
arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. 
arXiv preprint arXiv:2303.12153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott D, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report; 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. 
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 
2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:220501068. 2022;. OpenAI [2023] OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. 
Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: ChatGPT. Cit. on pp. 1, 16. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. 
[2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. 
[2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. 
[2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023. OpenAI [2023] OpenAI.: GPT-4 Technical Report. arXiv preprint arXiv:2303.08774. 2023. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022. West et al.
[2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. 
In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. 
Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. 
Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. 
Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. 
International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. 
Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. 
arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. 
arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. 
[2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. 
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H.
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023;.
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. 
[2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. 
Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. 
Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. 
[2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. 
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. 
p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. 
Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. 
[2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. 
In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
- OpenAI.: ChatGPT. Accessed: 2023-02-08. Available from: https://openai.com/blog/chatgpt/. Google [2023] Google.: Bard FAQ. Accessed: 2023-04-07. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023. OpenAI [2023] OpenAI.: GPT-4 Technical Report. 2023. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al.
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report; 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018. Quigley et al.
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. 
[2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. 
Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. 
Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. 
Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 
2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. 
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. 
Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 – YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Y, Zhang S, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al.
[2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. 
Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 
2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. 
Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. 
[2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. 
Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. 
Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. 
Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. 
arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. 
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. 
Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Google.: Bard FAQ. Accessed on April 7, 2023. https://bard.google.com/faq. Elsweiler et al. [2022] Elsweiler D, Hauptmann H, Trattner C. Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. 
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al.
[2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System.
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. 
arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. 
arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. 
[2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. 
In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. 
arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 
2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. 
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 
2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Food recommender systems. In: Recommender Systems Handbook. Springer; 2022. p. 871–925. Davis and Marcus [2015] Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Y, Zhang S, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023. OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022. Li et al.
[2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. 
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. 
Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. 
Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. 
Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. 
arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. 
[2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. 
[2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. 
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. 
[2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. 
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. 
[2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL—The Planning Domain Definition Language. Tech. Rep.; 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo; 2022.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
OpenAI [2023] OpenAI. GPT-4 Technical Report; 2023.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. 
Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. 
Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. 
[2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. 
[2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 
2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. 
[2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. 
arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Davis E, Marcus G. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM. 2015;58(9):92–103. Huang et al. [2023] Huang W, Xia F, Shah D, Driess D, Zeng A, Lu Y, et al. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:2303.00855. 2023. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Y, Zhang S, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023. OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report. 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA realtime instance segmentation. Zenodo.
2022. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
[2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. 
Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. 
arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. 
arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. 
[2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. 
In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. 
arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 
2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. 
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 
2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. arXiv preprint arXiv:230300855. 2023;. Puig et al. [2018] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. 
Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y.
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Ghallab et al. [1998] Ghallab M, Howe A, Knoblock C, McDermott D, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners.
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502. Singh et al. [2023] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. 
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. 
[2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. Progprompt: Generating situated robot task plans using large language models. International Conference on Robotics and Automation (ICRA). 2023;. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. 
Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. 
Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL-and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. 
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. 
[2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. 
arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. 
arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. 
[2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023;. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023;. OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023;. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023;.
[2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. 
[47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. 
[2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. 
PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. 
arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Virtualhome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 8494–8502.
- Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, et al. ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023.
- Haslum P, Lipovetzky N, Magazzeni D, Muise C. An Introduction to the Planning Domain Definition Language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
- Jiang Y, Zhang S, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
- Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
- Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
- Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
- OpenAI. GPT-4 Technical Report. 2023.
- Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
- Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
- Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
- Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
- Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
- Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
- West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
- Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
- Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
- Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
- Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
- Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
- Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL: The Planning Domain Definition Language. Technical Report. 1998.
- Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
- Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
- Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
- OpenAI. Models - OpenAI API. https://platform.openai.com/docs/models/overview. Accessed: 2023-07-10.
- Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
- Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
- Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022.
- Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
- Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
- Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
- Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. 
Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. 
Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. 
arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. 
Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. 
Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. 
Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. 
[2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. 
ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA); 2023. Haslum et al. [2019] Haslum P, Lipovetzky N, Magazzeni D, Muise C. An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187. Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023. OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report. 1998.
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: GPT-4 Technical Report. Liu et al. 
[2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. 
[2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. 
Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. 
arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. 
arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. 
[2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. 
In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. 
arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 
2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. 
- An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2019;13(2):1–187.
Jiang et al. [2019] Jiang Yq, Zhang Sq, Khandelwal P, Stone P. Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373.
Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and Autonomous Systems. 2008;56(11):955–966.
Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022.
Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023.
OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023.
Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report; 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners.
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. 
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. 
[2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;.
- Task planning in robotics: an empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering. 2019;20(3):363–373. Galindo et al. [2008] Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023;. OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023;. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al.
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Galindo C, Fernández-Madrigal JA, González J, Saffiotti A. Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. 
[2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 
2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. 
In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. 
[47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. 
[2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . 
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. 
ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . 
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. 
Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Robot task planning using semantic maps. Robotics and autonomous systems. 2008;56(11):955–966. Valmeekam et al. [2022] Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:220610498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. 
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? 
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: GPT-4 Technical Report. Liu et al. 
[2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477. 2023. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can generative pre-trained language models serve as knowledge bases for closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied reasoning through planning with language models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and motion planning with large language models for object rearrangement. arXiv preprint arXiv:2303.06247. 2023. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From natural language instructions to feasible plans. arXiv preprint arXiv:2303.12153. 2023. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL—The Planning Domain Definition Language. Technical Report; 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI. Models - OpenAI API. https://platform.openai.com/docs/models/overview. Accessed: 2023-07-10. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA realtime instance segmentation. Zenodo. 2022. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable prompt makes pre-trained language models better few-shot learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding classical task planners via vision-language models. arXiv preprint arXiv:2304.08587. 2023. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 
2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. 
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. 
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 
2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498. 2022;. Valmeekam et al. [2023] Valmeekam K, Sreedharan S, Marquez M, Olmo A, Kambhampati S. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:2302.06706. 2023;. OpenAI [2023] OpenAI. GPT-4 Technical Report. 2023;. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. 
Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. 
Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. 
Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 
440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. 
[2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. 
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. 
p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. 
Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. 
[2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. 
In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. 
Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. 
[2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. 
arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. 
[2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. 
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. 
Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. 
Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. 
ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark). arXiv preprint arXiv:230206706. 2023;. OpenAI [2023] OpenAI.: GPT-4 Technical Report. Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:230411477. 2023;. Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. 
[2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Tech. Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System.
In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023;.
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. 
[47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. 
[2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. 
PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
- Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
- Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
- Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
- Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
- Liu et al. [2023] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
- Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
- Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
- Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
- Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
- Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
- West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
- Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
- Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
- Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
- Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
- Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
- Ghallab et al. [1998] Ghallab M, Howe A, Knoblock C, McDermott D, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report; 1998.
- Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
- Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
- Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
- [47] OpenAI. Models - OpenAI API. https://platform.openai.com/docs/models/overview. Accessed: 2023-07-10.
- Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
- Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
- Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo; 2022.
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. 
[47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. 
[2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. 
PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. 
arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477. 2023.
Devlin et al. [2018] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. 2021.
Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023.
Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL: The Planning Domain Definition Language. Technical report. 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
OpenAI: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. 
Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. 
arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. 
[2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. 
Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. 
arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. 
[47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. 
[2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. 
PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. 
arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018;. Chen et al. [2021] Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. 
In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. 
ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. 
Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL—The Planning Domain Definition Language. Tech. Rep.; 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI. Models - OpenAI API. https://platform.openai.com/docs/models/overview. Accessed: 2023-07-10. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. 2022. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
- Evaluating large language models trained on code. arXiv preprint arXiv:210703374. 2021;. Liu et al. [2023] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. 
Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. 
arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2003] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. 
[1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. 
arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. 
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. 
Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35. Wang et al. [2021] Wang C, Liu P, Zhang Y. Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251. Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023. Jocher et al.
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. 
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. 
Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 
2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. 
[47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. 
[2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. 
PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. 
Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. 
https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. 
arXiv preprint arXiv:230415010. 2023.
Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023.
Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023.
- Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021. p. 3241–3251.
Li et al. [2022] Li S, Puig X, Paxton C, Du Y, Wang C, Fan L, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2022.
West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2022.
Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022.
Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023.
Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large language models. arXiv preprint arXiv:230205128. 2023.
Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022.
Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023.
Ghallab et al. [1998] Ghallab M, Howe A, Knoblock C, McDermott D, Ram A, Veloso M, et al. PDDL—The Planning Domain Definition Language. Technical report; 1998.
Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293.
Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
OpenAI. Models - OpenAI API. https://platform.openai.com/docs/models/overview. Accessed: 2023-07-10.
Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018.
Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo; 2022.
Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021.
Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023.
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems. 2003;. West et al. [2022] West P, Bhagavatula C, Hessel J, Hwang JD, Jiang L, Bras RL, et al. Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. 
arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Symbolic knowledge distillation: from general language models to commonsense models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022;. Huang et al. [2022] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. 
p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. . Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. 
[2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. 
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:230306247. 2023;. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. 
[2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. 
Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. 
In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 
2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. 
The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
- Inner Monologue: Embodied Reasoning through Planning with Language Models. In: 6th Annual Conference on Robot Learning; 2022. Ding et al. [2023] Ding Y, Zhang X, Paxton C, Zhang S. Task and Motion Planning with Large Language Models for Object Rearrangement. arXiv preprint arXiv:2303.06247. 2023. Xie et al. [2023] Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128. 2023. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report; 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI. Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo; 2022. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023.
Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Xie Y, Yu C, Zhu T, Bai J, Gong Z, Soh H. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. 
Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. 
[2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. 
arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. 
Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. 
arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. 
[2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 
2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. 
[2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Translating natural language to planning goals with large-language models. arXiv preprint arXiv:230205128. 2023;. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:221204088. 2022;. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:230312153. 2023;. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL— The Planning Domain Definition Language. Technical Report, Tech Rep. 1998;. Lo et al. [2020] Lo SY, Zhang S, Stone P. The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. [47] OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. 
Zenodo; 2022. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:2304.08587. 2023. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592. 2023. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010. 2023. Song et al. [2022] Song CH, Wu J, Washington C, Sadler BM, Chao WL, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088. 2022. Lin et al. [2023] Lin K, Agia C, Migimatsu T, Pavone M, Bohg J. Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL: The Planning Domain Definition Language. Technical Report; 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448. OpenAI [2023] OpenAI. Models - OpenAI API. https://platform.openai.com/docs/models/overview; Accessed: 2023-07-10. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172. 2018. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA Workshop on Open Source Software. vol. 3. Kobe, Japan; 2009. p. 5.
- Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153. 2023. Aeronautiques et al. [1998] Aeronautiques C, Howe A, Knoblock C, McDermott ID, Ram A, Veloso M, et al. PDDL - The Planning Domain Definition Language. Technical Report. 1998. Lo et al. [2020] Lo SY, Zhang S, Stone P. The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500. Garrett et al. [2021] Garrett CR, Chitnis R, Holladay R, Kim B, Silver T, Kaelbling LP, et al. Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems. 2021;4:265–293. Garrett et al. [2020] Garrett CR, Lozano-Pérez T, Kaelbling LP. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
- The petlon algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research. 2020;69:471–500.
- Integrated task and motion planning. Annual review of control, robotics, and autonomous systems. 2021;4:265–293.
- Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling. vol. 30; 2020. p. 440–448.
- OpenAI.: Models - OpenAI API. Accessed: 2023-07-10. https://platform.openai.com/docs/models/overview. Morrison et al. [2018] Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Morrison D, Corke P, Leitner J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. 
[2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 
0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. 
Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:180405172. 2018;. Quigley et al. [2009] Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, et al. ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. 
Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- ROS: an open-source Robot Operating System. In: ICRA workshop on open source software. vol. 3. Kobe, Japan; 2009. p. 5. Jocher et al. [2022] Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Jocher G, Chaurasia A, Stoken A, Borovec J, Kwon Y, Michael K, et al. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. 
Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo. 2022;. Zhang et al. [2021] Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang N, Li L, Chen X, Deng S, Bi Z, Tan C, et al. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. 
[2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. In: International Conference on Learning Representations; 2021. . Zhang et al. [2023] Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhang X, Ding Y, Amiri S, Yang H, Kaminski A, Esselink C, et al. Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Grounding Classical Task Planners via Vision-Language Models. arXiv preprint arXiv:230408587. 2023;. Zhu et al. [2023] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Zhu D, Chen J, Shen X, Li X, Elhoseiny M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:230410592. 2023;. Gao et al. [2023] Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;. Gao P, Han J, Zhang R, Lin Z, Geng S, Zhou A, et al. Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:230415010. 2023;.
- Yan Ding
- Xiaohan Zhang
- Saeid Amiri
- Nieqing Cao
- Hao Yang
- Andy Kaminski
- Chad Esselink
- Shiqi Zhang