Cook2LTL: Translating Cooking Recipes to LTL Formulae using Large Language Models (2310.00163v2)
Abstract: Cooking recipes are challenging to translate to robot plans as they feature rich linguistic complexity, temporally-extended interconnected tasks, and an almost infinite space of possible actions. Our key insight is that combining a source of cooking domain knowledge with a formalism that captures the temporal richness of cooking recipes could enable the extraction of unambiguous, robot-executable plans. In this work, we use Linear Temporal Logic (LTL) as a formal language expressive enough to model the temporal nature of cooking recipes. Leveraging a pretrained LLM, we present Cook2LTL, a system that translates instruction steps from an arbitrary cooking recipe found on the internet to a set of LTL formulae, grounding high-level cooking actions to a set of primitive actions that are executable by a manipulator in a kitchen environment. Cook2LTL makes use of a caching scheme that dynamically builds a queryable action library at runtime. We instantiate Cook2LTL in a realistic simulation environment (AI2-THOR), and evaluate its performance across a series of cooking recipes. We demonstrate that our system significantly decreases LLM API calls (-51%), latency (-59%), and cost (-42%) compared to a baseline that queries the LLM for every newly encountered action at runtime.
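To make the abstract's two central mechanisms more concrete: a recipe step such as "boil the pasta, then drain it" could plausibly be rendered as an LTL formula like F(boil(pasta) ∧ F drain(pasta)), where F is the "eventually" operator, and the caching scheme amounts to a run-time action library that is consulted before the LLM is queried again. The Python sketch below illustrates only that caching idea under assumed names (ActionLibrary, decompose, fake_llm are hypothetical and not the paper's actual API); it is a minimal illustration of why repeated actions stop incurring LLM calls, latency, and cost, not a reproduction of Cook2LTL.

```python
"""Minimal sketch of a run-time action library cache in the spirit of
Cook2LTL's caching scheme. All names and signatures are illustrative
assumptions, not the paper's implementation."""
from typing import Callable, Dict, List


class ActionLibrary:
    """Caches LLM-derived decompositions of high-level cooking actions."""

    def __init__(self, query_llm: Callable[[str], List[str]]):
        self.query_llm = query_llm               # stand-in for an LLM API wrapper
        self.cache: Dict[str, List[str]] = {}    # action -> primitive actions
        self.llm_calls = 0

    def decompose(self, action: str) -> List[str]:
        """Return primitives for `action`, querying the LLM only on a cache miss."""
        if action not in self.cache:
            self.llm_calls += 1
            self.cache[action] = self.query_llm(action)
        return self.cache[action]


def fake_llm(action: str) -> List[str]:
    """Toy stand-in for the LLM: maps a high-level action to primitive actions."""
    return {
        "saute onions": ["PickUp(onion)", "Slice(onion)", "Cook(onion, pan)"],
    }.get(action, [f"Unknown({action})"])


library = ActionLibrary(fake_llm)
library.decompose("saute onions")   # cache miss: one LLM call
library.decompose("saute onions")   # cache hit: no additional call
print(library.llm_calls)            # -> 1
```

In this sketch the savings reported in the abstract (-51% API calls, -59% latency, -42% cost) would correspond to the fraction of actions resolved from the cache rather than from fresh LLM queries; the actual bookkeeping in Cook2LTL may differ.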