
Cook2LTL: Translating Cooking Recipes to LTL Formulae using Large Language Models (2310.00163v2)

Published 29 Sep 2023 in cs.RO

Abstract: Cooking recipes are challenging to translate to robot plans as they feature rich linguistic complexity, temporally-extended interconnected tasks, and an almost infinite space of possible actions. Our key insight is that combining a source of cooking domain knowledge with a formalism that captures the temporal richness of cooking recipes could enable the extraction of unambiguous, robot-executable plans. In this work, we use Linear Temporal Logic (LTL) as a formal language expressive enough to model the temporal nature of cooking recipes. Leveraging a pretrained LLM, we present Cook2LTL, a system that translates instruction steps from an arbitrary cooking recipe found on the internet to a set of LTL formulae, grounding high-level cooking actions to a set of primitive actions that are executable by a manipulator in a kitchen environment. Cook2LTL makes use of a caching scheme that dynamically builds a queryable action library at runtime. We instantiate Cook2LTL in a realistic simulation environment (AI2-THOR), and evaluate its performance across a series of cooking recipes. We demonstrate that our system significantly decreases LLM API calls (-51%), latency (-59%), and cost (-42%) compared to a baseline that queries the LLM for every newly encountered action at runtime.

References (46)
  1. The "Poetics" of Everyday Life: Grounding Resources and Mechanisms for Artificial Agents. https://cordis.europa.eu/project/id/215843. Accessed: 2023-09-28.
  2. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
  3. Robotic roommates making pancakes. In 2011 11th IEEE-RAS International Conference on Humanoid Robots, pages 529–536. IEEE, 2011.
  4. Grounding language to landmarks in arbitrary outdoor environments. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 208–215. IEEE, 2020.
  5. Interpreting and executing recipes with a cooking robot. In Experimental Robotics: The 13th International Symposium on Experimental Robotics, pages 481–495. Springer, 2013.
  6. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  7. The ycb object and model set: Towards common benchmarks for manipulation research. In 2015 International Conference on Advanced Robotics (ICAR), pages 510–517, 2015.
  8. Nl2tl: Transforming natural languages to temporal logics using large language models. arXiv preprint arXiv:2305.07766, 2023.
  9. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  10. Palm-e: An embodied multimodal language model. arXiv preprint arXiv:2303.03378, 2023.
  11. Temporal logic motion planning for mobile robots. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2020–2025, 2005.
  12. Sequence-to-sequence language grounding of non-markovian task specifications. In Robotics: Science and Systems, volume 2018, 2018.
  13. Incorporating copying mechanism in sequence-to-sequence learning. arXiv preprint arXiv:1603.06393, 2016.
  14. Generalizing to new domains by mapping natural language to lifted ltl. In 2022 International Conference on Robotics and Automation (ICRA), pages 3624–3630. IEEE, 2022.
  15. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In International Conference on Machine Learning, pages 9118–9147. PMLR, 2022a.
  16. Inner monologue: Embodied reasoning through planning with language models. arXiv preprint arXiv:2207.05608, 2022b.
  17. Do as i can, not as i say: Grounding language in robotic affordances. In Proceedings of the Conference on Robot Learning (CoRL), volume 205, pages 287–318, 2023.
  18. Recipe instruction semantics corpus (RISeC): Resolving semantic structure and zero anaphora in recipes. In Proceedings of the Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing, pages 821–826, 2020.
  19. Ai2-THOR: An interactive 3d environment for visual AI. arXiv preprint arXiv:1712.05474, 2017.
  20. Temporal-logic-based reactive mission and motion planning. IEEE transactions on robotics, 25(6):1370–1381, 2009.
  21. Text2motion: From natural language instructions to feasible plans. arXiv preprint arXiv:2303.12153, 2023.
  22. Llm+p: Empowering large language models with optimal planning proficiency, 2023a.
  23. Lang2ltl: Translating natural language commands to temporal robot task specification. arXiv preprint arXiv:2302.11649, 2023b.
  24. Language models of code are few-shot commonsense learners. arXiv preprint arXiv:2210.07128, 2022.
  25. Cooking with semantics. In Proceedings of the ACL 2014 Workshop on Semantic Parsing, pages 33–38, 2014.
  26. Recipe1m+: A dataset for learning cross-modal embeddings for cooking recipes and food images. IEEE Trans. Pattern Anal. Mach. Intell., 2019.
  27. Pddl-the planning domain definition language. 1998.
  28. Specification patterns for robotic missions. IEEE Transactions on Software Engineering, 47(10):2208–2224, 2019.
  29. Moley Robotics. Moley kitchen. URL https://www.moley.com/moley-kitchen/. Accessed: 2023-05-29.
  30. Data-efficient learning of natural language to linear temporal logic translators for robot task specification. arXiv preprint arXiv:2303.08006, 2023.
  31. Learning program representations for food images and cooking recipes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16559–16569, 2022.
  32. K. Pastra and Y. Aloimonos. The minimalist grammar of action. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1585):103–117, 2012.
  33. Learning to ground language to temporal logical form. In NAACL, 2019.
  34. Grounding language to non-markovian tasks with no supervision of task specifications. In Robotics: Science and Systems, volume 2020, 2020.
  35. A. Pnueli. The temporal logic of programs. In 18th Annual Symposium on Foundations of Computer Science, pages 46–57. IEEE, 1977.
  36. Generalized planning in pddl domains with pretrained large language models. arXiv preprint arXiv:2305.11014, 2023.
  37. Progprompt: Generating situated robot task plans using large language models, 2022.
  38. Optimal path planning for surveillance with temporal-logic constraints. The International Journal of Robotics Research, 30(14):1695–1708, 2011.
  39. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615, 2022.
  40. brat: a web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations Session at EACL 2012, Avignon, France, April 2012. Association for Computational Linguistics.
  41. Chatgpt for robotics: Design principles and model abilities. Microsoft Auton. Syst. Robot. Res, 2:20, 2023.
  42. Learning a natural-language to ltl executable semantic parser for grounded robotics. In Conference on Robot Learning, pages 1706–1718, 2021.
  43. Demo2code: From summarizing demonstrations to synthesizing code via extended chain-of-thought, 2023.
  44. Tidybot: Personalized robot assistance with large language models. arXiv preprint arXiv:2305.05658, 2023.
  45. Robot learning manipulation action plans by "watching" unconstrained videos from the world wide web. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015.
  46. Socratic models: Composing zero-shot multimodal reasoning with language. arXiv preprint arXiv:2204.00598, 2022.

Summary

  • The paper introduces Cook2LTL, a system that translates complex cooking recipes into Linear Temporal Logic for robot-executable plans.
  • It employs semantic parsing and a dynamic caching library to break down high-level instructions into primitive actions, significantly reducing computational costs.
  • Experimental results in AI2-THOR simulations show reduced latency and API calls, highlighting the potential for scalable domestic robotic applications.

Analyzing Cook2LTL: Translating Cooking Recipes to LTL Formulae using LLMs

The paper addresses a complex intersection of natural language processing, robotics, and temporal logic through Cook2LTL, a system that translates cooking recipes into Linear Temporal Logic (LTL) formulae. The framework bridges the gap between human-readable cooking instructions and robot-executable plans by leveraging the capabilities of LLMs.

Problem Context and Core Approach

Cooking recipes provide a rich domain of tasks characterized by linguistic complexity, temporal dependencies, and a wide array of possible actions, posing challenges in direct translation to robot plans. The central challenge tackled in this work is to extract unambiguous, robot-executable tasks from such natural instructions. The authors propose using LTL as a formal language that can adequately express the temporal intricacies of recipes. Cook2LTL harnesses a pretrained LLM to decompose high-level cooking instructions into primitive actions executable by kitchen robots.
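
The temporal flavor of this encoding can be illustrated with a small schematic of our own (the paper defines its exact encoding; this example is only indicative). A sequencing constraint such as "mix the batter, then bake it" can be expressed with LTL's "finally" operator F as

F(mix ∧ F(bake))

i.e., mix must eventually hold, and bake must eventually hold at some point after it.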

System Components

The Cook2LTL architecture consists of several key components:

  1. Semantic Parsing: Semantic parsing is used to convert instructions into parametric function representations. This involves categorizing verbs, objects, actions, and associated parameters such as time and temperature from recipe steps.
  2. Action Reduction: High-level actions that are not directly executable are decomposed into sequences of primitive actions via LLM prompting. A queryable action library is built dynamically at runtime through a caching scheme, so previously reduced actions are reused rather than re-queried, minimizing repeated LLM calls.
  3. LTL Translation: Once actions are reduced to primitive sequences, they are converted into LTL formulae that precisely capture the order and dependencies of the actions outlined in the recipe.
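
The three components above can be sketched end to end in a short mock. This is an illustrative sketch, not the paper's implementation: the `PRIMITIVES` set, the canned `query_llm` decomposition, the toy verb-object parser, and the nested-`F` sequencing encoding are all our assumptions standing in for the real semantic parser, LLM prompt, and translation rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """Parametric representation of one cooking action (time/temperature params elided)."""
    verb: str
    obj: str

# Hypothetical primitives a kitchen manipulator could execute directly.
PRIMITIVES = {"pick", "place", "pour", "stir", "heat", "wait"}

def parse_step(step: str) -> Action:
    """Toy semantic parser: split a recipe step into verb + object."""
    verb, _, obj = step.strip().lower().partition(" ")
    return Action(verb, obj)

def query_llm(action: Action) -> list[str]:
    """Stand-in for an LLM call that decomposes a high-level verb (canned answers)."""
    canned = {"saute": ["heat", "pour", "stir"]}  # assumed decomposition
    return canned.get(action.verb, ["pick", "place"])

def reduce_action(action: Action, cache: dict[str, list[str]]) -> list[Action]:
    """Reduce to primitives; the cache plays the role of the runtime action library."""
    if action.verb in PRIMITIVES:
        return [action]
    if action.verb not in cache:          # the LLM is queried only on a cache miss
        cache[action.verb] = query_llm(action)
    return [Action(v, action.obj) for v in cache[action.verb]]

def to_ltl(actions: list[Action]) -> str:
    """Nested 'finally' formula F(a1 & F(a2 & ...)) enforcing sequential order."""
    atoms = [f"{a.verb}({a.obj})" for a in actions]
    formula = atoms[-1]
    for atom in reversed(atoms[:-1]):
        formula = f"{atom} & F({formula})"
    return f"F({formula})"

def cook2ltl(steps: list[str], cache: dict[str, list[str]]) -> str:
    """Full pipeline: parse each step, reduce to primitives, emit one LTL formula."""
    plan: list[Action] = []
    for step in steps:
        plan.extend(reduce_action(parse_step(step), cache))
    return to_ltl(plan)

cache: dict[str, list[str]] = {}
formula = cook2ltl(["stir batter", "saute onions"], cache)
print(formula)
# F(stir(batter) & F(heat(onions) & F(pour(onions) & F(stir(onions)))))
```

A second occurrence of "saute" in a later step would hit the cache and reuse the stored decomposition without an LLM call, which is the mechanism behind the reported reductions in API calls, latency, and cost.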

Experimental Evaluation

The empirical evaluation of Cook2LTL is conducted through simulations in the AI2-THOR environment, focusing on a subset of recipes from the Recipe1M+ corpus. The performance of Cook2LTL is validated against variants lacking components such as caching. The results illustrate substantial improvements: a decrease in LLM API calls by 51%, latency by 59%, and overall costs by 42%. These metrics underscore the system's enhanced efficiency when incorporating the dynamic library for caching previously reduced actions.

Implications and Future Directions

The practical implications of Cook2LTL are noteworthy, as it offers a scalable approach for deploying robotic assistants in domestic environments where understanding and executing complex tasks like cooking are crucial. Theoretically, this work contributes to the expanding field of integrating LLMs with formal logic structures to automate multi-step processes with temporal dependencies.

However, the system is not without limitations. The reliance on simulated environments constrains real-world applicability, since physical robots encounter unforeseen variables absent from controlled simulations. More extensive annotated datasets are needed to refine semantic-parsing accuracy, and improving the robustness of action-reduction plans remains critical for fault-tolerant operation.

In conclusion, Cook2LTL presents a significant advancement in the domain of robotic planning from natural language instructions, highlighting the potential of combining domain-specific knowledge with powerful LLMs. Future research could extend this approach to a broader range of tasks and real-world applications, further enhancing the integration of AI in everyday human activities.
