Real-World Cooking Robot System from Recipes Based on Food State Recognition Using Foundation Models and PDDL (2410.02874v2)

Published 3 Oct 2024 in cs.RO and cs.AI

Abstract: Although there is a growing demand for cooking behaviours as one of the expected tasks for robots, a series of cooking behaviours based on new recipe descriptions by robots in the real world has not yet been realised. In this study, we propose a robot system that integrates real-world executable robot cooking behaviour planning using the Large Language Model (LLM) and classical planning of PDDL descriptions, and food ingredient state recognition learning from a small number of data using the Vision-Language Model (VLM). We succeeded in experiments in which PR2, a dual-armed wheeled robot, performed cooking from arranged new recipes in a real-world environment, and confirmed the effectiveness of the proposed system.

Summary

  • The paper introduces an innovative system that uses LLMs and PDDL to convert recipes into executable robotic actions.
  • It employs VLMs with few-shot learning to accurately recognize real-time food state changes during cooking.
  • Successful trials with dishes like sunny-side-up eggs and sautéed broccoli validate the system’s robustness and adaptability.

Overview of the Real-World Cooking Robot System from Recipes

The paper presents a novel robot system that integrates LLMs, classical planning with the Planning Domain Definition Language (PDDL), and Vision-Language Models (VLMs) to execute cooking behaviors based on new recipes in real-world settings. The approach addresses two critical issues: executing real-world cooking actions derived from recipe descriptions and recognizing changes in the state of ingredients during cooking.

Methodology

The methodology comprises three key components:

  1. Real-World Executable Action Planning:
    • An LLM transforms natural-language recipe descriptions into a sequence of cooking functions, allowing the robot to interpret the recipe at a procedural level.
    • Classical planning over a PDDL description then augments these sequences with necessary actions the recipe leaves unmentioned, such as positioning or tool handling, so that execution respects real-world constraints (a minimal planning sketch follows this list).
  2. Foodstuff State Change Recognition:
    • Recognizing the dynamic states of ingredients is crucial for cooking. The system uses VLMs to learn state-change recognition from limited data. By leveraging few-shot learning, the robot can make real-time state assessments, such as determining when water has reached a boil or how far an ingredient has cooked (a recognition sketch also appears after this list).
  3. Integration and Execution:
    • The integrated system enables a robot to autonomously execute recipes in a real-world kitchen setup, exemplified by making sunny-side-up eggs and sautéing broccoli. Experiments demonstrated that the robot could complete these tasks using predefined motion trajectories for action execution.
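
To make the planning step concrete, here is a minimal sketch of how a recipe-level goal could be completed into an executable action sequence with PDDL and an off-the-shelf planner. The kitchen domain, its predicates and actions, and the Fast Downward command line are illustrative assumptions, not the paper's actual domain or toolchain.

```python
# Minimal sketch, not the authors' implementation: a toy PDDL domain for a
# frying task and a helper that calls an off-the-shelf planner.  The domain,
# object names, and the planner invocation are assumptions.
import pathlib
import subprocess
import tempfile

DOMAIN = """
(define (domain kitchen)
  (:requirements :strips :typing)
  (:types food location)
  (:constants pan - location)
  (:predicates (at ?f - food ?l - location)
               (pan-on-stove)
               (stove-on)
               (cooked ?f - food))
  (:action place-pan-on-stove
    :parameters ()
    :precondition (and)
    :effect (pan-on-stove))
  (:action turn-on-stove
    :parameters ()
    :precondition (pan-on-stove)
    :effect (stove-on))
  (:action move
    :parameters (?f - food ?from ?to - location)
    :precondition (at ?f ?from)
    :effect (and (not (at ?f ?from)) (at ?f ?to)))
  (:action fry
    :parameters (?f - food)
    :precondition (and (stove-on) (at ?f pan))
    :effect (cooked ?f)))
"""

# The recipe text only says "fry the egg"; everything else in the plan
# (placing the pan, turning on the stove, moving the egg) is filled in
# by the planner from the domain model.
PROBLEM = """
(define (problem sunny-side-up)
  (:domain kitchen)
  (:objects egg - food counter - location)
  (:init (at egg counter))
  (:goal (cooked egg)))
"""

def plan(domain: str, problem: str) -> str:
    """Write the PDDL files and invoke a planner; returns its raw output.
    The Fast Downward call below is one plausible choice, not the planner
    prescribed by the paper."""
    work = pathlib.Path(tempfile.mkdtemp())
    (work / "domain.pddl").write_text(domain)
    (work / "problem.pddl").write_text(problem)
    result = subprocess.run(
        ["fast-downward.py", "domain.pddl", "problem.pddl",
         "--search", "astar(lmcut())"],
        cwd=work, capture_output=True, text=True,
    )
    # Fast Downward also writes the plan to a sas_plan file in the work dir.
    return result.stdout

if __name__ == "__main__":
    print(plan(DOMAIN, PROBLEM))
```

The point of the example is the division of labour the paper relies on: the recipe (via the LLM) supplies only the goal-level cooking functions, while the PDDL planner supplies the supporting steps the recipe never states.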

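The recognition component can be pictured as a few-shot classifier in an image-embedding space whose output gates when the next plan step starts. The sketch below is an assumption-laden illustration rather than the paper's method: an off-the-shelf CLIP encoder stands in for the paper's VLM, and the state names, camera callable, and nearest-prototype rule are placeholders.

```python
# Minimal sketch, not the paper's pipeline: few-shot food-state recognition
# with a CLIP image encoder standing in for the VLM, plus a loop that delays
# the next cooking step until the target state is observed.
import time
import numpy as np
import torch
from transformers import CLIPModel, CLIPProcessor

_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(image) -> np.ndarray:
    """Encode a PIL image into an L2-normalised feature vector."""
    inputs = _processor(images=image, return_tensors="pt")
    with torch.no_grad():
        feats = _model.get_image_features(**inputs)[0]
    return (feats / feats.norm()).numpy()

class FewShotStateClassifier:
    """Nearest-prototype classifier over a few labelled reference images
    per state (e.g. "still" vs. "boiling" water)."""
    def __init__(self, examples: dict[str, list]):
        self.prototypes = {
            state: np.mean([embed(img) for img in imgs], axis=0)
            for state, imgs in examples.items()
        }

    def classify(self, image) -> str:
        z = embed(image)
        return max(self.prototypes,
                   key=lambda s: float(z @ self.prototypes[s]))

def wait_for_state(classifier, capture_image, target: str, period_s: float = 2.0):
    """Poll the camera until the ingredient reaches the target state; the
    task executive then moves on to the next step of the plan.
    capture_image is a hypothetical callable returning the current image."""
    while classifier.classify(capture_image()) != target:
        time.sleep(period_s)

# Hypothetical usage: gate "add the broccoli" on the water boiling.
# clf = FewShotStateClassifier({"still": still_imgs, "boiling": boiling_imgs})
# wait_for_state(clf, camera.capture, target="boiling")
```

In the paper's system, this kind of real-time state judgement is what signals that, for example, the water is boiling or the egg is done, so the predefined motion for the next step can be triggered at the right moment.
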
Key Results and Implications

The experiments reveal several impactful findings:

  • The integration of LLMs and PDDL planning yields action plans that include steps and constraints the recipe never states explicitly, allowing the robot to adapt task execution to real environmental conditions.
  • The paper shows robust performance in ingredient state-change recognition using VLMs, even with small training datasets. The experimental outcomes suggest that increasing the training data volume can further improve recognition robustness.
  • Real-world robotic cooking trials demonstrated successful execution of previously unseen recipes, confirming the system’s functional reliability.

Theoretical and Practical Implications

This research holds significant implications for both theoretical advancement and practical implementation in autonomous robotic systems. Theoretically, it underscores the efficacy of combining LLMs with classical planning techniques for complex task execution. Practically, the system could facilitate the deployment of robotic assistants in culinary environments, potentially reducing human workload and enhancing consistency in food preparation.

Future Developments

Looking ahead, the research could be expanded by:

  • Incorporating Advanced Motion Planning: Implementing adaptive motion execution could allow robots to handle more complex and varied cooking tasks beyond current capabilities.
  • Extending Recipe and Ingredient Coverage: Broadening the range of recipes and ingredients manageable by the system could increase its applicability in diverse culinary contexts.
  • Enhancing Complexity Handling: Addressing the challenges associated with more complex recipes and simultaneous task execution could optimize system performance.

In summary, the presented system exemplifies a successful synthesis of foundation models and classical planning for real-world autonomous cooking, offering a promising outlook for the future of intelligent robotics in daily life applications.