FLAIR: Feeding via Long-horizon AcquIsition of Realistic dishes (2407.07561v1)

Published 10 Jul 2024 in cs.RO and cs.AI

Abstract: Robot-assisted feeding has the potential to improve the quality of life for individuals with mobility limitations who are unable to feed themselves independently. However, there exists a large gap between the homogeneous, curated plates existing feeding systems can handle, and truly in-the-wild meals. Feeding realistic plates is immensely challenging due to the sheer range of food items that a robot may encounter, each requiring specialized manipulation strategies which must be sequenced over a long horizon to feed an entire meal. An assistive feeding system should not only be able to sequence different strategies efficiently in order to feed an entire meal, but also be mindful of user preferences given the personalized nature of the task. We address this with FLAIR, a system for long-horizon feeding which leverages the commonsense and few-shot reasoning capabilities of foundation models, along with a library of parameterized skills, to plan and execute user-preferred and efficient bite sequences. In real-world evaluations across 6 realistic plates, we find that FLAIR can effectively tap into a varied library of skills for efficient food pickup, while adhering to the diverse preferences of 42 participants without mobility limitations as evaluated in a user study. We demonstrate the seamless integration of FLAIR with existing bite transfer methods [19, 28], and deploy it across 2 institutions and 3 robots, illustrating its adaptability. Finally, we illustrate the real-world efficacy of our system by successfully feeding a care recipient with severe mobility limitations. Supplementary materials and videos can be found at: https://emprise.cs.cornell.edu/flair .

Summary

The paper introduces a long-horizon bite acquisition system that integrates foundation models to plan efficient, user-preferred feeding actions.
It employs a library of parameterized food manipulation skills combined with hierarchical task planning to manage diverse meal compositions across various robotic setups.
Empirical validations, including user studies and real-world deployments, demonstrate significant improvements in feeding efficiency and adherence to user preferences.

FLAIR: Feeding via Long-Horizon Acquisition of Realistic Dishes

The paper "FLAIR: Feeding via Long-horizon AcquIsition of Realistic dishes" introduces a system that helps individuals with mobility limitations eat a full meal. Its contribution to robot-assisted feeding research is an attempt to bridge the gap between the homogeneous, curated plates that existing feeding systems can handle and the diverse, realistic meals encountered in everyday life.

Overview

FLAIR combines the reasoning capabilities of foundation models, such as vision-language models (VLMs) and large language models (LLMs), with a library of parameterized food manipulation skills to plan and execute efficient, user-preferred sequences of bites over a whole meal. The system is evaluated on multiple plates, robots, and institutions, and the reported results are promising in terms of both efficiency and user satisfaction.

Technical Contributions

Hardware System

The authors deploy FLAIR across different institutional setups using multiple robotic embodiments, including the Kinova Gen3 and Franka Emika Panda robots. Each robot is equipped with a custom-designed, motorized feeding utensil that facilitates dynamic movements such as twirling and scooping, enhancing the dexterity required for manipulating a wide range of food items.

Long-Horizon Bite Acquisition Framework

The core of FLAIR is its ability to perform long-horizon bite acquisitions. This involves:

State Representation: FLAIR uses GPT-4V for food item recognition and GroundingDINO for bounding box detection. The perception stack provides high-level semantic labels and detailed segmentation masks of the food items present on a plate.
Skill Library: A comprehensive set of pre-acquisition and acquisition skills tailored to handle different food textures and types, such as twirling noodles, skewering meat, scooping semisolids, and dipping items in sauces. These skills are parameterized based on the visual state estimates obtained from the food detection step.
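To make the idea of skills "parameterized by visual state estimates" concrete, the following is a minimal sketch and not the authors' implementation. The names `Detection`, `SkillCall`, `SKILL_FOR_LABEL`, and `parameterize` are hypothetical, and the label-to-skill mapping and the choice of the bounding-box center as the target are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass(frozen=True)
class Detection:
    """One detected food item: a semantic label plus its bounding box."""
    label: str
    box: Box


@dataclass(frozen=True)
class SkillCall:
    """A skill name together with the image-space point it should act on."""
    skill: str
    target: Tuple[float, float]


# Hypothetical mapping from food label to acquisition skill (an assumption,
# loosely following the examples in the text: twirl, skewer, scoop).
SKILL_FOR_LABEL: Dict[str, str] = {
    "noodles": "twirl",
    "chicken": "skewer",
    "mashed potatoes": "scoop",
}


def box_center(box: Box) -> Tuple[float, float]:
    """Return the center of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def parameterize(detection: Detection) -> SkillCall:
    """Pick a skill for the detected label and aim it at the box center."""
    if detection.label not in SKILL_FOR_LABEL:
        raise KeyError(f"no skill registered for label {detection.label!r}")
    return SkillCall(SKILL_FOR_LABEL[detection.label], box_center(detection.box))
```

In a real system the target would more likely be derived from the segmentation mask and converted to a 3D pose; the box center is used here only to keep the sketch short.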

Task Planning for Acquisition

The hierarchical task planner (denoted as $\mathcal{T}$) is central to FLAIR's operation. It uses vision-based post-processing steps to quantify how each food item is distributed on the plate, and from that determines the sequence of pre-acquisition actions (e.g., grouping, pushing) and direct acquisition actions required for each food item category. Because the plan is derived from the observed plate state rather than fixed in advance, the system can adapt to varied meal compositions.
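The decision between a pre-acquisition action and a direct acquisition can be sketched as a simple threshold rule. This is an illustrative assumption, not the planner described in the paper: the function `plan_skills`, the category names, the single scalar `density` measure, the default threshold of 0.5, and the pairing of "group" with noodles and "push" with semisolids are all hypothetical.

```python
from typing import Dict, List

# Hypothetical per-category skills (assumed for illustration).
ACQUISITION_SKILL: Dict[str, str] = {
    "noodles": "twirl",
    "semisolid": "scoop",
    "bite_sized": "skewer",
}
PRE_ACQUISITION_SKILL: Dict[str, str] = {
    "noodles": "group",    # gather sparse strands before twirling
    "semisolid": "push",   # consolidate a thin layer before scooping
}


def plan_skills(category: str, density: float, min_density: float = 0.5) -> List[str]:
    """Return the skill sequence for one bite of the given category.

    `density` is an assumed scalar in [0, 1] summarizing how concentrated the
    food is. If it falls below `min_density` and the category has a
    pre-acquisition skill, that skill is scheduled first.
    """
    acquisition = ACQUISITION_SKILL[category]
    pre = PRE_ACQUISITION_SKILL.get(category)
    if pre is not None and density < min_density:
        return [pre, acquisition]
    return [acquisition]
```

For example, `plan_skills("noodles", 0.2)` schedules a grouping action before twirling, while the same call with a density of 0.9 goes straight to twirling.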

Bite Sequencing via Foundation Models

To plan bite sequences that balance efficiency and user preferences, the system employs an LLM, specifically GPT-4V. The model processes context, including user preferences, history of bites, and the estimated efficiency of acquiring each food item, to output a bite sequence that adheres to both preference and efficiency criteria.
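One way to picture the context passed to the model is as a text prompt assembled from the three inputs named above. The sketch below only builds such a prompt; it does not call any model, and the wording of the prompt, the function name `build_bite_prompt`, and the 0-to-1 efficiency scale are assumptions rather than the paper's actual prompt.

```python
from typing import Dict, List


def build_bite_prompt(
    preference: str,
    history: List[str],
    efficiency: Dict[str, float],
) -> str:
    """Assemble a next-bite query from preference, bite history, and efficiency.

    `efficiency` maps each remaining food item to an assumed score in [0, 1],
    where higher means the item is currently easier to pick up.
    """
    history_text = ", ".join(history) if history else "none"
    # Sort items so the prompt is deterministic for a given plate state.
    efficiency_lines = [
        f"- {item}: {score:.2f}" for item, score in sorted(efficiency.items())
    ]
    return "\n".join(
        [
            "You are planning the next bite for a robot-assisted feeding system.",
            f"User preference: {preference}",
            f"Bites so far: {history_text}",
            "Estimated acquisition efficiency (0 = hard, 1 = easy):",
            *efficiency_lines,
            "Reply with the name of exactly one remaining item.",
        ]
    )
```

Keeping preference, history, and efficiency as separate labeled fields makes it straightforward to ablate one of them, which mirrors the efficiency-only and preference-only baselines mentioned in the evaluation.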

Integration of Acquisition and Transfer

FLAIR's modular architecture facilitates seamless integration with existing bite transfer methods. The system adapts to both outside-mouth and inside-mouth transfer frameworks, ensuring safe and efficient food delivery to the user's mouth.

Empirical Validation

The authors validate FLAIR through extensive experiments that include:

User Studies: In a study with 42 participants without mobility limitations, FLAIR respects user preferences while still producing efficient bite sequences. Its adherence to user preferences significantly exceeds that of baseline approaches, including efficiency-only and preference-only strategies.
Task Planning Comparison: Compared against baselines such as VAPORS, VLM-TaskPlanner, and Swin-Transformer across datasets, FLAIR's hierarchical task planner demonstrates superior performance in planning accurate skill sequences.
Real-World Deployment: The system is successfully deployed to feed a care recipient with severe mobility limitations, highlighting its practical utility and robustness.

Implications

The implications of this research are both practical and theoretical:

Practical Implications: FLAIR can substantially improve the quality of life for individuals with mobility impairments by providing autonomous meal assistance, thus reducing caregiver workload and enhancing the user's dining experience.
Theoretical Implications: The integration of foundation models with parameterized skills in a long-horizon planning framework opens new avenues for research in assistive robotics, emphasizing the importance of combining high-level reasoning with low-level skill execution.

Future Developments

Future research could address current limitations by making the food perception module more robust to recognition errors and by expanding the skill library to include more reactive and adaptive manipulation strategies. Additionally, structured prompting strategies and real-time user feedback mechanisms could further improve the performance and reliability of the bite sequencing component.

In conclusion, FLAIR advances robot-assisted feeding by showing that foundation models can be integrated with a diverse skill library to deliver efficient, user-preferred meal assistance.
