LAVA: Long-horizon Visual Action based Food Acquisition (2403.12876v1)

Published 19 Mar 2024 in cs.RO and cs.HC

Abstract: Robotic Assisted Feeding (RAF) addresses the fundamental need for individuals with mobility impairments to regain autonomy in feeding themselves. The goal of RAF is to use a robot arm to acquire food from the table and transfer it to the individual. Existing RAF methods primarily focus on solid foods, leaving a gap in manipulation strategies for semi-solid and deformable foods. This study introduces Long-horizon Visual Action (LAVA) based food acquisition of liquid, semisolid, and deformable foods. Long-horizon refers to the goal of "clearing the bowl" by sequentially acquiring the food from it. LAVA employs a hierarchical policy for long-horizon food acquisition tasks: a high-level policy determines primitives by leveraging ScoopNet; a mid-level policy finds parameters for those primitives using vision; and a low-level policy executes the sequential plans in the real world, combining the mid-level parameters with behavior cloning to ensure precise trajectory execution. We validate our approach on complex real-world acquisition trials involving granular, liquid, semisolid, and deformable food types, along with fruit-chunk and soup acquisition. Across 46 bowls, LAVA acquires food more efficiently than baselines, with a success rate of 89 +/- 4%, and generalizes across realistic plate variations such as different positions, varieties, and amounts of food in the bowl. Code, datasets, videos, and supplementary materials can be found on our website.


Summary

  • The paper introduces a hierarchical framework combining high-, mid-, and low-level policies to enhance robotic-assisted feeding performance.
  • The methodology leverages visual networks (ScoopNet, TargetNet, DepthNet) and behavioral cloning to adaptively manipulate complex food types.
  • Experimental results show an ~89% success rate in bowl clearance, underlining LAVA's robust zero-shot generalization in diverse scenarios.

LAVA: Long-horizon Visual Action-Based Food Acquisition in Robotic Assisted Feeding

The paper introduces LAVA (Long-horizon Visual Action based Food Acquisition), a framework aimed at enhancing the capabilities of robotic-assisted feeding (RAF) systems. RAF systems are crucial for individuals with mobility impairments, restoring autonomy by using robotic arms to acquire and transfer food. Existing RAF technologies handle solid foods well but struggle to manipulate semi-solid and deformable items. LAVA addresses these limitations with a hierarchical policy approach tailored to the long-horizon task of clearing bowls of complex food types such as liquids, semi-solids, and fruit chunks.

Framework and Methodology

LAVA integrates three hierarchical policy levels, which compose as sketched in the code after this list:

  1. High-Level Policy: This policy uses ScoopNet to choose between predefined primitives based on the visual characteristics of the food. ScoopNet, built on the MobileNetV2 architecture, classifies food images into categories that determine the appropriate primitive: a "Wide Primitive" for deformable foods and a "Deep Primitive" for items that can be scooped directly.
  2. Mid-Level Policy: This policy parameterizes the primitive selected by the high-level policy. It leverages TargetNet to segment food instances, enabling strategic acquisition plans through actions such as wall-guided scooping and center alignment. For direct actions, DepthNet estimates the depth of the food so the spoon trajectory can be adjusted dynamically for effective scooping.
  3. Low-Level Policy: This policy executes trajectories using behavior cloning learned from kinesthetic teaching. From expert demonstrations, the robot learns the joint-space trajectories required for diverse food-handling tasks, minimizing errors such as spillage or breakage.
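
To make the division of labor concrete, here is a minimal Python sketch of how the three levels might compose into a single acquisition step. All class and method names (LavaPolicy, ScoopParams, classify, segment, pick_target, estimate, rollout) are hypothetical wrappers for illustration under the assumptions stated in the comments; they are not the authors' released interfaces.

```python
# Minimal sketch of how LAVA's three policy levels could compose.
# Class and method names are illustrative stand-ins, not the authors' code.
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ScoopParams:
    """Parameters the mid-level policy hands to the low-level controller."""
    primitive: str                  # "wide" (deformable foods) or "deep" (directly scoopable foods)
    target_xy: Tuple[float, float]  # where in the bowl to start the scoop
    scoop_depth: float              # estimated food depth at the target (m)


class LavaPolicy:
    def __init__(self, scoop_net, target_net, depth_net, bc_controller):
        self.scoop_net = scoop_net    # high level: image -> food-category label (MobileNetV2 backbone)
        self.target_net = target_net  # mid level: image -> food instance masks
        self.depth_net = depth_net    # mid level: image -> food depth estimate
        self.bc = bc_controller       # low level: behavior-cloned trajectory generator

    def high_level(self, rgb):
        """Pick a primitive from the food's visual category."""
        category = self.scoop_net.classify(rgb)
        return "wide" if category == "deformable" else "deep"

    def mid_level(self, rgb, primitive):
        """Ground the chosen primitive in scene-specific parameters."""
        target_xy = self.target_net.segment(rgb).pick_target()  # e.g. wall-guided or centered
        depth = self.depth_net.estimate(rgb, target_xy)
        return ScoopParams(primitive=primitive, target_xy=target_xy, scoop_depth=depth)

    def acquire_once(self, rgb):
        """One acquisition attempt; called repeatedly until the bowl is cleared."""
        primitive = self.high_level(rgb)
        params = self.mid_level(rgb, primitive)
        return self.bc.rollout(params)  # joint-space trajectory for the robot arm
```

The key design point this sketch captures is that only the low-level controller touches the robot; the upper two levels reduce a cluttered visual scene to a small, named primitive plus a few continuous parameters.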

Experimental Validation and Implications

The LAVA framework was tested in scenarios involving a variety of food textures and configurations, such as cereals, water, and tofu chunks immersed in soup. Across 46 bowls, it cleared bowls substantially more efficiently than baseline models, with a success rate of approximately 89 +/- 4%. The system also exhibits robust zero-shot generalization, adapting to unfamiliar food types through its hierarchical structure.
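
The 89 +/- 4% figure can be read as a mean success rate with its standard error over per-bowl outcomes. The short sketch below shows how such a number could be tallied; the outcome counts are hypothetical stand-ins, not the paper's raw logs, and the paper's own aggregation may differ.

```python
import math

def success_stats(outcomes):
    """Mean success rate and binomial standard error for a list of 0/1 trial outcomes."""
    n = len(outcomes)
    rate = sum(outcomes) / n
    stderr = math.sqrt(rate * (1 - rate) / n)
    return rate, stderr

# Hypothetical tallies: 41 of 46 bowls cleared successfully (not the paper's data).
outcomes = [1] * 41 + [0] * 5
rate, err = success_stats(outcomes)
print(f"success rate: {rate:.0%} +/- {err:.0%}")  # prints roughly 89% +/- 5%
```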

Implications: LAVA sets a strong precedent for developing adaptable and efficient RAF systems. It bridges the gap between hard-coded strategies and the nuanced manipulation required in complex feeding scenarios. This advancement can enhance the quality of life for individuals relying on robotic assistance while reducing the burden on caregivers.

Future Directions: Further work could incorporate additional primitives and fine-tune the LAVA framework to handle an even broader range of food types, including thin or irregularly shaped items. Integrating multi-modal sensory data could further refine food perception and manipulation, improving adaptability across diverse environments.

Overall, the LAVA framework shows how vision-guided robotic manipulation and hierarchical policy learning can be combined into a practical solution for the complexities of robotic feeding assistance.
