LAVA: Long-horizon Visual Action based Food Acquisition (2403.12876v1)

Published 19 Mar 2024 in cs.RO and cs.HC

Abstract: Robotic Assisted Feeding (RAF) addresses the fundamental need for individuals with mobility impairments to regain autonomy in feeding themselves. The goal of RAF is to use a robot arm to acquire food from the table and transfer it to the individual. Existing RAF methods primarily focus on solid foods, leaving a gap in manipulation strategies for semi-solid and deformable foods. This study introduces Long-horizon Visual Action (LAVA) based food acquisition of liquid, semisolid, and deformable foods. Long-horizon refers to the goal of "clearing the bowl" by sequentially acquiring the food from it. LAVA employs a hierarchical policy for long-horizon food acquisition tasks: a high-level policy determines primitives by leveraging ScoopNet; a mid-level policy finds parameters for those primitives using vision; and a low-level policy executes the sequential plans in the real world, combining the mid-level parameters with behavior cloning to ensure precise trajectory execution. We validate our approach on complex real-world acquisition trials involving granular, liquid, semisolid, and deformable food types, along with fruit-chunk and soup acquisition. Across 46 bowls, LAVA acquires food more efficiently than baselines, with a success rate of 89 +/- 4%, and generalizes across realistic plate variations such as different positions, varieties, and amounts of food in the bowl. Code, datasets, videos, and supplementary materials can be found on our website.


Summary

  • The paper introduces a hierarchical framework combining high-, mid-, and low-level policies to enhance robotic-assisted feeding performance.
  • The methodology leverages visual networks (ScoopNet, TargetNet, DepthNet) and behavioral cloning to adaptively manipulate complex food types.
  • Experimental results show an ~89% success rate in bowl clearance, underlining LAVA's robust zero-shot generalization in diverse scenarios.

LAVA: Long-horizon Visual Action-Based Food Acquisition in Robotic Assisted Feeding

The paper introduces LAVA (Long-horizon Visual Action based Food Acquisition), a framework aimed at enhancing the capabilities of robotic-assisted feeding (RAF) systems. RAF systems are crucial for individuals with mobility impairments, restoring autonomy by using robotic arms to acquire and transfer food. Existing RAF technologies handle solid foods well but struggle to manipulate semi-solid and deformable items. LAVA addresses these limitations with a hierarchical policy approach tailored to the long-horizon task of clearing bowls of complex food types such as liquids, semi-solids, and fruit chunks.

Framework and Methodology

LAVA integrates three hierarchical policy levels, which compose as sketched in the code after this list:

  1. High-Level Policy: This policy uses ScoopNet to choose between predefined primitives based on the visual characteristics of the food. ScoopNet, built on the MobileNetV2 architecture, classifies food images into categories that determine the appropriate primitive: a "Wide Primitive" for deformable foods and a "Deep Primitive" for items that can be scooped directly.
  2. Mid-Level Policy: This policy parameterizes the primitive selected by the high-level policy. It leverages TargetNet to segment food instances, enabling strategic acquisition plans through actions such as wall-guided scooping and center alignment. For direct actions, DepthNet estimates the depth of the food so the spoon trajectory can be adjusted dynamically for effective scooping.
  3. Low-Level Policy: This policy executes trajectories using behavior cloning learned from kinesthetic teaching. From expert demonstrations, the robot learns the joint-space trajectories required for diverse food-handling tasks, minimizing errors such as spillage or breakage.
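
To make the division of labor concrete, here is a minimal Python sketch of how the three levels might compose into a single acquisition step. All class and method names (LavaPolicy, ScoopParams, classify, segment, pick_target, estimate, rollout) are hypothetical wrappers for illustration under the assumptions stated in the comments; they are not the authors' released interfaces.

```python
# Minimal sketch of how LAVA's three policy levels could compose.
# Class and method names are illustrative stand-ins, not the authors' code.
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ScoopParams:
    """Parameters the mid-level policy hands to the low-level controller."""
    primitive: str                  # "wide" (deformable foods) or "deep" (directly scoopable foods)
    target_xy: Tuple[float, float]  # where in the bowl to start the scoop
    scoop_depth: float              # estimated food depth at the target (m)


class LavaPolicy:
    def __init__(self, scoop_net, target_net, depth_net, bc_controller):
        self.scoop_net = scoop_net    # high level: image -> food-category label (MobileNetV2 backbone)
        self.target_net = target_net  # mid level: image -> food instance masks
        self.depth_net = depth_net    # mid level: image -> food depth estimate
        self.bc = bc_controller       # low level: behavior-cloned trajectory generator

    def high_level(self, rgb):
        """Pick a primitive from the food's visual category."""
        category = self.scoop_net.classify(rgb)
        return "wide" if category == "deformable" else "deep"

    def mid_level(self, rgb, primitive):
        """Ground the chosen primitive in scene-specific parameters."""
        target_xy = self.target_net.segment(rgb).pick_target()  # e.g. wall-guided or centered
        depth = self.depth_net.estimate(rgb, target_xy)
        return ScoopParams(primitive=primitive, target_xy=target_xy, scoop_depth=depth)

    def acquire_once(self, rgb):
        """One acquisition attempt; called repeatedly until the bowl is cleared."""
        primitive = self.high_level(rgb)
        params = self.mid_level(rgb, primitive)
        return self.bc.rollout(params)  # joint-space trajectory for the robot arm
```

The key design point this sketch captures is that only the low-level controller touches the robot; the upper two levels reduce a cluttered visual scene to a small, named primitive plus a few continuous parameters.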

Experimental Validation and Implications

The LAVA framework was tested in scenarios involving a variety of food textures and configurations, such as cereals, water, and tofu chunks immersed in soup. Across 46 bowls, it cleared bowls substantially more efficiently than baseline models, with a success rate of approximately 89 +/- 4%. The system also exhibits robust zero-shot generalization, adapting to unfamiliar food types through its hierarchical structure.
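
The 89 +/- 4% figure can be read as a mean success rate with its standard error over per-bowl outcomes. The short sketch below shows how such a number could be tallied; the outcome counts are hypothetical stand-ins, not the paper's raw logs, and the paper's own aggregation may differ.

```python
import math

def success_stats(outcomes):
    """Mean success rate and binomial standard error for a list of 0/1 trial outcomes."""
    n = len(outcomes)
    rate = sum(outcomes) / n
    stderr = math.sqrt(rate * (1 - rate) / n)
    return rate, stderr

# Hypothetical tallies: 41 of 46 bowls cleared successfully (not the paper's data).
outcomes = [1] * 41 + [0] * 5
rate, err = success_stats(outcomes)
print(f"success rate: {rate:.0%} +/- {err:.0%}")  # prints roughly 89% +/- 5%
```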

Implications: LAVA sets a strong precedent for developing adaptable and efficient RAF systems. It bridges the gap between hard-coded strategies and the nuanced manipulation required in complex feeding scenarios. This advancement can enhance the quality of life for individuals relying on robotic assistance while reducing the burden on caregivers.

Future Directions: Further work could incorporate additional primitives and fine-tune the LAVA framework to handle an even broader range of food types, including thin or irregularly shaped items. Integrating multi-modal sensory data could further refine food perception and manipulation, improving adaptability across diverse environments.

Overall, the LAVA framework shows how vision-guided robotic manipulation and hierarchical policy learning can be combined into a practical solution for the complexities of robotic feeding assistance.
