LAVA: Long-horizon Visual Action based Food Acquisition
Abstract: Robot-Assisted Feeding (RAF) addresses the fundamental need for individuals with mobility impairments to regain autonomy in feeding themselves. The goal of RAF is to use a robot arm to acquire food from the table and transfer it to the individual. Existing RAF methods primarily focus on solid foods, leaving a gap in manipulation strategies for semisolid and deformable foods. This study introduces Long-horizon Visual Action (LAVA) based food acquisition for liquid, semisolid, and deformable foods. Long-horizon refers to the goal of "clearing the bowl" through a sequence of acquisition actions. LAVA employs a hierarchical policy for long-horizon food acquisition tasks. A high-level policy selects acquisition primitives using ScoopNet. A mid-level policy uses vision to determine the parameters of the selected primitive. To carry out sequential plans in the real world, a low-level policy executes each parameterized primitive via behavior cloning, ensuring precise trajectory execution. We validate our approach on real-world acquisition trials involving granular, liquid, semisolid, and deformable foods, including fruit chunks and soup. Across 46 bowls, LAVA acquires food more efficiently than baselines, achieving a success rate of 89 ± 4%, and generalizes across realistic variations such as bowl position, food variety, and amount of food in the bowl. Code, datasets, videos, and supplementary materials can be found on our website.
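To make the three-level decomposition concrete, the sketch below outlines one plausible control loop for the high/mid/low hierarchy described in the abstract. All names here (HighLevelPolicy, MidLevelPolicy, LowLevelPolicy, PrimitiveCommand, clear_bowl) are hypothetical illustrations of the structure, not the authors' implementation or API.

```python
# Minimal sketch of the hierarchy described in the abstract (assumed names).
from dataclasses import dataclass
from typing import Tuple

import numpy as np


@dataclass
class PrimitiveCommand:
    """A food-acquisition primitive (e.g., a scoop) with its vision-derived parameters."""
    name: str                  # primitive chosen by the high-level policy
    params: Tuple[float, ...]  # e.g., entry point (x, y) and scoop depth


class HighLevelPolicy:
    """Selects the next acquisition primitive from an overhead image
    (the paper uses ScoopNet for this step)."""

    def select_primitive(self, rgb_image: np.ndarray) -> str:
        # Placeholder: a real system would run a learned classifier here.
        return "scoop"


class MidLevelPolicy:
    """Grounds the chosen primitive into concrete parameters using vision."""

    def parameterize(self, primitive: str, rgb_image: np.ndarray) -> PrimitiveCommand:
        # Placeholder: e.g., locate the densest food region and pick an entry pose.
        entry_xy = (0.0, 0.0)
        depth = 0.02
        return PrimitiveCommand(primitive, (*entry_xy, depth))


class LowLevelPolicy:
    """Executes the parameterized primitive; the paper uses behavior cloning
    to produce the utensil trajectory."""

    def execute(self, command: PrimitiveCommand) -> bool:
        # Placeholder: would roll out a learned trajectory on the robot arm.
        return True


def clear_bowl(camera, max_attempts: int = 20) -> None:
    """Long-horizon loop: keep acquiring until the bowl is judged empty
    or the attempt budget is exhausted."""
    high, mid, low = HighLevelPolicy(), MidLevelPolicy(), LowLevelPolicy()
    for _ in range(max_attempts):
        image = camera.capture()                      # overhead RGB frame (assumed interface)
        primitive = high.select_primitive(image)      # high level: which primitive
        command = mid.parameterize(primitive, image)  # mid level: with what parameters
        if not low.execute(command):                  # low level: run the trajectory
            break
```

The point of the sketch is the separation of concerns: the high level decides *what* to do, the mid level decides *how* (parameters from vision), and the low level handles *execution*, which mirrors the paper's description of the framework.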