Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types (2403.12891v1)

Published 19 Mar 2024 in cs.RO, cs.AI, and cs.CV

Abstract: In this study, we introduce a novel visual imitation network with a spatial attention module for robot-assisted feeding (RAF). The goal is to acquire (i.e., scoop) food items from a bowl, yet achieving robust and adaptive food manipulation is particularly challenging. To address this, we propose a framework that integrates visual perception with imitation learning, enabling the robot to handle diverse scooping scenarios. Our approach, named AVIL (adaptive visual imitation learning), is adaptable and robust across bowl configurations that differ in material, size, and position, as well as across diverse food types including granular, semi-solid, and liquid, even in the presence of distractors. We validate the approach through experiments on a real robot and compare its performance with a baseline. The results demonstrate improvement over the baseline across all scenarios, with the success metric increasing by up to 2.5 times. Notably, our model, trained solely on data from a transparent glass bowl containing granular cereals, generalizes zero-shot to other bowl configurations and food types.
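The abstract describes a visual imitation policy built around a spatial attention module. The paper's implementation is not reproduced here; the sketch below is a minimal, illustrative PyTorch example of that general idea, combining a CBAM-style spatial attention block (as in reference 21) with a behavior-cloning objective. All class names, the network sizes, the action dimensionality, and the loss choice are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: pools channel-wise statistics into a
    single-channel saliency map and reweights the feature map by it."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_pool = x.mean(dim=1, keepdim=True)        # (B, 1, H, W)
        max_pool, _ = x.max(dim=1, keepdim=True)      # (B, 1, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return x * attn                               # spatially reweighted features

class ScoopingPolicy(nn.Module):
    """Hypothetical policy: maps an RGB observation of the bowl to a scooping
    action (e.g., end-effector pose parameters), trained by behavior cloning."""
    def __init__(self, action_dim: int = 6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attention = SpatialAttention()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.attention(self.encoder(obs)))

# One behavior-cloning step on a (placeholder) batch of demonstrations.
policy = ScoopingPolicy()
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)
obs = torch.randn(8, 3, 128, 128)    # placeholder RGB observations
actions = torch.randn(8, 6)          # placeholder demonstrated actions
optimizer.zero_grad()
loss = nn.functional.mse_loss(policy(obs), actions)
loss.backward()
optimizer.step()
```

In this sketch the attention map is applied before global pooling, so spatial regions the map emphasizes (e.g., the bowl) dominate the pooled features; the actual AVIL design may differ.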
