
GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation (2401.00929v2)

Published 1 Jan 2024 in cs.RO and cs.CV

Abstract: This paper presents GenH2R, a framework for learning generalizable vision-based human-to-robot (H2R) handover skills. The goal is to equip robots with the ability to reliably receive objects with unseen geometry handed over by humans in various complex trajectories. We acquire such generalizability by learning H2R handover at scale with a comprehensive solution including procedural simulation asset creation, automated demonstration generation, and effective imitation learning. We leverage large-scale 3D model repositories, dexterous grasp generation methods, and curve-based 3D animation to create an H2R handover simulation environment named GenH2R-Sim, surpassing the number of scenes in existing simulators by three orders of magnitude. We further introduce a distillation-friendly demonstration generation method that automatically generates a million high-quality demonstrations suitable for learning. Finally, we present a 4D imitation learning method augmented by a future forecasting objective to distill demonstrations into a visuo-motor handover policy. Experimental evaluations in both simulators and the real world demonstrate significant improvements (at least +10% success rate) over baselines in all cases. The project page is https://GenH2R.github.io/.
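The abstract's "4D imitation learning method augmented by a future forecasting objective" combines a behavior-cloning loss on expert actions with an auxiliary loss that predicts future geometry. The sketch below illustrates that loss structure only; the encoder, head shapes, and weight `w_forecast` are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Illustrative sketch (not the paper's architecture): a point-cloud
# observation is encoded PointNet-style, then two linear heads predict
# (a) a 6-DoF action for behavior cloning and (b) future hand/object
# points, which supplies the auxiliary forecasting term in the loss.
rng = np.random.default_rng(0)
B, N, D, FUT = 4, 128, 64, 32  # batch, input points, feature dim, forecast points

params = {
    "W_enc": rng.normal(scale=0.1, size=(3, D)),       # per-point encoder weights
    "W_act": rng.normal(scale=0.1, size=(D, 6)),       # action head (6-DoF delta pose)
    "W_fut": rng.normal(scale=0.1, size=(D, FUT * 3)), # future-forecasting head
}

def encode(points, W):
    """Per-point linear map + ReLU + max-pool over points (PointNet-style)."""
    return np.maximum(points @ W, 0.0).max(axis=1)  # (B, D)

def imitation_loss(points, expert_action, future_points, params, w_forecast=0.1):
    """Behavior-cloning MSE plus a weighted future-forecasting MSE."""
    feat = encode(points, params["W_enc"])
    pred_action = feat @ params["W_act"]
    pred_future = (feat @ params["W_fut"]).reshape(len(points), FUT, 3)
    bc = np.mean((pred_action - expert_action) ** 2)
    aux = np.mean((pred_future - future_points) ** 2)
    return bc + w_forecast * aux

# Dummy demonstration batch: observed points, expert action, future points.
obs = rng.normal(size=(B, N, 3))
expert = rng.normal(size=(B, 6))
future = rng.normal(size=(B, FUT, 3))
loss = imitation_loss(obs, expert, future, params)
print(float(loss))  # a finite, non-negative scalar minimized during training
```

In training, both terms would be minimized jointly over the demonstration set; the forecasting head acts purely as an auxiliary signal and can be discarded at deployment time.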
