
Bridging the Human to Robot Dexterity Gap through Object-Oriented Rewards (2410.23289v1)

Published 30 Oct 2024 in cs.RO and cs.LG

Abstract: Training robots directly from human videos is an emerging area in robotics and computer vision. While there has been notable progress with two-fingered grippers, learning autonomous tasks for multi-fingered robot hands in this way remains challenging. A key reason for this difficulty is that a policy trained on human hands may not directly transfer to a robot hand due to morphology differences. In this work, we present HuDOR, a technique that enables online fine-tuning of policies by directly computing rewards from human videos. Importantly, this reward function is built using object-oriented trajectories derived from off-the-shelf point trackers, providing meaningful learning signals despite the morphology gap and visual differences between human and robot hands. Given a single video of a human solving a task, such as gently opening a music box, HuDOR enables our four-fingered Allegro hand to learn the task with just an hour of online interaction. Our experiments across four tasks show that HuDOR achieves a 4x improvement over baselines. Code and videos are available on our website, https://object-rewards.github.io.
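To make the reward construction concrete, below is a minimal sketch of how an object-oriented reward could be computed by comparing object point trajectories extracted from a human video and from a robot rollout. The array shapes, the object-centric normalization, the temporal alignment, and the distance-to-reward mapping are illustrative assumptions, not the paper's exact formulation; the point tracks themselves are assumed to come from an off-the-shelf tracker such as CoTracker.

```python
import numpy as np


def object_trajectory_reward(human_traj, robot_traj, eps=1e-6):
    """Reward from matching object point trajectories (illustrative sketch).

    human_traj, robot_traj: arrays of shape (T, K, 2) holding the 2D positions
    of K tracked object points over T frames, e.g. produced by an off-the-shelf
    point tracker run on the human video and on the robot rollout. The distance
    and normalization below are assumptions, not HuDOR's exact reward.
    """
    human_traj = np.asarray(human_traj, dtype=float)
    robot_traj = np.asarray(robot_traj, dtype=float)

    # Align trajectory lengths by uniform resampling (an assumed alignment;
    # the paper may handle differing video/rollout lengths differently).
    T = min(len(human_traj), len(robot_traj))
    h = human_traj[np.linspace(0, len(human_traj) - 1, T).astype(int)]
    r = robot_traj[np.linspace(0, len(robot_traj) - 1, T).astype(int)]

    # Make trajectories object-centric: subtract each trajectory's initial
    # centroid so the reward depends on how the object moves, not on where it
    # happens to start or on hand appearance.
    h = h - h[0].mean(axis=0, keepdims=True)
    r = r - r[0].mean(axis=0, keepdims=True)

    # Mean per-point, per-frame distance between the two object trajectories.
    dist = np.linalg.norm(h - r, axis=-1).mean()

    # Map distance to a bounded reward in (0, 1]; higher means closer match.
    return 1.0 / (1.0 + dist + eps)
```

In an online fine-tuning loop, a reward of this kind would be evaluated on each robot rollout's tracked object points and passed to a reinforcement-learning update; the details above stand in for, and should not be read as, the paper's actual trajectory-matching procedure.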
