MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation (2309.14236v2)

Published 25 Sep 2023 in cs.RO, cs.AI, cs.CV, and cs.LG

Abstract: Robotic systems that aspire to operate in uninstrumented real-world environments must perceive the world directly via onboard sensing. Vision-based learning systems aim to eliminate the need for environment instrumentation by building an implicit understanding of the world from raw pixels, but navigating the contact-rich, high-dimensional search space from only sparse visual reward signals significantly exacerbates the challenge of exploration. The applicability of such systems is thus typically restricted to simulated or heavily engineered environments, since real-world exploration without the guidance of explicit state estimation and dense rewards can lead to unsafe behavior and catastrophic safety failures. In this study, we isolate the root causes of these limitations to develop a system, called MoDem-V2, capable of learning contact-rich manipulation directly in the uninstrumented real world. Building on the latest algorithmic advances in model-based reinforcement learning (MBRL), demonstration bootstrapping, and effective exploration, MoDem-V2 acquires contact-rich dexterous manipulation skills directly in the real world. We identify key ingredients for leveraging demonstrations in model learning while respecting real-world safety considerations: exploration centering, agency handover, and actor-critic ensembles. We empirically demonstrate the contribution of these ingredients on four complex visuo-motor manipulation tasks in both simulation and the real world. To the best of our knowledge, our work presents the first successful system for demonstration-augmented visual MBRL trained directly in the real world. Visit https://sites.google.com/view/modem-v2 for videos and more details.
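
The abstract names actor-critic ensembles as one of the ingredients for safe, demonstration-guided exploration. The sketch below is a minimal, hypothetical PyTorch illustration of that general idea, not the authors' implementation: an ensemble of Q-critics whose disagreement is subtracted from the value estimate, so that action selection stays conservative away from regions covered by demonstrations. The class names, network sizes, and the beta penalty weight are assumptions made for this example.

```python
# Illustrative sketch only: an actor with an ensemble of Q-critics and a
# pessimistic (mean minus std) value estimate. Names and hyperparameters are
# hypothetical and not taken from the MoDem-V2 codebase.
import torch
import torch.nn as nn


def mlp(in_dim: int, out_dim: int, hidden: int = 256) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ELU(),
        nn.Linear(hidden, hidden), nn.ELU(),
        nn.Linear(hidden, out_dim),
    )


class ActorCriticEnsemble(nn.Module):
    """Deterministic actor plus an ensemble of Q-critics.

    Ensemble disagreement penalizes actions whose value the critics do not
    agree on, biasing exploration toward the demonstration distribution.
    """

    def __init__(self, obs_dim: int, act_dim: int, n_critics: int = 5):
        super().__init__()
        self.actor = mlp(obs_dim, act_dim)
        self.critics = nn.ModuleList(
            [mlp(obs_dim + act_dim, 1) for _ in range(n_critics)]
        )

    def act(self, obs: torch.Tensor) -> torch.Tensor:
        # Squash actions into [-1, 1].
        return torch.tanh(self.actor(obs))

    def pessimistic_q(self, obs: torch.Tensor, act: torch.Tensor,
                      beta: float = 1.0) -> torch.Tensor:
        """Mean Q minus beta times the ensemble standard deviation."""
        qs = torch.stack(
            [critic(torch.cat([obs, act], dim=-1)) for critic in self.critics],
            dim=0,
        )
        return qs.mean(dim=0) - beta * qs.std(dim=0)


if __name__ == "__main__":
    model = ActorCriticEnsemble(obs_dim=50, act_dim=7)
    obs = torch.randn(8, 50)          # batch of latent observations
    act = model.act(obs)
    score = model.pessimistic_q(obs, act)
    print(score.shape)                # torch.Size([8, 1])
```

In a model-based loop of the kind the abstract describes, a pessimistic estimate like this could be used to score candidate action sequences during planning instead of a single critic's value.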

Authors (4)
  1. Patrick Lancaster (7 papers)
  2. Nicklas Hansen (22 papers)
  3. Aravind Rajeswaran (42 papers)
  4. Vikash Kumar (70 papers)
Citations (10)
