Plan-Guided Reinforcement Learning for Whole-Body Manipulation (2310.12263v1)

Published 18 Oct 2023 in cs.RO

Abstract: Synthesizing complex whole-body manipulation behaviors has fundamental challenges due to the rapidly growing combinatorics inherent to contact interaction planning. While model-based methods have shown promising results in solving long-horizon manipulation tasks, they often work under strict assumptions, such as known model parameters, oracular observation of the environment state, and simplified dynamics, resulting in plans that cannot easily transfer to hardware. Learning-based approaches, such as imitation learning (IL) and reinforcement learning (RL), have been shown to be robust when operating over in-distribution states; however, they need heavy human supervision. Specifically, model-free RL requires a tedious reward-shaping process. IL methods, on the other hand, rely on human demonstrations that involve advanced teleoperation methods. In this work, we propose a plan-guided reinforcement learning (PGRL) framework to combine the advantages of model-based planning and reinforcement learning. Our method requires minimal human supervision because it relies on plans generated by model-based planners to guide the exploration in RL. In exchange, RL derives a more robust policy thanks to domain randomization. We test this approach on a whole-body manipulation task on Punyo, an upper-body humanoid robot with compliant, air-filled arm coverings, to pivot and lift a large box. Our preliminary results indicate that the proposed methodology is promising to address challenges that remain difficult for either model- or learning-based strategies alone.
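
As a rough illustration of the plan-guided idea described in the abstract (not the paper's actual reward design or code), an RL reward can combine a task term with a bonus for tracking the model-based planner's reference trajectory, while domain randomization perturbs the simulated dynamics each episode. The function names, weights, and parameter ranges below are assumptions made for this sketch.

```python
import numpy as np

def plan_guided_reward(state, ref_state, task_reward, w_track=0.5, sigma=0.25):
    """Shape the RL reward with a plan-tracking bonus (illustrative only).

    state, ref_state: arrays of matching shape (e.g. object pose + joint positions).
    The tracking term rewards staying near the planner's reference state at the
    current timestep; task_reward rewards actual progress (e.g. box height).
    """
    tracking_error = np.linalg.norm(np.asarray(state) - np.asarray(ref_state))
    tracking_bonus = np.exp(-(tracking_error / sigma) ** 2)  # in (0, 1]
    return task_reward + w_track * tracking_bonus

def randomize_dynamics(rng, nominal_mass=5.0, nominal_friction=0.8):
    """Domain randomization: perturb physical parameters per episode so the
    learned policy tolerates model mismatch when transferred to hardware."""
    return {
        "box_mass": nominal_mass * rng.uniform(0.8, 1.2),
        "friction": nominal_friction * rng.uniform(0.7, 1.3),
    }

# Example: evaluate the shaped reward at one timestep of a rollout.
rng = np.random.default_rng(0)
params = randomize_dynamics(rng)
r = plan_guided_reward(state=[0.10, 0.02, 0.30],
                       ref_state=[0.12, 0.00, 0.28],
                       task_reward=1.0)
print(params, r)
```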

Authors (3)
  1. Mengchao Zhang (45 papers)
  2. Jose Barreiros (2 papers)
  3. Aykut Ozgun Onol (7 papers)
Citations (4)
