ZSL-RPPO: Zero-Shot Learning for Quadrupedal Locomotion in Challenging Terrains using Recurrent Proximal Policy Optimization (2403.01928v1)
Abstract: We present ZSL-RPPO, an improved zero-shot learning architecture that overcomes the limitations of teacher-student neural networks and enables robust, reliable, and versatile locomotion for quadrupedal robots in challenging terrains. We propose a new algorithm, RPPO (Recurrent Proximal Policy Optimization), that directly trains a recurrent neural network in partially observable environments and yields more robust training under domain randomization. Our locomotion controller withstands extensive perturbations of both intrinsic and extrinsic physical parameters during simulation-to-reality transfer without further fine-tuning. This avoids the significant performance decline a student policy typically suffers during simulation-to-reality transfer and thereby enhances the robustness and generalization of the locomotion controller. We deployed our controller on the Unitree A1 and Aliengo robots in real environments, with exteroceptive perception provided by either a solid-state LiDAR or a depth camera. The controller was tested on various challenging terrains, including slippery surfaces, grassy terrain, and stairs. Our experimental results and comparisons show that our approach significantly outperforms the state of the art.
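To make the recurrent-PPO idea concrete, the sketch below shows one plausible way to structure a recurrent actor-critic trained with a PPO-style clipped objective under partial observability. The GRU backbone, layer sizes, and function names are illustrative assumptions for this sketch, not the architecture or hyperparameters reported in the paper.

```python
# Minimal sketch (assumptions: GRU core, Gaussian policy, PyTorch) of a
# recurrent actor-critic for PPO-style training under partial observability.
import torch
import torch.nn as nn


class RecurrentActorCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=256):
        super().__init__()
        # Recurrent core: summarizes the observation history so the policy can
        # act without privileged (teacher) state information.
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.actor = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, act_dim),
        )
        self.critic = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, 1),
        )
        # State-independent log standard deviation for the Gaussian policy.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); hidden: (1, batch, hidden_dim)
        feats, hidden = self.gru(obs_seq, hidden)
        mean = self.actor(feats)
        value = self.critic(feats).squeeze(-1)
        dist = torch.distributions.Normal(mean, self.log_std.exp())
        return dist, value, hidden


def ppo_clip_loss(dist, actions, old_log_probs, advantages, clip_eps=0.2):
    # Standard PPO clipped surrogate, evaluated per time step of the rollout.
    log_probs = dist.log_prob(actions).sum(-1)
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```

Because the policy is optimized directly on sequences, randomized dynamics (domain randomization of intrinsic and extrinsic parameters) can be absorbed into the hidden state rather than distilled from a privileged teacher.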