DecAP: Decaying Action Priors for Accelerated Imitation Learning of Torque-Based Legged Locomotion Policies (2310.05714v3)
Abstract: Optimal control for legged robots has gone through a paradigm shift from position-based to torque-based control, owing to the latter's compliant and robust nature. In parallel to this shift, the community has also turned to Deep Reinforcement Learning (DRL) as a promising approach to directly learn locomotion policies for complex real-world tasks. However, most end-to-end DRL approaches still operate in position space, mainly because learning in torque space is often sample-inefficient and does not consistently converge to natural gaits. To address these challenges, we propose a two-stage framework. In the first stage, we generate our own imitation data by training a position-based policy, eliminating the need for expert knowledge to design optimal controllers. The second stage incorporates decaying action priors, a novel method to enhance the exploration of torque-based policies aided by imitation rewards. We show that our approach consistently outperforms imitation learning alone and is robust to scaling these rewards from 0.1x to 10x. We further validate the benefits of torque control by comparing the robustness of a position-based policy to a position-assisted torque-based policy on a quadruped (Unitree Go1) under external disturbances, with no domain randomization during training.
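The core idea of a decaying action prior can be sketched in a few lines: the torque policy's output is augmented by a position-derived prior action whose weight anneals to zero over training, so exploration is guided early on but the final policy acts in pure torque space. This is a minimal illustrative sketch, not the paper's implementation; the linear decay schedule, the function name, and the initial weight `w0` are assumptions.

```python
import numpy as np

def decayed_action(torque_action, prior_action, step, decay_steps, w0=1.0):
    """Blend a learned torque action with a position-derived prior action.

    The prior's weight decays linearly from w0 at step 0 to 0 at
    decay_steps, after which the policy acts unassisted.
    """
    w = w0 * max(0.0, 1.0 - step / decay_steps)  # decaying prior weight
    return torque_action + w * prior_action

# Early in training the prior dominates; late in training it vanishes.
a_early = decayed_action(np.array([0.1]), np.array([2.0]), step=0, decay_steps=1000)
a_late = decayed_action(np.array([0.1]), np.array([2.0]), step=1000, decay_steps=1000)
```

Here `a_early` equals the torque action plus the full prior, while `a_late` is the torque action alone, matching the intuition that the prior only accelerates early exploration.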