ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots (2310.10486v2)
Abstract: Learning a locomotion policy for quadruped robots has traditionally been constrained to a specific robot morphology, mass, and size. The learning process must usually be repeated for every new robot, where hyperparameters and reward function weights must be re-tuned to maximize performance for each system. Alternatively, attempting to train a single policy to accommodate different robot sizes, while maintaining the same degrees of freedom (DoF) and morphology, requires either complex learning frameworks, or mass, inertia, and dimension randomization, which leads to prolonged training periods. In our study, we show that drawing inspiration from animal motor control allows us to effectively train a single locomotion policy capable of controlling a diverse range of quadruped robots. The robot differences encompass: a variable number of DoFs (i.e., 12 or 16 joints), three distinct morphologies, a broad mass range spanning from 2 kg to 200 kg, and nominal standing heights ranging from 18 cm to 100 cm. Our policy modulates a representation of the Central Pattern Generator (CPG) in the spinal cord, effectively coordinating both the frequencies and amplitudes of the CPG to produce rhythmic output (Rhythm Generation), which is then mapped to a Pattern Formation (PF) layer. Across different robots, the only varying component is the PF layer, which adjusts the scaling parameters for stride height and length. Subsequently, we evaluate sim-to-real transfer by testing the single policy on both the Unitree Go1 and A1 robots. Remarkably, we observe robust performance even when adding a 15 kg load, equivalent to 125% of the A1 robot's nominal mass.
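Since the abstract only names the Rhythm Generation / Pattern Formation split, the sketch below illustrates one way such a pipeline could be wired up: the policy commands per-leg oscillator amplitudes and frequencies, and only the stride-length and stride-height scaling of the PF mapping changes between robots. This is a minimal sketch based on common CPG-based controller formulations, not the paper's implementation; the amplitude dynamics, the foot-position equations, and all parameter names (`d_step`, `ground_clearance`, `ground_penetration`, `h_nom`) are illustrative assumptions.

```python
import numpy as np

class CPGRhythmGenerator:
    """Rhythm Generation layer: one amplitude-controlled oscillator per leg.

    The RL policy modulates the amplitude setpoints (mu) and frequencies
    (omega); the oscillator states converge smoothly to those setpoints.
    The second-order amplitude dynamics are a common CPG formulation,
    assumed here for illustration.
    """

    def __init__(self, n_legs=4, a=50.0, dt=0.001):
        self.a = a                       # convergence gain of amplitude dynamics
        self.dt = dt                     # integration time step [s]
        self.r = np.ones(n_legs)         # oscillator amplitudes
        self.r_dot = np.zeros(n_legs)    # amplitude derivatives
        self.theta = np.zeros(n_legs)    # oscillator phases [rad]

    def step(self, mu, omega):
        """Integrate one step with policy-commanded mu (amplitude) and omega (rad/s)."""
        r_ddot = self.a * (self.a / 4.0 * (mu - self.r) - self.r_dot)
        self.r_dot += r_ddot * self.dt
        self.r += self.r_dot * self.dt
        self.theta = (self.theta + omega * self.dt) % (2.0 * np.pi)
        return self.r, self.theta


def pattern_formation(r, theta, d_step, ground_clearance, ground_penetration, h_nom):
    """Pattern Formation layer: map oscillator states to desired foot positions.

    Only the scaling parameters (stride length d_step, stride height
    ground_clearance, ground_penetration, nominal hip height h_nom) differ
    between robots; the policy and the Rhythm Generation layer stay identical.
    """
    x = -d_step * (r - 1.0) * np.cos(theta)                 # fore-aft foot position
    swing = np.sin(theta) > 0.0                             # swing vs. stance phase
    z = np.where(swing,
                 -h_nom + ground_clearance * np.sin(theta),    # swing: lift the foot
                 -h_nom + ground_penetration * np.sin(theta))  # stance: slight ground penetration
    return x, z


# Example: the same policy outputs (mu, omega) drive a small and a large robot,
# differing only in the Pattern Formation scaling parameters (values are made up).
rg = CPGRhythmGenerator()
for _ in range(1000):
    r, theta = rg.step(mu=np.full(4, 1.5), omega=np.full(4, 2.0 * np.pi * 2.0))
x_small, z_small = pattern_formation(r, theta, 0.08, 0.04, 0.005, 0.20)  # ~2 kg scale
x_large, z_large = pattern_formation(r, theta, 0.35, 0.15, 0.02, 0.90)   # ~200 kg scale
```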