
ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots (2310.10486v2)

Published 16 Oct 2023 in cs.RO, cs.AI, cs.LG, cs.SY, and eess.SY

Abstract: Learning a locomotion policy for quadruped robots has traditionally been constrained to a specific robot morphology, mass, and size. The learning process must usually be repeated for every new robot, where hyperparameters and reward function weights must be re-tuned to maximize performance for each new system. Alternatively, attempting to train a single policy to accommodate different robot sizes, while maintaining the same degrees of freedom (DoF) and morphology, requires either complex learning frameworks, or mass, inertia, and dimension randomization, which leads to prolonged training periods. In our study, we show that drawing inspiration from animal motor control allows us to effectively train a single locomotion policy capable of controlling a diverse range of quadruped robots. The robot differences encompass: a variable number of DoFs (i.e., 12 or 16 joints), three distinct morphologies, a broad mass range spanning from 2 kg to 200 kg, and nominal standing heights ranging from 18 cm to 100 cm. Our policy modulates a representation of the Central Pattern Generator (CPG) in the spinal cord, effectively coordinating both frequencies and amplitudes of the CPG to produce rhythmic output (Rhythm Generation), which is then mapped to a Pattern Formation (PF) layer. Across different robots, the only varying component is the PF layer, which adjusts the scaling parameters for the stride height and length. Subsequently, we evaluate the sim-to-real transfer by testing the single policy on both the Unitree Go1 and A1 robots. Remarkably, we observe robust performance, even when adding a 15 kg load, equivalent to 125% of the A1 robot's nominal mass.


Summary

  • The paper proposes a unified locomotion policy via CPGs and DRL to handle diverse quadruped morphologies.
  • It employs a constant-sized action and observation space with task-space modulation, so a single policy interface serves every robot.
  • Experiments on 16 platforms demonstrate robust trotting under load, with training completing in under two hours.

Learning a Unified Locomotion Policy for Diverse Quadruped Robots

The paper "Learning a Single Locomotion Policy for Diverse Quadruped Robots" addresses a significant challenge in robotics: developing a generalizable locomotion policy applicable to quadruped robots with varied morphologies, sizes, and degrees of freedom (DoF). The research introduces a framework leveraging Central Pattern Generators (CPGs) and Deep Reinforcement Learning (DRL), enabling the training of a singular control policy adaptable to a broad range of quadruped robots.

Methodology

Central to this work is the use of a CPG-inspired model, integrated with DRL, to create a unified control policy. This approach is motivated by the biological principles observed in vertebrate locomotion systems, which utilize CPGs in the spinal cord to generate rhythmic motor patterns. In the proposed method, a Multi-Layer Perceptron (MLP) represents the higher control centers, coordinating the modulation of CPG dynamics to produce robust rhythmic outputs. The Rhythm Generation (RG) layer of the CPG is implemented using nonlinear phase oscillators, while the Pattern Formation (PF) layer maps these outputs into specific foot trajectories.
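
To make the two-layer structure concrete, the sketch below integrates per-leg amplitude and phase oscillators (Rhythm Generation) and maps their states to Cartesian foot targets (Pattern Formation). It follows the general form of the authors' earlier CPG-RL controller, but the exact mapping and the values of the gain a and the scales d_step, h_nom, g_c, and g_p are illustrative assumptions, not the paper's equations.

```python
import numpy as np

def step_cpg(r, r_dot, theta, mu, omega, a=150.0, dt=0.001):
    """One Euler step of the per-leg amplitude/phase oscillators.

    r, r_dot, theta : arrays of shape (4,), one entry per leg
    mu, omega       : policy-modulated amplitudes and frequencies (rad/s)
    a               : convergence gain (illustrative value)
    """
    # Second-order amplitude dynamics: r converges smoothly to mu.
    r_ddot = a * (a / 4.0 * (mu - r) - r_dot)
    r_dot = r_dot + r_ddot * dt
    r = r + r_dot * dt
    # The phase advances at the commanded frequency (Rhythm Generation).
    theta = (theta + omega * dt) % (2.0 * np.pi)
    return r, r_dot, theta

def pattern_formation(r, theta, d_step, h_nom, g_c, g_p):
    """Map oscillator states to foot targets (Pattern Formation layer).

    d_step (stride length) and g_c / g_p (swing / stance height) are the
    robot-specific scales; per the paper, this layer is the only
    component that changes between robots.
    """
    x = -d_step * (r - 1.0) * np.cos(theta)      # fore-aft foot offset
    swing = np.sin(theta) > 0.0                  # first half-cycle = swing
    z = np.where(swing,
                 -h_nom + g_c * np.sin(theta),   # swing: lift the foot
                 -h_nom + g_p * np.sin(theta))   # stance: push into the ground
    return x, z
```

In this scheme a trot corresponds to holding diagonal leg pairs roughly pi radians out of phase, which the policy can achieve by modulating each leg's frequency omega.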

The paper introduces a constant-sized action and observation space, regardless of the robot's morphology or DoF, simplifying the policy's generalization across different robots. This is achieved by focusing on task-space modulation of foot trajectories through inverse kinematics, thereby bypassing the need for joint-specific data in the observation space.
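
As an illustration of this task-space interface, the following is a minimal analytic inverse-kinematics sketch for a planar two-link leg (thigh and calf). It is a simplified, hypothetical reduction: the paper's robots have three or four joints per leg, and the paper does not spell out this particular formulation.

```python
import numpy as np

def leg_ik_2link(x, z, l1, l2):
    """Hip-pitch and knee angles that place the foot at (x, z).

    Frame: origin at the hip, z measured downward along gravity, so a
    nominal stance target is (0, h_nom). l1 and l2 are the thigh and
    calf link lengths.
    """
    d = np.sqrt(x**2 + z**2)
    # Keep the target inside the leg's reachable annulus.
    d = np.clip(d, abs(l1 - l2) + 1e-6, l1 + l2 - 1e-6)
    # Law of cosines at the knee; knee = 0 is a fully extended leg.
    cos_knee = (l1**2 + l2**2 - d**2) / (2.0 * l1 * l2)
    knee = np.pi - np.arccos(np.clip(cos_knee, -1.0, 1.0))
    # Hip pitch: direction to the foot minus the thigh's interior angle.
    alpha = np.arctan2(x, z)
    cos_beta = (l1**2 + d**2 - l2**2) / (2.0 * l1 * d)
    hip = alpha - np.arccos(np.clip(cos_beta, -1.0, 1.0))  # knee-back configuration
    return hip, knee
```

Because the policy's outputs stay in Cartesian foot space and each robot's own inverse kinematics (whether it has 12 or 16 joints) converts them to joint commands, the observation and action dimensions never change across platforms.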

Results

The policy was tested across 16 diverse robotic platforms, including commercial robots such as the Unitree A1 and Boston Dynamics Spot, as well as custom-designed robots. The results indicate that the policy is robust: notably, it maintains stable trotting even under a 15 kg load, equivalent to 125% of the A1 robot's nominal mass.

Training efficiency is a striking numerical result: a single policy covering all 16 quadruped robots was trained in under two hours by leveraging GPU-parallelized simulation in Isaac Gym. Such computational efficiency makes the approach practical for real-world settings where varied robotic platforms must be managed.

Implications and Future Directions

This research presents a significant step toward versatile robotic systems that operate across platforms without robot-specific control policies. The implications extend to domains requiring adaptable robotic mobility, such as search and rescue missions, autonomous exploration, and industries where diverse robotic fleets are deployed.

Future research can build on this work by expanding the adaptability of the unified policy to include omni-directional and more complex locomotion tasks on uneven or unpredictable terrains. Additionally, incorporating more sophisticated sensory feedback mechanisms could enhance the policy's responsiveness to environmental changes, thus improving its applicability to real-world conditions.

Overall, the integration of biologically inspired frameworks with machine learning paradigms, as evidenced in this paper, continues to pave the way for more robust, adaptive, and efficient robotic systems.
