
ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots (2310.10486v2)

Published 16 Oct 2023 in cs.RO, cs.AI, cs.LG, cs.SY, and eess.SY

Abstract: Learning a locomotion policy for quadruped robots has traditionally been constrained to a specific robot morphology, mass, and size. The learning process must usually be repeated for every new robot, where hyperparameters and reward function weights must be re-tuned to maximize performance for each new system. Alternatively, attempting to train a single policy to accommodate different robot sizes, while maintaining the same degrees of freedom (DoF) and morphology, requires either complex learning frameworks, or mass, inertia, and dimension randomization, which leads to prolonged training periods. In our study, we show that drawing inspiration from animal motor control allows us to effectively train a single locomotion policy capable of controlling a diverse range of quadruped robots. The robot differences encompass: a variable number of DoFs (i.e., 12 or 16 joints), three distinct morphologies, a broad mass range spanning from 2 kg to 200 kg, and nominal standing heights ranging from 18 cm to 100 cm. Our policy modulates a representation of the Central Pattern Generator (CPG) in the spinal cord, effectively coordinating both frequencies and amplitudes of the CPG to produce rhythmic output (Rhythm Generation), which is then mapped to a Pattern Formation (PF) layer. Across different robots, the only varying component is the PF layer, which adjusts the scaling parameters for the stride height and length. Subsequently, we evaluate the sim-to-real transfer by testing the single policy on both the Unitree Go1 and A1 robots. Remarkably, we observe robust performance, even when adding a 15 kg load, equivalent to 125% of the A1 robot's nominal mass.


Summary

  • The paper proposes a unified locomotion policy via CPGs and DRL to handle diverse quadruped morphologies.
  • It employs a constant-sized action and observation space with task-space modulation, so a single policy interface serves every robot.
  • Experiments on 16 platforms demonstrate robust trotting under load, with training completing in under two hours.

Learning a Unified Locomotion Policy for Diverse Quadruped Robots

The paper "Learning a Single Locomotion Policy for Diverse Quadruped Robots" addresses a significant challenge in robotics: developing a generalizable locomotion policy applicable to quadruped robots with varied morphologies, sizes, and degrees of freedom (DoF). The research introduces a framework leveraging Central Pattern Generators (CPGs) and Deep Reinforcement Learning (DRL), enabling the training of a singular control policy adaptable to a broad range of quadruped robots.

Methodology

Central to this work is the use of a CPG-inspired model, integrated with DRL, to create a unified control policy. This approach is motivated by the biological principles observed in vertebrate locomotion systems, which utilize CPGs in the spinal cord to generate rhythmic motor patterns. In the proposed method, a Multi-Layer Perceptron (MLP) represents the higher control centers, coordinating the modulation of CPG dynamics to produce robust rhythmic outputs. The Rhythm Generation (RG) layer of the CPG is implemented using nonlinear phase oscillators, while the Pattern Formation (PF) layer maps these outputs into specific foot trajectories.
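
To make the two-layer structure concrete, the sketch below integrates per-leg amplitude and phase oscillators (Rhythm Generation) and maps their states to Cartesian foot targets (Pattern Formation). It follows the general form of the authors' earlier CPG-RL controller, but the exact mapping and the values of the gain a and the scales d_step, h_nom, g_c, and g_p are illustrative assumptions, not the paper's equations.

```python
import numpy as np

def step_cpg(r, r_dot, theta, mu, omega, a=150.0, dt=0.001):
    """One Euler step of the per-leg amplitude/phase oscillators.

    r, r_dot, theta : arrays of shape (4,), one entry per leg
    mu, omega       : policy-modulated amplitudes and frequencies (rad/s)
    a               : convergence gain (illustrative value)
    """
    # Second-order amplitude dynamics: r converges smoothly to mu.
    r_ddot = a * (a / 4.0 * (mu - r) - r_dot)
    r_dot = r_dot + r_ddot * dt
    r = r + r_dot * dt
    # The phase advances at the commanded frequency (Rhythm Generation).
    theta = (theta + omega * dt) % (2.0 * np.pi)
    return r, r_dot, theta

def pattern_formation(r, theta, d_step, h_nom, g_c, g_p):
    """Map oscillator states to foot targets (Pattern Formation layer).

    d_step (stride length) and g_c / g_p (swing / stance height) are the
    robot-specific scales; per the paper, this layer is the only
    component that changes between robots.
    """
    x = -d_step * (r - 1.0) * np.cos(theta)      # fore-aft foot offset
    swing = np.sin(theta) > 0.0                  # first half-cycle = swing
    z = np.where(swing,
                 -h_nom + g_c * np.sin(theta),   # swing: lift the foot
                 -h_nom + g_p * np.sin(theta))   # stance: push into the ground
    return x, z
```

In this scheme a trot corresponds to holding diagonal leg pairs roughly pi radians out of phase, which the policy can achieve by modulating each leg's frequency omega.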

The paper introduces a constant-sized action and observation space, regardless of the robot's morphology or DoF, simplifying the policy's generalization across different robots. This is achieved by focusing on task-space modulation of foot trajectories through inverse kinematics, thereby bypassing the need for joint-specific data in the observation space.
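
As an illustration of this task-space interface, the following is a minimal analytic inverse-kinematics sketch for a planar two-link leg (thigh and calf). It is a simplified, hypothetical reduction: the paper's robots have three or four joints per leg, and the paper does not spell out this particular formulation.

```python
import numpy as np

def leg_ik_2link(x, z, l1, l2):
    """Hip-pitch and knee angles that place the foot at (x, z).

    Frame: origin at the hip, z measured downward along gravity, so a
    nominal stance target is (0, h_nom). l1 and l2 are the thigh and
    calf link lengths.
    """
    d = np.sqrt(x**2 + z**2)
    # Keep the target inside the leg's reachable annulus.
    d = np.clip(d, abs(l1 - l2) + 1e-6, l1 + l2 - 1e-6)
    # Law of cosines at the knee; knee = 0 is a fully extended leg.
    cos_knee = (l1**2 + l2**2 - d**2) / (2.0 * l1 * l2)
    knee = np.pi - np.arccos(np.clip(cos_knee, -1.0, 1.0))
    # Hip pitch: direction to the foot minus the thigh's interior angle.
    alpha = np.arctan2(x, z)
    cos_beta = (l1**2 + d**2 - l2**2) / (2.0 * l1 * d)
    hip = alpha - np.arccos(np.clip(cos_beta, -1.0, 1.0))  # knee-back configuration
    return hip, knee
```

Because the policy's outputs stay in Cartesian foot space and each robot's own inverse kinematics (whether it has 12 or 16 joints) converts them to joint commands, the observation and action dimensions never change across platforms.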

Results

The policy was tested across 16 diverse robotic platforms, including commercial robots such as the Unitree A1 and Boston Dynamics Spot, as well as custom-designed robots. The results indicate that the policy is robust: notably, it maintains stable trotting even under a 15 kg load, equivalent to 125% of the A1 robot's nominal mass.

Training efficiency is a striking numerical result: a single policy covering all 16 quadruped robots was trained in under two hours by leveraging GPU-parallelized simulation in Isaac Gym. Such computational efficiency makes the approach practical for real-world settings where varied robotic platforms must be managed.

Implications and Future Directions

This research presents a significant step toward versatile robotic systems that operate across platforms without robot-specific control policies. The implications extend to domains requiring adaptable robotic mobility, such as search and rescue missions, autonomous exploration, and industries where diverse robotic fleets are deployed.

Future research can build on this work by expanding the adaptability of the unified policy to include omni-directional and more complex locomotion tasks on uneven or unpredictable terrains. Additionally, incorporating more sophisticated sensory feedback mechanisms could enhance the policy's responsiveness to environmental changes, thus improving its applicability to real-world conditions.

Overall, the integration of biologically inspired frameworks with machine learning paradigms, as evidenced in this paper, continues to pave the way for more robust, adaptive, and efficient robotic systems.
