- The paper presents CPG-RL, which combines CPGs and Deep Reinforcement Learning for quadruped locomotion by learning to dynamically modulate CPG parameters.
- The CPG-RL policies demonstrated strong real-world transfer and robustness on uneven terrain and with significant added mass.
- This research shows minimal sensory feedback is sufficient for locomotion, providing insights into biological motor control and suggesting reduced sensor needs for robots.
CPG-RL: Learning Central Pattern Generators for Quadruped Locomotion
The paper "CPG-RL: Learning Central Pattern Generators for Quadruped Locomotion" by Guillaume Bellegarda and Auke Ijspeert presents an innovative methodology combining Central Pattern Generators (CPGs) with Deep Reinforcement Learning (DRL) to achieve robust and versatile quadruped locomotion. CPGs, systems of coupled oscillators found in the spinal cords of vertebrates, are instrumental in generating rhythmic patterns necessary for locomotion. The integration of CPGs within a DRL framework aims to exploit the strength of both biologically-inspired mechanisms and machine-learning techniques to improve the agility and adaptability of robotic systems.
Methodology
This research introduces a learning framework in which the agent modulates intrinsic oscillator parameters, namely amplitude, frequency, and phase offsets, to coordinate limb movements. Unlike traditional approaches that fix the CPG parameters in advance, this framework adjusts them online in response to external stimuli and internal states, closely mimicking the flexibility seen in natural locomotion, as sketched below.
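To make the modulation concrete, here is a minimal Python sketch of an amplitude-controlled phase oscillator per leg, with the policy supplying the amplitude and frequency setpoints at each control step. The specific dynamics, gains, and foot-trajectory mapping below are simplified assumptions in the spirit of the paper, not its exact equations.

```python
import numpy as np

# One oscillator per leg; the RL policy outputs the intrinsic amplitude mu_i
# and frequency omega_i at each control step. Gains and mapping constants are
# illustrative assumptions.

A_CONV = 50.0   # convergence gain for the amplitude dynamics (assumed)
DT = 0.001      # integration time step in seconds (assumed)

def cpg_step(r, r_dot, theta, mu, omega, dt=DT):
    """Euler-integrate one step of the oscillator states for all four legs."""
    r_ddot = A_CONV * (A_CONV / 4.0 * (mu - r) - r_dot)  # critically damped pull toward mu
    r_dot = r_dot + r_ddot * dt
    r = r + r_dot * dt
    theta = (theta + omega * dt) % (2.0 * np.pi)          # phase advances at omega
    return r, r_dot, theta

def foot_setpoints(r, theta, d_step=0.05, h=0.27, g_c=0.06, g_p=0.01):
    """Map oscillator states to foot positions in the hip frame (illustrative)."""
    x = -d_step * r * np.cos(theta)                        # fore/aft sweep scaled by amplitude
    swing = np.sin(theta) > 0.0
    z = np.where(swing, -h + g_c * np.sin(theta),          # swing half-cycle: lift the foot
                        -h + g_p * np.sin(theta))          # stance half-cycle: slight push
    return x, z
```

These foot setpoints would then pass through inverse kinematics to joint targets tracked by a low-level joint PD controller, so the learned policy only ever acts on the oscillator parameters.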
Action Space and Observation: The action space gives the policy direct control over the oscillator setpoints, and explicit coupling terms between oscillators are omitted so that inter-limb coordination can emerge through learning. The observation provides proprioceptive feedback for limb coordination; in one experimental scenario it is reduced to the CPG states and foot contact indicators, demonstrating that minimal sensory input can suffice for gait generation (see the sketch after this paragraph).
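As an illustration of how such an interface could be declared, the following sketch uses Gymnasium-style spaces for per-leg amplitude and frequency actions and a reduced observation of CPG states, foot contacts, and the velocity command. The bounds, dimensions, and composition are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np
from gymnasium import spaces

NUM_LEGS = 4

# Per-leg amplitude and frequency setpoints; bounds are illustrative assumptions.
MU_RANGE = (1.0, 2.0)
OMEGA_RANGE = (0.0, 4.5)

action_space = spaces.Box(
    low=np.array([MU_RANGE[0]] * NUM_LEGS + [OMEGA_RANGE[0]] * NUM_LEGS, dtype=np.float32),
    high=np.array([MU_RANGE[1]] * NUM_LEGS + [OMEGA_RANGE[1]] * NUM_LEGS, dtype=np.float32),
)

# Reduced observation variant: oscillator states plus binary foot contacts and
# the commanded body velocity; the exact dimensions are assumed.
obs_dim = 4 * NUM_LEGS + NUM_LEGS + 3   # [r, r_dot, theta, theta_dot] + contacts + command
observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(obs_dim,), dtype=np.float32)
```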
Reward Function: The reward focuses on tracking commanded linear and angular velocities while penalizing undesired body motion and energy expenditure. This design guides the learned policies toward efficient and stable locomotion without requiring reference trajectories or explicit gait descriptors in the reward shaping.
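A hedged sketch of such a reward is shown below; the individual terms and weights are illustrative assumptions in the spirit of the description above, not the paper's exact formulation.

```python
import numpy as np

def reward(base_lin_vel, base_ang_vel, cmd_lin_vel, cmd_yaw_rate,
           joint_torques, joint_vels,
           w_lin=2.0, w_yaw=1.0, w_z=0.5, w_rp=0.1, w_energy=1e-3, dt=0.01):
    """Velocity-tracking reward with body-motion and energy penalties (illustrative)."""
    # Reward tracking of commanded planar velocity and yaw rate.
    r_lin = w_lin * np.exp(-np.sum((base_lin_vel[:2] - cmd_lin_vel[:2]) ** 2) / 0.25)
    r_yaw = w_yaw * np.exp(-(base_ang_vel[2] - cmd_yaw_rate) ** 2 / 0.25)
    # Penalize vertical bouncing and roll/pitch rates.
    p_z = w_z * base_lin_vel[2] ** 2
    p_rp = w_rp * np.sum(base_ang_vel[:2] ** 2)
    # Penalize mechanical power as a proxy for energy expenditure.
    p_energy = w_energy * np.abs(np.dot(joint_torques, joint_vels)) * dt
    return r_lin + r_yaw - p_z - p_rp - p_energy
```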
Results and Discussion
The CPG-RL framework successfully trained a quadruped robot to perform various locomotion tasks, including omnidirectional movement and robust terrain handling. The policies exhibit strong sim-to-real transferability, achieving real-world performance without domain randomization—a notable departure from conventional DRL approaches. Remarkably, the quadruped managed to cope with added mass up to 115% of its nominal weight and navigate uneven terrains, highlighting the method's robustness.
Interpretable Modulation: The modulation of CPG states reveals a nuanced control strategy, where amplitude adjustments during stance phases enhance propulsion, akin to biological systems. This emergent behavior underscores the biological plausibility of the method and suggests potential applications in understanding motor control in animals.
Minimal Sensory Requirements: The ability to function with minimal sensory feedback—restricted to CPG states and foot contact indicators—challenges the assumption of needing comprehensive proprioceptive data for coordinated movement. This minimal configuration not only economizes sensor use but also provides insights into the essential feedback mechanisms in locomotor systems.
Implications and Future Research
Practically, this approach points toward more agile robotic locomotion across varied environments while potentially reducing the complexity and cost of sensor integration. Theoretically, the findings offer fertile ground for studying neural control and learning mechanisms, giving clues to the roles of descending modulation and sensory feedback in gait regulation.
Future research could explore adaptive control systems that acquire more complex behaviors through unsupervised learning paradigms, or hybrid models that incorporate vision-based sensory inputs for improved navigation. Further investigation of the interaction patterns within multilayered CPG networks could also advance the understanding of biological motor systems and their application in robotics.