An Expert Overview of "Learning Quadrupedal Locomotion over Challenging Terrain"
The paper, authored by Joonho Lee et al., addresses the challenges inherent in achieving robust quadrupedal locomotion over varied and challenging terrains using reinforcement learning (RL). Moving beyond traditional control systems that rely on elaborate state machines and explicit motion primitives, the researchers present a controller capable of zero-shot generalization from simulation to diverse real-world environments.
Methodology
The key components of the methodology are:
- Proprioceptive Feedback: The controller relies exclusively on proprioceptive sensors, specifically joint encoders and an inertial measurement unit (IMU), eschewing exteroceptive sensors like cameras and LiDAR. This design choice favors durability and reliability, since proprioception remains informative in conditions where visual sensing degrades.
- Reinforcement Learning and Simulation: The controller is trained in a simulated environment using model-free RL. The simulation incorporates varied but strictly rigid terrain profiles; the resulting controller nevertheless generalizes at deployment to conditions never modeled in training, such as deformable ground and shifting footholds.
- Policy Architecture and Training Protocol: Instead of conventional multi-layer perceptron (MLP) models, the researchers employ a Temporal Convolutional Network (TCN), which processes an extended history of proprioceptive states. The training process is structured into two stages:
- A teacher policy is trained using privileged information (e.g., ground truth terrain data) in the simulation.
- The teacher then supervises the training of a student policy that relies solely on proprioceptive data.
- Adaptive Terrain Curriculum: An adaptive curriculum synthesizes terrains matched to the learner's evolving capabilities. Particle filtering maintains and adapts a distribution of terrain parameters that progressively challenge the controller.
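The two-stage teacher-student setup above can be sketched in miniature. The following is an illustrative numpy sketch, not the paper's implementation: the privileged teacher is reduced to a single tanh layer, the TCN student to two causal 1-D convolutions over a proprioceptive history, and all dimensions (60-step history, 48 proprioceptive channels, 100-dim privileged state, 12 joint targets) are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def causal_conv1d(x, w, b):
    """Causal 1-D convolution over a proprioceptive history.
    x: (T, C_in) history; w: (K, C_in, C_out); b: (C_out,)."""
    T, C_in = x.shape
    K, _, C_out = w.shape
    xp = np.vstack([np.zeros((K - 1, C_in)), x])   # left-pad: no future leakage
    out = np.empty((T, C_out))
    for t in range(T):
        window = xp[t:t + K]                        # (K, C_in)
        out[t] = np.einsum("kc,kco->o", window, w) + b
    return np.maximum(out, 0.0)                     # ReLU

def student_policy(history, params):
    """TCN student: stacked causal convs; last time step maps to an action."""
    h = history
    for w, b in params["convs"]:
        h = causal_conv1d(h, w, b)
    return h[-1] @ params["head_w"] + params["head_b"]

def teacher_policy(priv_state, w, b):
    """Privileged teacher: acts on simulation-only ground truth
    (terrain profile, contact states) unavailable on the real robot."""
    return np.tanh(priv_state @ w + b)

# Hypothetical dimensions for the sketch.
T, C, P, A = 60, 48, 100, 12
params = {
    "convs": [(0.1 * rng.standard_normal((5, C, 32)), np.zeros(32)),
              (0.1 * rng.standard_normal((5, 32, 32)), np.zeros(32))],
    "head_w": 0.1 * rng.standard_normal((32, A)),
    "head_b": np.zeros(A),
}
tw, tb = 0.1 * rng.standard_normal((P, A)), np.zeros(A)

history = rng.standard_normal((T, C))
priv = rng.standard_normal(P)

# Stage two in one line: the student is trained to imitate the
# teacher's action from proprioception alone (squared-error loss).
loss = np.mean((student_policy(history, params)
                - teacher_policy(priv, tw, tb)) ** 2)
```

In the full method the teacher is first trained with RL on the privileged state; the distillation loss shown here is only the supervision signal of the second stage.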
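The particle-filter curriculum can likewise be sketched under simplifying assumptions. Here each particle is a hypothetical two-parameter terrain (step height, roughness), and the policy rollout is replaced by a toy traversability function; in the paper, traversability comes from evaluating the current policy in simulation. The update keeps terrains in a "productive" difficulty band, resamples, and jitters.

```python
import numpy as np

rng = np.random.default_rng(1)

def traversability(params):
    """Stand-in for a simulated rollout: fraction of terrain traversed.
    A toy linear function of (step height, roughness); the real score
    comes from running the current policy on the sampled terrain."""
    difficulty = params @ np.array([2.0, 1.0])
    return float(np.clip(1.2 - difficulty, 0.0, 1.0))

def curriculum_step(particles, lo=0.5, hi=0.9, noise=0.05):
    """One particle-filter update: weight terrains whose traversability
    falls in the productive band [lo, hi] (neither trivial nor
    impossible), resample in proportion, then perturb."""
    scores = np.array([traversability(p) for p in particles])
    weights = np.where((scores >= lo) & (scores <= hi), 1.0, 1e-3)
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    new = particles[idx] + noise * rng.standard_normal(particles.shape)
    return np.clip(new, 0.0, 1.0)

# Particles over two terrain parameters in [0, 1]. Repeated updates
# concentrate the distribution on terrains the learner can almost,
# but not quite easily, traverse.
particles = rng.uniform(0.0, 1.0, size=(200, 2))
for _ in range(10):
    particles = curriculum_step(particles)
```

The band thresholds and noise scale are illustrative; the key idea is that the terrain distribution tracks the policy's current competence rather than a fixed difficulty schedule.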
Results
The controller was tested across various real-world terrains, demonstrating zero-shot generalization with robust locomotion capabilities:
- Natural Environment Trials: The controller successfully navigated a multitude of environments, including snow-covered slopes, rubble, thick vegetation, and running water (Fig. 1 and 2). Notably, it handled conditions not encountered during training, such as mud and snow, illustrating exceptional transferability from simulation to reality.
- DARPA Subterranean Challenge: The controller was utilized by the CERBERUS team in the DARPA Subterranean Challenge, navigating complex underground settings including steep staircases without any tuning or failure across multiple missions (Fig. 2G).
- Comparative Evaluation: When compared to a state-of-the-art baseline, the controller excelled in locomotion speed and energy efficiency, achieving higher average speeds and a lower mechanical cost of transport (COT) across various terrains (Table 1). Moreover, the baseline suffered catastrophic failures in challenging conditions, whereas the presented controller showed no such failures during testing.
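The mechanical cost of transport used in such comparisons is straightforward to compute from logged joint data. The sketch below assumes the common positive-mechanical-power convention (negative joint power is not credited) and uses made-up trial numbers, not the paper's measurements.

```python
import numpy as np

def mechanical_cot(torques, joint_vels, mass, speed, g=9.81):
    """Mechanical cost of transport: mean positive mechanical power
    summed over joints, normalized by body weight times forward speed.
    torques, joint_vels: (T, n_joints) arrays sampled over one trial.
    A dimensionless number; lower means more efficient locomotion."""
    power = np.maximum(torques * joint_vels, 0.0).sum(axis=1)  # watts
    return float(power.mean() / (mass * g * speed))

# Toy trial: 500 samples from a hypothetical 12-joint quadruped.
rng = np.random.default_rng(2)
tau = rng.uniform(-20.0, 20.0, size=(500, 12))   # joint torques, N*m
omega = rng.uniform(-5.0, 5.0, size=(500, 12))   # joint velocities, rad/s
cot = mechanical_cot(tau, omega, mass=33.0, speed=0.5)
```

Because COT normalizes by weight and speed, it lets controllers be compared across terrains and robots on a single efficiency axis.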
Implications
This research has significant implications for the development of autonomous legged robots:
- Practical Applications: The ability to autonomously traverse unpredictable terrains without extensive pre-programming or exteroceptive sensing opens up new possibilities for deploying legged robots in disaster response, exploration, and other field applications where robots encounter entirely unknown environments.
- Theoretical Advances: From a theoretical standpoint, the research underlines the efficacy of RL in developing robust locomotion skills that outperform traditional control architectures. The use of proprioception and adaptive curricula could influence future methodologies in robot learning and control systems development.
Future Directions
While the work demonstrates robust locomotion capabilities, several directions for future research are apparent:
- Gait Diversity: Investigating training protocols that can elicit a broader range of gait patterns from the quadrupedal robots, beyond the trot gait demonstrated, could further enhance versatility.
- Hybrid Perception Models: Integrating exteroceptive data with the proprioceptive controller could enable higher-level navigation tasks, enhancing speed and efficiency in benign environments while ensuring fail-safes in hazardous conditions.
- Advanced Simulation Techniques: Enhancing simulation frameworks to better model phenomena such as mud, snow, and vegetation can potentially yield even more robust controllers, extending the capabilities demonstrated in this work.
In conclusion, this paper presents a significant advancement in the robustness and generalizability of quadrupedal locomotion through reinforcement learning. The presented methodologies and results pave the way for future developments in autonomous legged robots capable of navigating complex and challenging environments.