Learning Agile Locomotion on Risky Terrains: An Expert Overview
The paper "Learning Agile Locomotion on Risky Terrains" by Zhang et al. presents a novel approach for training quadrupedal robots to navigate highly challenging terrains through end-to-end reinforcement learning (RL). While prior research has predominantly relied on model-based methods for such tasks, the authors propose that an RL-based framework, coupled with intelligent policy training, can yield comparable if not superior locomotion agility in environments previously deemed exceptionally risky due to sparse footholds and the necessity for precise foot placement.
Several aspects of this work stand out. First, the authors train a generalist policy to traverse irregularly placed stepping stones with agility, then transfer the sensorimotor skills it acquires to specialist policies that are fine-tuned on more demanding terrains. This approach is computationally efficient, since each specialist reuses the pre-trained generalist, and it keeps the framework versatile across distinct terrains.
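To make the transfer step concrete, the sketch below warm-starts a specialist from a generalist's weights, assuming a simple PyTorch actor network; the architecture, dimensions, and checkpoint name are illustrative placeholders, not details from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical actor network; the paper's actual architecture may differ.
class ActorMLP(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

obs_dim, act_dim = 48, 12  # placeholder dimensions

# Generalist trained on stepping stones (checkpoint path is hypothetical).
generalist = ActorMLP(obs_dim, act_dim)
# generalist.load_state_dict(torch.load("generalist_stepping_stones.pt"))

# Specialist warm-started from the generalist, then fine-tuned on a harder
# target terrain; a reduced learning rate is a common choice so fine-tuning
# refines rather than overwrites the transferred sensorimotor skills.
specialist = ActorMLP(obs_dim, act_dim)
specialist.load_state_dict(generalist.state_dict())
optimizer = torch.optim.Adam(specialist.parameters(), lr=1e-4)
# ... continue on-policy RL (e.g. PPO) on the target terrain.
```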
One innovative contribution of this work is formulating locomotion as a position-based navigation task rather than the traditional velocity-tracking problem. This re-framing is pivotal to the robot's adaptive behavior: freed from tracking a commanded velocity at every instant, the policy can modulate its speed to the terrain, slowing over precarious footholds and accelerating where the ground permits. The authors further improve exploration efficiency and policy robustness through a carefully designed exploration strategy that combines improved curriculum learning, intrinsic curiosity-based rewards, and symmetry-based data augmentation that exploits the left-right symmetry of the quadruped's morphology.
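As a rough illustration of the re-framing, the following sketch contrasts a conventional velocity-tracking reward with a position-based navigation reward; the reward shapes, time window, and coefficients are assumptions for exposition, not the paper's exact terms.

```python
import numpy as np

def velocity_tracking_reward(base_vel_xy, cmd_vel_xy, sigma=0.25):
    """Classical formulation: track a commanded velocity at every step."""
    err = float(np.sum((np.asarray(cmd_vel_xy) - np.asarray(base_vel_xy)) ** 2))
    return float(np.exp(-err / sigma))

def navigation_reward(base_pos_xy, goal_pos_xy, t, episode_len, window=50):
    """Navigation formulation: reward depends only on proximity to a goal
    position near the end of the episode, leaving the speed profile up to
    the policy, which can creep over sparse footholds and sprint where
    the ground allows."""
    if t < episode_len - window:
        return 0.0
    dist = float(np.linalg.norm(np.asarray(goal_pos_xy) - np.asarray(base_pos_xy)))
    return 1.0 / (1.0 + dist)
```

The key design difference is that the navigation reward is silent for most of the episode, so no particular speed is ever prescribed; only arrival matters.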
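The symmetry-based augmentation can likewise be sketched as mirroring each transition across the robot's sagittal plane; the index permutations and sign flips below are placeholders that would need to match a real observation and action layout.

```python
import numpy as np

# Placeholder permutations and sign flips encoding a mirror across the
# robot's sagittal plane; the real maps depend on the observation and
# action layouts, which are not spelled out here.
OBS_PERM = np.arange(48)   # e.g. swap left/right leg entries
OBS_SIGN = np.ones(48)     # e.g. negate lateral velocity, roll, yaw rate
ACT_PERM = np.arange(12)   # e.g. swap left/right joint targets
ACT_SIGN = np.ones(12)     # e.g. negate hip abduction targets

def mirror_batch(obs: np.ndarray, act: np.ndarray):
    """Left-right mirrored copies of a batch of (observation, action) pairs."""
    return OBS_SIGN * obs[:, OBS_PERM], ACT_SIGN * act[:, ACT_PERM]

def augment(obs: np.ndarray, act: np.ndarray):
    # Appending mirrored transitions doubles the effective batch size and
    # biases the learned gait toward left-right symmetric behavior.
    m_obs, m_act = mirror_batch(obs, act)
    return np.concatenate([obs, m_obs]), np.concatenate([act, m_act])
```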
The results are compelling. The authors validate their methodology on the ANYmal-D robot, demonstrating remarkable locomotion capabilities in both simulation and the real world. The robot reaches peak forward velocities of at least 2.5 m/s on stepping stones and narrow beams, underscoring the potential of RL frameworks for robust and agile legged locomotion. The hardware experiments also demonstrate successful sim-to-real transfer of the policies, a decisive factor for practical deployment of such systems.
The implications of this research extend beyond immediate applications in robotic navigation on complex terrain. By demonstrating the success of an RL-based methodology combined with strategic policy training and data-efficiency techniques, the paper helps shift perspectives on the potential of model-free approaches in robotics, particularly in scenarios where model-based controllers have traditionally been the norm.
Looking ahead, the authors point to integrating onboard perception to automate terrain mapping, which would further increase the autonomy of such systems. They also see room to expand the training framework to a broader range of terrains, with the ultimate aim of a unified policy that excels without terrain-specific fine-tuning. Finally, they note the continuing challenge of improving the interpretability and robustness of neural network policies so that unforeseen failure modes can be understood and mitigated, a crucial step toward adoption in safety-critical applications.
Overall, this paper represents an important step in advancing the capabilities of quadrupedal robots to navigate challenging real-world terrains using RL, setting a foundation for future research in achieving both agility and robustness through intelligent learning paradigms.