
Parkour in the Wild: Learning a General and Extensible Agile Locomotion Policy Using Multi-expert Distillation and RL Fine-tuning (2505.11164v1)

Published 16 May 2025 in cs.RO

Abstract: Legged robots are well-suited for navigating terrains inaccessible to wheeled robots, making them ideal for applications in search and rescue or space exploration. However, current control methods often struggle to generalize across diverse, unstructured environments. This paper introduces a novel framework for agile locomotion of legged robots by combining multi-expert distillation with reinforcement learning (RL) fine-tuning to achieve robust generalization. Initially, terrain-specific expert policies are trained to develop specialized locomotion skills. These policies are then distilled into a unified foundation policy via the DAgger algorithm. The distilled policy is subsequently fine-tuned using RL on a broader terrain set, including real-world 3D scans. The framework allows further adaptation to new terrains through repeated fine-tuning. The proposed policy leverages depth images as exteroceptive inputs, enabling robust navigation across diverse, unstructured terrains. Experimental results demonstrate significant performance improvements over existing methods in synthesizing multi-terrain skills into a single controller. Deployment on the ANYmal D robot validates the policy's ability to navigate complex environments with agility and robustness, setting a new benchmark for legged robot locomotion.

Authors (4)
  1. Nikita Rudin (13 papers)
  2. Junzhe He (3 papers)
  3. Joshua Aurand (8 papers)
  4. Marco Hutter (165 papers)

Summary

Analysis of Agile Locomotion Policy for Legged Robots

The paper "Parkour in the Wild: Learning a General and Extensible Agile Locomotion Policy Using Multi-expert Distillation and RL Fine-tuning" presents a comprehensive framework that addresses the challenge of legged robot maneuverability across varied and complex terrains. The work targets a robust locomotion policy capable of generalization and adaptation by combining multi-expert distillation with reinforcement learning (RL) fine-tuning.

Legged robots are inherently suited to terrains inaccessible to wheeled robots, making them strong candidates for applications such as search and rescue or extraterrestrial exploration. Yet generalizing control methods across diverse environments remains difficult. This paper introduces a framework in which terrain-specific expert policies are first trained to develop specialized locomotion skills. These skills are then distilled into a single foundation policy using the DAgger algorithm, consolidating the learned behaviors into a unified model. The distilled policy is subsequently fine-tuned with RL across a broader set of terrains, including real-world 3D scans, for improved versatility.
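The distillation stage can be sketched in miniature. The snippet below is purely illustrative and not from the paper: linear policies stand in for the neural experts and the foundation policy, toy dynamics stand in for the simulator, and all dimensions are invented. What it does show is the core DAgger pattern, which is to roll out the *student*, relabel the states it actually visits with the matching terrain expert, aggregate the dataset, and refit.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 8, 3  # illustrative sizes, not the paper's

def make_expert(seed):
    # Hypothetical terrain-specific expert: a fixed linear policy.
    w = np.random.default_rng(seed).normal(size=(OBS_DIM, ACT_DIM))
    return lambda obs: obs @ w

experts = [make_expert(s) for s in range(3)]  # one expert per terrain type

# Student (the unified "foundation" policy): linear, fit by least squares.
W_student = np.zeros((OBS_DIM, ACT_DIM))

def rollout(policy, terrain, steps=50):
    # Toy dynamics: next observation depends on the terrain id and the action.
    obs = rng.normal(size=OBS_DIM)
    traj = []
    for _ in range(steps):
        act = policy(obs)
        traj.append(obs.copy())
        obs = (0.9 * obs + 0.05 * (act @ np.ones((ACT_DIM, OBS_DIM)))
               + 0.1 * rng.normal(size=OBS_DIM) + 0.01 * terrain)
    return np.array(traj)

# DAgger loop: student acts, experts label the student's visited states,
# and the aggregated dataset grows across iterations.
states, labels = [], []
for it in range(5):
    for terrain, expert in enumerate(experts):
        student = lambda o: o @ W_student
        visited = rollout(student, terrain)
        states.append(visited)
        labels.append(expert(visited))  # expert actions on *student* states
    X = np.concatenate(states)
    Y = np.concatenate(labels)
    W_student, *_ = np.linalg.lstsq(X, Y, rcond=None)  # refit on aggregate
```

The key design point carried over from DAgger is that labels come from expert queries on states the student itself reaches, which avoids the compounding distribution shift of naive behavior cloning.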

A significant contribution of this work is its use of depth images as exteroceptive inputs, which supports navigation across a broad spectrum of unstructured terrains. The experimental evaluation shows marked performance gains, effective synthesis of multi-terrain skills into a single controller, and agile, robust behavior when deployed on the ANYmal D robot. This sets a new standard for legged robot locomotion.

Numerical Results and Claims

The results demonstrate a substantial increase in success rates across different terrains, showcasing the effectiveness of the distillation and fine-tuning stages. Iterative RL fine-tuning improves the policy's ability to tackle complex terrains beyond those used for expert training. The emergence of new behaviors, such as adaptive motion for improved obstacle visibility, exemplifies the policy's robustness. Real-world deployments validate these findings, demonstrating a high level of agility and resilience against various disturbances.

Implications and Future Work

The implications of this research are twofold: practical and theoretical. Practically, the ability of legged robots to operate efficiently in unstructured environments broadens their applicability in real-world scenarios where adaptability is crucial, such as disaster response or multi-terrain exploration missions. This could lead to faster, more versatile robots capable of overcoming obstacles without prior mapping or explicit state estimation.

Theoretically, the framework laid out in this paper could serve as a foundation for further advancements in RL and distillation techniques, aiming for improved transferability across different robot embodiments and environments. The research opens pathways to explore more sophisticated noise models or innovative neural architectures, such as attention-based models that handle long-term dependencies more effectively.

Further research could also delve into optimizing the balance between leveraged expert knowledge and emergent behaviors, so that improvements in both the exploration and exploitation phases of RL can be robustly achieved. Systematic evaluation of perception strategies and their integration with dynamic control mechanisms could yield additional insights into enhancing robot autonomy in highly challenging environments. These advancements would contribute considerably to the growing landscape of robotics and AI, driving forward innovations in intelligent system design.
