- The paper introduces a dual actor-disturber framework for quadruped locomotion, using H-Infinity constraints to enforce robustness against external disturbances.
- Both policies are trained with Proximal Policy Optimization (PPO) in large-scale simulation and validated on real hardware, yielding improved stability and performance.
- Adaptive disturbance calibration under the H-Infinity bound offers a principled route to reliable locomotion over complex, unpredictable terrain.
Learning H-Infinity Locomotion Control for Quadruped Robots
Introduction
Robotics has been advancing rapidly, particularly in learning-based control of quadruped locomotion. Much of this progress comes from neural controllers trained in large-scale parallel simulation, which enable robust movement over complex terrain. Yet resilience against unforeseen disturbances remains a critical challenge, and it is vital for real-world applications such as disaster response and navigation of unstructured terrain. Traditional pipelines rely on simple domain randomization during training, which does not fully prepare robots for the variable disturbances encountered in the field.
This paper takes a different route: it models the learning process as an adversarial game and enforces robustness through H-Infinity control. The core idea is a dual optimization between an "actor" that performs the locomotion task and a "disturber" that challenges it with controlled, progressively escalating disturbances. The interplay is stabilized by an H∞ constraint that bounds performance degradation relative to disturbance intensity.
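Schematically, and in illustrative notation rather than the paper's own, the two policies solve coupled problems:

$$
\pi^{*} = \arg\max_{\pi}\; \mathbb{E}\!\left[\sum_{t} \gamma^{t}\, r_t \,\middle|\, \pi, \xi\right],
\qquad
\xi^{*} = \arg\max_{\xi}\; \mathbb{E}\!\left[\sum_{t} \gamma^{t}\, \big(\hat{r}_t - r_t\big) \,\middle|\, \pi, \xi\right],
$$

where $\pi$ is the actor, $\xi$ the disturber, $r_t$ the achieved task reward, and $\hat{r}_t$ its expected value. The H∞ constraint, stated below, limits how much cost the disturber may inflict per unit of applied-force intensity.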
Core Methodology
The architecture pivots on the interaction between the actor, trained to maximize the task reward, and the disturber, trained to maximize the gap between the expected task reward and the reward actually achieved. Both policies are optimized with Proximal Policy Optimization (PPO) in a large-scale parallel simulation built on Isaac Gym.
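A minimal sketch of the alternating actor/disturber PPO update follows, with synthetic tensors standing in for rollout data from the simulator. All names, sizes, and hyperparameters are illustrative, not the paper's code:

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Small Gaussian policy: an MLP mean plus a learned log-std."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.mean = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs: torch.Tensor) -> torch.distributions.Normal:
        return torch.distributions.Normal(self.mean(obs), self.log_std.exp())

def ppo_loss(dist, actions, old_logp, adv, clip=0.2):
    """Standard clipped PPO surrogate for one mini-batch."""
    logp = dist.log_prob(actions).sum(-1)
    ratio = (logp - old_logp).exp()
    return -torch.min(ratio * adv,
                      ratio.clamp(1 - clip, 1 + clip) * adv).mean()

obs_dim, act_dim, force_dim = 48, 12, 3          # illustrative sizes
actor = GaussianPolicy(obs_dim, act_dim)         # outputs joint targets
disturber = GaussianPolicy(obs_dim, force_dim)   # outputs external forces

# Stand-ins for a collected rollout batch:
obs = torch.randn(256, obs_dim)
act = torch.randn(256, act_dim)
frc = torch.randn(256, force_dim)
old_logp_a = actor.dist(obs).log_prob(act).sum(-1).detach()
old_logp_d = disturber.dist(obs).log_prob(frc).sum(-1).detach()
adv_actor = torch.randn(256)  # advantages from the task reward
adv_dist = torch.randn(256)   # advantages from the disturber's reward
                              # (expected minus achieved task reward)

opt_a = torch.optim.Adam(actor.parameters(), lr=3e-4)
opt_d = torch.optim.Adam(disturber.parameters(), lr=3e-4)

# Alternate the two updates each iteration: the actor adapts to the
# current disturbances, then the disturber sharpens its attack.
opt_a.zero_grad()
ppo_loss(actor.dist(obs), act, old_logp_a, adv_actor).backward()
opt_a.step()

opt_d.zero_grad()
ppo_loss(disturber.dist(obs), frc, old_logp_d, adv_dist).backward()
opt_d.step()
```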
H-Infinity Constraint Implementation
The H∞ constraint is the central component of the framework: it bounds the ratio between the cost the disturber can inflict and the intensity of the external forces it applies. This both strengthens the model's disturbance handling and provides a theoretical robustness guarantee that keeps the adversarial training stable across iterations.
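In classical H∞ terms, this is a bounded gain from disturbance to cost. A schematic statement of the constraint, with $c_t$ the per-step performance cost, $d_t$ the applied disturbance force, and $\eta$ the prescribed bound (symbols are illustrative and may differ from the paper's notation):

$$
\sum_{t=0}^{T} c_t \;\le\; \eta \sum_{t=0}^{T} \lVert d_t \rVert^{2},
$$

so that no matter how the disturber acts, the cost it can inflict grows at most linearly with the total intensity of the forces it applies.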
Simulation and Real-World Testing Environments
The evaluation spans multiple scenarios, from continuous perturbations to sudden, high-intensity pushes in simulation, and extends to challenging physical terrain such as slippery slopes and uneven ground. The trained robots show markedly better stability and adaptability than those trained with traditional methods, particularly under the most disruptive conditions designed to emulate real-world operation.
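For concreteness, the two simulated disturbance regimes can be pictured as force schedules along these lines. This is a hedged illustration: the magnitudes, timings, and function names are invented for the example and are not taken from the paper:

```python
import numpy as np

def continuous_push(t: float, scale: float = 30.0) -> float:
    """Smoothly varying lateral force in newtons ('continuous' regime)."""
    return scale * np.sin(0.5 * t)

def sudden_push(t: float, period: float = 4.0,
                width: float = 0.2, peak: float = 120.0) -> float:
    """Brief high-intensity impulse every `period` seconds ('sudden' regime)."""
    return peak if (t % period) < width else 0.0
```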
Results and Observations
Quantitative gains are reported across the test scenarios: robots trained under the proposed H∞ paradigm achieve both higher task performance and better disturbance rejection. In particular, the adaptive disturber, operating under the H∞ constraint, calibrates its disturbances to the robot's current state and learning progress, which ultimately yields more robust locomotion control.
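One plausible reading of that calibration is sketched below: the disturber is rewarded for the actor's shortfall relative to expectation, while a penalty keeps the inflicted cost within the $\eta$-scaled intensity budget. This is an illustrative shaping, not the paper's verbatim objective; `expected_reward` stands in for a critic or oracle estimate:

```python
import torch

def disturber_step_reward(expected_reward: torch.Tensor,
                          achieved_reward: torch.Tensor,
                          force: torch.Tensor,
                          eta: float = 1.0,
                          beta: float = 10.0) -> torch.Tensor:
    """Per-step disturber reward under a soft H-infinity budget (illustrative)."""
    shortfall = expected_reward - achieved_reward  # cost inflicted on the actor
    intensity = force.pow(2).sum(-1)               # ||d_t||^2
    violation = torch.relu(shortfall - eta * intensity)
    return shortfall - beta * violation            # discourage over-budget cost
```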
In real-world tests, deployment on Unitree quadrupeds confirmed the practical applicability of the approach: the learned policy handled physical disturbances across diverse terrains, demonstrating both resilience and agility.
Conclusion and Future Work
Integrating H-Infinity control into the training of neural locomotion controllers is a significant step toward robust autonomous operation in dynamic, unpredictable environments. The method's success in both simulation and real-world trials is promising for industrial and rescue applications. Future work could extend these principles to other platforms, such as bipedal or aerial robots, where adaptability and resilience are equally critical.
More broadly, the approach invites deeper exploration of adaptive, resilient learning techniques whose robustness is not merely theoretical but demonstrated under physically demanding conditions that mirror the real world.