
Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response (2312.11460v3)

Published 18 Dec 2023 in cs.RO, cs.AI, cs.CV, cs.LG, cs.SY, and eess.SY

Abstract: Robust locomotion control depends on accurate state estimation. However, the sensors of most legged robots provide only partial and noisy observations, making estimation particularly challenging for external states such as terrain friction and elevation maps. Inspired by the classical Internal Model Control principle, we treat these external states as disturbances and introduce the Hybrid Internal Model (HIM) to estimate them from the response of the robot. The response, which we refer to as the hybrid internal embedding, contains the robot's explicit velocity and an implicit stability representation, corresponding to the two primary goals of locomotion: explicitly tracking velocity and implicitly maintaining stability. We use contrastive learning to optimize the embedding to be close to the robot's successor state, in which the response is naturally embedded. HIM has several appealing benefits: it needs only the robot's proprioception, i.e., readings from joint encoders and the IMU, as observations; it maintains consistent observations between the simulation reference and reality, avoiding the information loss of imitation-based learning; it exploits batch-level information, making it more robust to noise and more sample-efficient; and it requires only 1 hour of training on an RTX 4090 to enable a quadruped robot to traverse any terrain under any disturbance. A wealth of real-world experiments demonstrates its agility, even in high-difficulty tasks and cases that never occurred during training, revealing remarkable open-world generalizability.


Summary

  • The paper introduces the Hybrid Internal Model (HIM) that reduces reliance on external sensing by leveraging proprioceptive inputs for agile locomotion.
  • It employs a contrastive learning technique within the Hybrid Internal Optimization module to accurately predict state transitions and system disturbances.
  • The framework, trained with PPO, achieves robust performance on challenging terrains with 200 million samples, far fewer than baseline methods.

Insights into the Hybrid Internal Model for Agile Legged Locomotion

The paper "Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response" presents a method that addresses the control challenges of robotic legged locomotion. The authors propose the Hybrid Internal Model (HIM), an approach that enables quadruped robots to traverse varied terrain efficiently using a learning-based framework rooted in internal model control principles. By relying on proprioceptive inputs and exploiting batch-level information, HIM circumvents the limitations associated with privileged external state access and sim-to-real transfer.

Key Contributions

The primary contribution of this paper is the HIM framework itself, which uses proprioceptive data from sensor modalities such as joint encoders and an Inertial Measurement Unit (IMU), eliminating the need to sense external states such as elevation maps and terrain friction. The approach applies contrastive learning within its Hybrid Internal Optimization (HIO) module, yielding a robust estimate of the robot's successor state while implicitly inferring system disturbances. This stands in contrast to existing methodologies that depend heavily on mimicking privileged teacher behaviors from simulation, as seen in frameworks such as Rapid Motor Adaptation (RMA).
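The batch-level contrastive objective can be illustrated with a minimal InfoNCE-style sketch, in which each hybrid internal embedding is pulled toward the encoding of its own successor state while the other successor states in the batch serve as negatives. This is an illustrative assumption, not the paper's exact objective; the loss form, temperature, and dimensions below are all hypothetical:

```python
import numpy as np

def hio_contrastive_loss(embeddings, successor_embeddings, temperature=0.1):
    """InfoNCE-style batch contrastive loss (illustrative sketch).

    Each row of `embeddings` is a hybrid internal embedding; the matching
    row of `successor_embeddings` encodes that sample's successor state.
    Positives sit on the diagonal of the similarity matrix; the rest of
    the batch acts as negatives.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    z_next = successor_embeddings / np.linalg.norm(
        successor_embeddings, axis=1, keepdims=True)
    logits = (z @ z_next.T) / temperature             # (B, B) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # cross-entropy on the diagonal
```

Perfectly aligned embedding/successor pairs drive this loss toward zero, while random pairings hover around log B; prototype- or Sinkhorn-based contrastive variants would fit the same interface.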

Key high-level aspects of HIM include:

  • Hybrid Internal Embedding: The framework uses a two-pronged internal representation: an explicit velocity estimate and an implicit stability latent. This hybrid approach maintains consistent observations between simulation and reality, allowing robust learning that is less dependent on explicit parameters of the external environment.
  • Proximal Policy Optimization: HIM is trained with Proximal Policy Optimization (PPO), with the hybrid internal embeddings optimized in each learning iteration, enhancing sample efficiency and noise robustness. Notably, training is resource-efficient, requiring only one hour on an RTX 4090 GPU to reach the reported locomotion capabilities.
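The data flow implied by the points above (a proprioceptive history encoded into an explicit velocity estimate plus an implicit stability latent, which then conditions the policy) can be sketched as follows. All network sizes and dimensions here are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP layers, used only to illustrate the data flow."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)  # hidden-layer nonlinearity
    return x

# Assumed dimensions (hypothetical, for illustration only):
OBS_DIM, HIST_LEN, EMB_DIM, ACT_DIM = 45, 5, 19, 12

# Encoder maps a short proprioceptive history to the hybrid internal
# embedding: a 3-D explicit velocity estimate plus an implicit stability latent.
encoder = mlp([OBS_DIM * HIST_LEN, 128, EMB_DIM])
# Actor consumes the latest observation concatenated with the embedding.
actor = mlp([OBS_DIM + EMB_DIM, 128, ACT_DIM])

history = rng.standard_normal(OBS_DIM * HIST_LEN)   # stacked proprioception
embedding = forward(encoder, history)
vel_est, stability = embedding[:3], embedding[3:]   # explicit / implicit parts

obs_t = history[-OBS_DIM:]                          # most recent observation
action = forward(actor, np.concatenate([obs_t, embedding]))
```

In a PPO loop, the actor's parameters would be updated by the clipped surrogate objective while the encoder is simultaneously optimized by the contrastive HIO objective on each batch.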

Numerical and Experimental Findings

The empirical evaluation of HIM demonstrates its capabilities across a range of scenarios, including high-difficulty tasks such as ascending long staircases and handling environmental disturbances without specific prior exposure. The learned locomotion policies require only 200 million samples, compared with baselines that need upwards of 1,280 million. Furthermore, HIM outperformed prior learning-based controllers, such as multiplicity-of-behavior (MoB) and pure regression-based approaches, in both simulation benchmarks and real-world tests.

In real-world experiments, the HIM-driven policy exhibited impressive success rates in traversing compositional terrains and deformable slopes, revealing generalizable agility beyond the training domain. This demonstrates the operational viability and adaptability of HIM across different robotic platforms, including Unitree Aliengo, A1, and Go1 robots.

Implications and Future Perspectives

The HIM framework posits several theoretical and practical implications. Theoretically, this paper advances the understanding of how internal model principles can be integrated into modern machine learning paradigms to manage complex system dynamics with limited sensor information. Practically, it propels the development of versatile robotic systems capable of agile maneuvers without extensive environment-specific tuning.

Future research may encompass expanding this framework to include multi-modal sensor inputs, enhancing robustness further in more varied and unpredictable environmental conditions. Integration with external sensors, such as cameras and lidar, could extend the functional range of HIM, addressing more complex tasks that require detailed environmental awareness. Additionally, exploring parallel and distributed computational techniques could scale the learning efficiency for broader application scenarios and robotic models.

In conclusion, the Hybrid Internal Model stands as a strategic intersection between classical control theory and contemporary reinforcement learning, providing a streamlined yet potent approach to agile quadrupedal locomotion. This work not only contributes significantly to the field of autonomous robotics but also opens pathways for exploring innovative integrations between internal model concepts and artificial intelligence methodologies in other dynamic, real-world applications.
