
Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response (2312.11460v3)

Published 18 Dec 2023 in cs.RO, cs.AI, cs.CV, cs.LG, cs.SY, and eess.SY

Abstract: Robust locomotion control depends on accurate state estimation. However, the sensors of most legged robots provide only partial and noisy observations, making estimation particularly challenging for external states such as terrain friction and elevation maps. Inspired by the classical Internal Model Control principle, we treat these external states as disturbances and introduce the Hybrid Internal Model (HIM) to estimate them from the response of the robot. The response, which we refer to as the hybrid internal embedding, contains the robot's explicit velocity and an implicit stability representation, corresponding to the two primary goals of locomotion: explicitly tracking velocity and implicitly maintaining stability. We use contrastive learning to optimize the embedding to be close to the robot's successor state, in which the response is naturally embedded. HIM has several appealing benefits: it needs only the robot's proprioception, i.e., readings from joint encoders and the IMU, as observations; it maintains consistent observations between the simulation reference and reality, avoiding the information loss of imitation-based learning; it exploits batch-level information, making it more robust to noise and more sample-efficient; and it requires only 1 hour of training on an RTX 4090 to enable a quadruped robot to traverse any terrain under any disturbance. A wealth of real-world experiments demonstrates its agility, even in high-difficulty tasks and cases that never occurred during training, revealing remarkable open-world generalizability.


Summary

  • The paper introduces the Hybrid Internal Model (HIM) that reduces reliance on external sensing by leveraging proprioceptive inputs for agile locomotion.
  • It employs a contrastive learning technique within the Hybrid Internal Optimization module to accurately predict state transitions and system disturbances.
  • The framework, trained with PPO, achieves robust performance on challenging terrains with 200 million samples, far fewer than baseline methods.

Insights into the Hybrid Internal Model for Agile Legged Locomotion

The paper "Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response" presents a method that addresses the control challenges of robotic legged locomotion. The authors propose the Hybrid Internal Model (HIM), an approach that enables quadruped robots to traverse varied terrain efficiently using a learning-based framework rooted in internal model control principles. By relying on proprioceptive inputs and exploiting batch-level information, HIM circumvents the limitations associated with privileged external state access and sim-to-real transfer.

Key Contributions

The primary contribution of this paper is the HIM framework itself, which uses proprioceptive data from sensor modalities such as joint encoders and an Inertial Measurement Unit (IMU), eliminating the need to sense external states such as elevation maps and terrain friction. The approach applies contrastive learning within its Hybrid Internal Optimization (HIO) module, yielding a robust estimate of the robot's successor state while implicitly inferring system disturbances. This stands in contrast to existing methodologies that depend heavily on mimicking privileged teacher behaviors from simulation, as seen in frameworks such as Rapid Motor Adaptation (RMA).
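The batch-level contrastive objective can be illustrated with a minimal InfoNCE-style sketch, in which each hybrid internal embedding is pulled toward the encoding of its own successor state while the other successor states in the batch serve as negatives. This is an illustrative assumption, not the paper's exact objective; the loss form, temperature, and dimensions below are all hypothetical:

```python
import numpy as np

def hio_contrastive_loss(embeddings, successor_embeddings, temperature=0.1):
    """InfoNCE-style batch contrastive loss (illustrative sketch).

    Each row of `embeddings` is a hybrid internal embedding; the matching
    row of `successor_embeddings` encodes that sample's successor state.
    Positives sit on the diagonal of the similarity matrix; the rest of
    the batch acts as negatives.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    z_next = successor_embeddings / np.linalg.norm(
        successor_embeddings, axis=1, keepdims=True)
    logits = (z @ z_next.T) / temperature             # (B, B) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # cross-entropy on the diagonal
```

Perfectly aligned embedding/successor pairs drive this loss toward zero, while random pairings hover around log B; prototype- or Sinkhorn-based contrastive variants would fit the same interface.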

Key high-level aspects of HIM include:

  • Hybrid Internal Embedding: The framework uses a two-pronged internal representation: an explicit velocity estimate and an implicit stability latent. This hybrid approach maintains consistent observations between simulation and reality, allowing robust learning that is less dependent on explicit parameters of the external environment.
  • Proximal Policy Optimization: HIM is trained with Proximal Policy Optimization (PPO), with the hybrid internal embeddings optimized in each learning iteration, enhancing sample efficiency and noise robustness. Notably, training is resource-efficient, requiring only one hour on an RTX 4090 GPU to reach the reported locomotion capabilities.
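The data flow implied by the points above (a proprioceptive history encoded into an explicit velocity estimate plus an implicit stability latent, which then conditions the policy) can be sketched as follows. All network sizes and dimensions here are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP layers, used only to illustrate the data flow."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)  # hidden-layer nonlinearity
    return x

# Assumed dimensions (hypothetical, for illustration only):
OBS_DIM, HIST_LEN, EMB_DIM, ACT_DIM = 45, 5, 19, 12

# Encoder maps a short proprioceptive history to the hybrid internal
# embedding: a 3-D explicit velocity estimate plus an implicit stability latent.
encoder = mlp([OBS_DIM * HIST_LEN, 128, EMB_DIM])
# Actor consumes the latest observation concatenated with the embedding.
actor = mlp([OBS_DIM + EMB_DIM, 128, ACT_DIM])

history = rng.standard_normal(OBS_DIM * HIST_LEN)   # stacked proprioception
embedding = forward(encoder, history)
vel_est, stability = embedding[:3], embedding[3:]   # explicit / implicit parts

obs_t = history[-OBS_DIM:]                          # most recent observation
action = forward(actor, np.concatenate([obs_t, embedding]))
```

In a PPO loop, the actor's parameters would be updated by the clipped surrogate objective while the encoder is simultaneously optimized by the contrastive HIO objective on each batch.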

Numerical and Experimental Findings

The empirical evaluation of HIM demonstrates its capabilities across a range of scenarios, including high-difficulty tasks such as ascending long staircases and handling environmental disturbances without specific prior exposure. The learned locomotion policies require only 200 million samples, compared with baselines that need upwards of 1,280 million. Furthermore, HIM outperformed prior learning-based controllers, such as multiplicity-of-behavior (MoB) and pure regression-based approaches, in both simulation benchmarks and real-world tests.

In real-world experiments, the HIM-driven policy exhibited impressive success rates in traversing compositional terrains and deformable slopes, revealing generalizable agility beyond the training domain. This demonstrates the operational viability and adaptability of HIM across different robotic platforms, including Unitree Aliengo, A1, and Go1 robots.

Implications and Future Perspectives

The HIM framework posits several theoretical and practical implications. Theoretically, this paper advances the understanding of how internal model principles can be integrated into modern machine learning paradigms to manage complex system dynamics with limited sensor information. Practically, it propels the development of versatile robotic systems capable of agile maneuvers without extensive environment-specific tuning.

Future research may encompass expanding this framework to include multi-modal sensor inputs, enhancing robustness further in more varied and unpredictable environmental conditions. Integration with external sensors, such as cameras and lidar, could extend the functional range of HIM, addressing more complex tasks that require detailed environmental awareness. Additionally, exploring parallel and distributed computational techniques could scale the learning efficiency for broader application scenarios and robotic models.

In conclusion, the Hybrid Internal Model stands as a strategic intersection between classical control theory and contemporary reinforcement learning, providing a streamlined yet potent approach to agile quadrupedal locomotion. This work not only contributes significantly to the field of autonomous robotics but also opens pathways for exploring innovative integrations between internal model concepts and artificial intelligence methodologies in other dynamic, real-world applications.
