- The paper introduces a hybrid control framework that combines LIPM-based footstep planning with model-free reinforcement learning to enhance dynamic and stable legged locomotion.
- It employs a hierarchical structure in which LIPM-based predictions guide foot placements and an RL policy tracks them without being bound to full reference motions, learning the desired 0.35-second step duration and adapting robustly to unseen terrains.
- Hardware experiments on the MIT Humanoid robot validate the approach at forward speeds of up to 1.5 m/s and in dynamic turning maneuvers, demonstrating its practical effectiveness.
This paper presents a novel control framework that synthesizes model-based footstep planning techniques with model-free Reinforcement Learning (RL) to achieve dynamic and stable legged locomotion. The significance of this integration lies in leveraging the predictive accuracy of physics-based models while maintaining the adaptability and robustness of RL controllers.
Overview
The paper introduces a methodology centered around the Linear Inverted Pendulum Model (LIPM) dynamics for generating footstep patterns. These desired patterns serve as partial guidance for the RL policy, allowing for dynamic prediction and adaptation during locomotion tasks. This method is validated through extensive simulations and hardware experiments conducted on the MIT Humanoid robot.
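For reference, the standard LIPM approximates the center of mass (CoM) as a point mass held at a constant height $z_0$ atop a massless leg, which yields linear dynamics (this is the textbook form; the paper's exact formulation may include additional terms):

$$\ddot{x} = \frac{g}{z_0}\,(x - p_x), \qquad \ddot{y} = \frac{g}{z_0}\,(y - p_y)$$

where $(x, y)$ is the horizontal CoM position and $(p_x, p_y)$ is the stance-foot (pivot) location. Because these dynamics are linear, the CoM state at the end of a step can be predicted in closed form, which is what makes the LIPM attractive for fast footstep planning.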
The primary contribution of this work is the development of an RL policy that can track foot placements determined by the LIP model without strictly following the full reference motions. This approach mitigates potential overfitting to the model and enhances the exploration capabilities during RL training.
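To make the "track placements, not full trajectories" idea concrete, a minimal sketch of such a reward term is shown below. The function name, weighting, and tolerance are illustrative assumptions, not the paper's exact reward design.

```python
import numpy as np

def foot_placement_reward(foot_pos_xy, target_step_xy, sigma=0.05):
    """Reward foot-placement accuracy only at touchdown.

    Sketch of a placement-tracking term (assumed form, not the paper's
    exact reward): a Gaussian kernel on the horizontal distance between
    the touchdown location and the LIPM-planned step location. Between
    touchdowns the policy is free to choose its own swing trajectory,
    so no dense reference-motion tracking term is imposed.
    """
    err = np.linalg.norm(np.asarray(foot_pos_xy) - np.asarray(target_step_xy))
    return np.exp(-(err / sigma) ** 2)
```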
Methodology
The proposed control framework employs a hierarchical structure:
- Footstep Pattern Generation Using LIPM: The framework uses the LIP model to forward-predict the robot's state and generate target step locations from the commanded velocity (a sketch of this step follows the list).
- RL Policy Training: An RL policy is trained to track these desired foot placements, balancing the trade-off between following the model-based references and exploring alternative actions that improve locomotion stability and performance.
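The sketch below illustrates the LIPM-based step-location generation using the closed-form LIPM rollout and a capture-point-style offset with velocity feedback. The CoM height, feedback gain, and exact placement rule are assumptions for illustration; only the 0.35 s step duration comes from the paper.

```python
import numpy as np

G = 9.81          # gravity [m/s^2]
Z0 = 0.6          # assumed constant CoM height [m]
STEP_T = 0.35     # step duration reported in the paper [s]
OMEGA = np.sqrt(G / Z0)

def lipm_rollout(com, com_vel, pivot, t):
    """Closed-form LIPM prediction of the CoM state after time t.

    com, com_vel, pivot are 2D (x, y) arrays; the solution of
    x_ddot = omega^2 * (x - pivot) is evaluated analytically.
    """
    rel = com - pivot
    ch, sh = np.cosh(OMEGA * t), np.sinh(OMEGA * t)
    rel_t = rel * ch + com_vel * sh / OMEGA
    vel_t = rel * OMEGA * sh + com_vel * ch
    return pivot + rel_t, vel_t

def next_step_location(com, com_vel, pivot, vel_cmd, k_vel=0.1):
    """Target step location for the upcoming touchdown.

    Sketch only: forward-predict the CoM over one step with the LIPM,
    then place the foot at a capture-point-style offset plus a velocity
    feedback term (k_vel is an assumed gain, not from the paper).
    """
    com_td, vel_td = lipm_rollout(com, com_vel, pivot, STEP_T)
    return com_td + vel_td / OMEGA + k_vel * (vel_td - vel_cmd)
```

In this form the planner is cheap enough to be re-evaluated every control cycle, so the desired step location can be updated continuously as the commanded velocity or the robot's state changes.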
The RL policy's state space includes the robot's dynamic states, the step commands, and the velocity commands. The action space consists of residual joint PD setpoints, which are updated at a high frequency to ensure precise control (a minimal example is sketched below).
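A minimal sketch of how a residual PD action space can be applied at the joint level is shown below; the gains, nominal pose, and joint count are assumptions rather than the paper's values.

```python
import numpy as np

KP, KD = 30.0, 1.0         # assumed joint PD gains
Q_NOMINAL = np.zeros(10)   # assumed nominal joint pose (10-DoF example)

def joint_torques(q, qd, policy_residual):
    """Map a policy's residual action to joint torques.

    The policy outputs a residual offset on top of a nominal joint
    setpoint; a low-level PD loop, typically running at a much higher
    rate than the policy, converts the resulting setpoint into torques.
    """
    q_des = Q_NOMINAL + policy_residual
    return KP * (q_des - q) - KD * qd
```

Keeping the residual small relative to the nominal pose is a common way to retain the stabilizing behavior of the PD loop while still letting the policy shape the motion.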
Validation and Results
Simulation Analysis
The simulation results demonstrate the efficacy of the proposed approach:
- Velocity Tracking Performance: The proposed method exhibited superior tracking performance compared to an End-to-End RL policy trained on varied terrains, and comparable results to one trained exclusively on flat terrain.
- Step Duration Learning: The policy accurately learned the desired step duration of 0.35 seconds, resulting in precise foot placement synchronization.
- Robust Foot Placement Tracking: Both left and right foot trajectories were smooth and accurately followed the desired step locations, validating the effectiveness of the LIPM-based step pattern generation.
- Adaptability to Unseen Terrains: The policy adapted robustly to rough and gap terrains by dynamically adjusting the desired step locations, maintaining forward velocity with a higher success rate than the baseline policies.
Hardware Deployment
The paper's hardware experiments on the MIT Humanoid robot confirmed the simulation findings:
- Forward Walking: The robot achieved forward walking speeds up to 1.5 m/s and showcased human-like heel-to-toe motion.
- Dynamic Turning: The robot successfully performed dynamic maneuvers, including 90-degree and 180-degree turns, demonstrating the RL policy's capacity to handle real-world complexities.
Implications and Future Work
The implications of integrating model-based and model-free approaches are significant. The proposed framework offers a versatile and scalable solution for dynamic legged locomotion, balancing the advantages of predictive modeling and the adaptive strengths of RL. This integration can potentially be extended to various robotic platforms and environments, enhancing the robustness and generalizability of legged robots.
Future work should incorporate vision-based algorithms for terrain detection, enabling real-time adjustment of foot placements based on the height and stability of the terrain. Additionally, exploring model predictive controllers that incorporate whole-body dynamics could further refine step location predictions, improving overall locomotion performance.
Conclusion
The paper successfully demonstrates that combining model-based footstep planning with model-free RL can yield robust, adaptable, and stable locomotion in legged robots. By leveraging the strengths of both approaches, this framework presents a significant advancement in the control of dynamic legged locomotion, with promising avenues for future research and application in more complex and varied environments.