- The paper introduces a novel framework that integrates model-based optimal control with reinforcement learning to generate robust and dynamic reference motions for varied gaits.
- It demonstrates that finite-horizon optimal control signals can effectively train RL policies to generalize across uneven terrains and multiple quadruped platforms without complex reward tuning.
- The framework significantly reduces fall rates and accelerates training while enabling reliable sim-to-real transfer for diverse locomotion tasks.
RL + Model-based Control: Achieving Versatile Legged Locomotion
The paper presents a novel control framework that integrates model-based optimal control (MBOC) with reinforcement learning (RL) to achieve robust and versatile legged locomotion. The framework leverages on-demand optimal control to generate reference motions over a spectrum of velocities and gaits, which RL policies subsequently imitate. This approach aims to overcome limitations traditionally associated with MBOC, such as constraints arising from simplifying model assumptions, while retaining the efficacy and adaptability of learned RL policies.
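At its core, the reference-generation step amounts to solving a finite-horizon optimal control problem. As a sketch of the standard form (the paper's specific cost terms and constraints are not reproduced here):

$$
\min_{u_0,\dots,u_{N-1}} \; \sum_{k=0}^{N-1} \ell(x_k, u_k) + \ell_N(x_N)
\quad \text{s.t.} \quad x_{k+1} = f(x_k, u_k), \quad x_0 = x_{\mathrm{init}},
$$

where $f$ denotes the simplified model dynamics, $\ell$ and $\ell_N$ are running and terminal costs encoding the commanded velocity and gait, and the optimal state trajectory $x_{0:N}$ serves as the reference motion for the policy to imitate.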
The authors propose that dynamic locomotion can be achieved by synthesizing reference motions with finite-horizon optimal control. These reference motions then serve as imitation targets during RL training: policies learn to track the provided motion cues under varied conditions. Importantly, because training takes place in full-dynamics simulation, the learned policies sidestep the constraints imposed by the simplifications inherent in model-based approaches. Through extensive validation, the paper demonstrates that the framework reliably supports diverse gait patterns and velocity commands without requiring complex reward functions or robot-specific hyperparameter tuning.
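A minimal sketch of how such an imitation objective could be scored per control step is shown below; the state fields, weights, and exponential kernel are illustrative assumptions rather than the paper's exact reward.

```python
import numpy as np

def imitation_reward(state, ref, w_pos=5.0, w_vel=1.0, w_joint=2.0):
    """Illustrative per-step tracking reward: exponentials of squared errors
    between the simulated robot state and the optimal-control reference at
    the same time step (hypothetical fields and weights)."""
    base_pos_err = np.sum((state["base_pos"] - ref["base_pos"]) ** 2)
    base_vel_err = np.sum((state["base_vel"] - ref["base_vel"]) ** 2)
    joint_err = np.sum((state["joint_pos"] - ref["joint_pos"]) ** 2)
    return (np.exp(-w_pos * base_pos_err)
            + np.exp(-w_vel * base_vel_err)
            + np.exp(-w_joint * joint_err))
```

In a scheme of this kind, the reference trajectory shapes the reward, so switching gaits or commanded velocities changes the reference rather than the reward structure itself.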
One of the notable numerical results is the improved reliability of the learned control policies, reflected in a reduced fall rate on challenging locomotion tasks. A further advantage is the ability of the RL policies to generalize beyond the simplified model by exploiting full-body dynamics, for example by executing stable maneuvers on uneven terrain. The paper also emphasizes the computational efficiency of the approach, which yields fast policy training times while supporting robust, adaptive control across robots of different sizes.
Despite relying on a Variable Height Inverted Pendulum Model (VHIPM) simplification, the framework trains policies that are resilient to perturbations and adaptable to varied terrains. In comprehensive experiments, the authors train RL policies that realize a diverse set of gaits, including trotting, pronking, and galloping, on two distinct quadrupedal robots, the Unitree Go1 and Unitree Aliengo. The successful deployment on real hardware attests to the framework's potential for sim-to-real transfer.
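For context, a variable-height inverted pendulum treats the robot as a point-mass center of mass driven by a force along the stance-leg axis plus gravity. The snippet below is a textbook-style sketch of that model class, not the paper's exact formulation; the function and variable names are hypothetical.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # m/s^2

def vhipm_step(com_pos, com_vel, foot_pos, leg_force, mass, dt):
    """One semi-implicit Euler step of a variable-height inverted pendulum:
    a point-mass CoM accelerated along the leg axis (stance foot -> CoM)
    by a scalar leg force, plus gravity."""
    leg_axis = (com_pos - foot_pos) / np.linalg.norm(com_pos - foot_pos)
    com_acc = (leg_force / mass) * leg_axis + GRAVITY
    com_vel_next = com_vel + dt * com_acc
    com_pos_next = com_pos + dt * com_vel_next
    return com_pos_next, com_vel_next

# Example: CoM 0.3 m above the stance foot, leg force roughly supporting body weight.
pos, vel = vhipm_step(np.array([0.0, 0.0, 0.3]), np.zeros(3),
                      np.zeros(3), leg_force=12.0 * 9.81, mass=12.0, dt=0.01)
```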
The implications of this work are substantial for practical robotics, supporting the development of more adaptable and resilient robotic systems capable of dynamic locomotion across heterogeneous environments. Theoretically, the combination suggests a paradigm in which the strengths of MBOC and RL yield dynamic controllers that are both computationally scalable and physically realizable. Future work could incorporate higher-fidelity dynamic models and extend beyond periodic gaits to other locomotor strategies, broadening the spectrum of achievable robotic behaviors. This paper adds significant insight to the evolving discourse on combining model-based and learning-based control methods in legged robotics.