Reinforcement Learning for Robust Parameterized Locomotion Control of Bipedal Robots

Published 26 Mar 2021 in cs.RO, cs.AI, cs.LG, cs.SY, and eess.SY | (2103.14295v1)

Abstract: Developing robust walking controllers for bipedal robots is a challenging endeavor. Traditional model-based locomotion controllers require simplifying assumptions and careful modelling; any small errors can result in unstable control. To address these challenges for bipedal locomotion, we present a model-free reinforcement learning framework for training robust locomotion policies in simulation, which can then be transferred to a real bipedal Cassie robot. To facilitate sim-to-real transfer, domain randomization is used to encourage the policies to learn behaviors that are robust across variations in system dynamics. The learned policies enable Cassie to perform a set of diverse and dynamic behaviors, while also being more robust than traditional controllers and prior learning-based methods that use residual control. We demonstrate this on versatile walking behaviors such as tracking a target walking velocity, walking height, and turning yaw.

Citations (189)

Summary

  • The paper introduces a reinforcement learning framework that overcomes model-based limitations to robustly control bipedal robots.
  • It employs a diverse gait library with domain randomization to train controllers adaptable to different speeds, heights, and turning maneuvers.
  • Experimental results on the Cassie robot show that RL controllers outperform traditional HZD methods in resilience and dynamic performance.

This paper introduces a model-free reinforcement learning (RL) framework for designing robust locomotion policies for bipedal robots. The research targets the core challenge of bipedal locomotion: managing complex dynamics and many degrees of freedom. Traditional model-based techniques often fail to adapt to variations because they depend on precise modeling, so slight deviations can lead to unstable behavior. This study addresses these limitations by leveraging RL.

Methodology and Contribution

The authors developed a reinforcement learning-based system that utilizes a diverse gait library to train locomotion controllers in simulated environments. These trained controllers are then effectively transferred to a real-world setup, specifically a Cassie bipedal robot. Key to this sim-to-real transition is the utilization of domain randomization, which is designed to help the RL policies generalize across a range of possible environmental dynamics. The RL policies are engineered to imitate parameterized gaits derived from Hybrid Zero Dynamics (HZD), allowing the robot to adapt to various velocities, heights, and turning rates.
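The per-episode domain randomization described above can be sketched as follows. The parameter names and ranges below are illustrative assumptions, not the authors' values; the paper randomizes dynamics such as ground friction and sensor noise, but does not publish this exact dictionary.

```python
import random

# Hypothetical randomization ranges, sampled once per training episode.
# These names and bounds are illustrative assumptions only.
RANDOMIZATION_RANGES = {
    "ground_friction":     (0.5, 1.5),   # scale on nominal friction
    "joint_damping_scale": (0.8, 1.2),   # multiplier on joint damping
    "link_mass_scale":     (0.9, 1.1),   # multiplier on link masses
    "sensor_noise_std":    (0.0, 0.05),  # additive observation noise
}

def sample_dynamics():
    """Draw one randomized set of dynamics parameters for an episode."""
    return {name: random.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}
```

Training under many such samples discourages the policy from overfitting to any single simulated dynamics model, which is what enables the transfer to hardware.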

Technical Insights

The paper carefully defines the action and state spaces for the Cassie robot's control architecture. Actions correspond to desired motor positions, passed through a low-pass filter for smoother tracking. The observed state includes recent motor positions and velocities, providing the short history needed for effective control. Learning is guided by a reward structure that incentivizes accurate tracking of reference motions while promoting energy efficiency and penalizing large contact forces.
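The action filtering and tracking reward described above can be sketched as below. The smoothing coefficient and reward weights are illustrative assumptions, not the paper's values.

```python
import numpy as np

def low_pass_filter(prev_target, new_action, alpha=0.9):
    """Exponentially smooth desired motor positions before they are
    sent to the joint-level controllers. alpha is an illustrative
    smoothing coefficient, not the paper's value."""
    return alpha * prev_target + (1.0 - alpha) * new_action

def tracking_reward(q, q_ref, torque, w_track=1.0, w_energy=0.01):
    """Reward accurate tracking of the reference motion while
    penalizing energy use. Weights are illustrative assumptions."""
    track = np.exp(-np.sum((q - q_ref) ** 2))   # 1.0 at perfect tracking
    energy = np.sum(np.square(torque))          # torque-squared penalty
    return w_track * track - w_energy * energy
```

The exponential filter keeps consecutive motor targets close to each other, which is one common way to obtain the smoother performance the paper attributes to low-pass filtering.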

The policies are trained with Proximal Policy Optimization (PPO), which stabilizes learning in the high-dimensional spaces typical of bipedal robots. The researchers augment the simulation with domain randomization, introducing controlled variability in dynamic properties such as ground friction and sensor noise, which enhances policy robustness and facilitates successful real-world deployment.
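PPO's core mechanism, the clipped surrogate objective, can be written compactly. This is a generic sketch of the standard PPO objective, not the authors' training code:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from PPO.

    ratio     -- pi_theta(a|s) / pi_theta_old(a|s)
    advantage -- estimated advantage of action a in state s
    eps       -- clipping range (0.2 is a common default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum removes the incentive to move the policy
    # ratio outside [1 - eps, 1 + eps], stabilizing each update.
    return np.minimum(unclipped, clipped)
```

Because each update is bounded in this way, PPO avoids the destructively large policy steps that plain policy gradients can take in high-dimensional control problems.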

Results and Evaluation

The learned control policies exhibit significantly increased robustness and adaptability compared to traditional HZD-based methods. Tests in both simulation and on hardware show that the RL-based controllers not only support diverse locomotion tasks, including changes in walking speed, direction, and height, but also withstand external perturbations such as unknown loads and surfaces with varying friction. The experimental section details these capabilities, showcasing robust recovery from external perturbations and adaptation to unmodeled conditions, outperforming prior benchmarks and standard controllers.

Implications and Future Work

This study signals a significant shift in robotic locomotion control strategies, pointing to a promising future for RL-enabled robots that autonomously adapt to complex and dynamic environments. The integration of RL with parameterized control signals and sim-to-real methodologies suggests a pathway toward more autonomous, resilient, and versatile robotic systems.

Future research directions may include broadening the behaviors covered by the gait library and improving adaptability in unpredictable or rapidly changing environments. Additionally, integrating more sophisticated sensors and actuators could further enable robots to perform more complex tasks in a wider range of environments. Overall, this paper represents a substantive step toward more autonomous and adaptable robotic systems built on reinforcement learning.
