Sim-to-Real Learning of All Common Bipedal Gaits via Periodic Reward Composition (2011.01387v2)

Published 2 Nov 2020 in cs.RO

Abstract: We study the problem of realizing the full spectrum of bipedal locomotion on a real robot with sim-to-real reinforcement learning (RL). A key challenge of learning legged locomotion is describing different gaits, via reward functions, in a way that is intuitive for the designer and specific enough to reliably learn the gait across different initial random seeds or hyperparameters. A common approach is to use reference motions (e.g. trajectories of joint positions) to guide learning. However, finding high-quality reference motions can be difficult and the trajectories themselves narrowly constrain the space of learned motion. At the other extreme, reference-free reward functions are often underspecified (e.g. move forward) leading to massive variance in policy behavior, or are the product of significant reward-shaping via trial-and-error, making them exclusive to specific gaits. In this work, we propose a reward-specification framework based on composing simple probabilistic periodic costs on basic forces and velocities. We instantiate this framework to define a parametric reward function with intuitive settings for all common bipedal gaits - standing, walking, hopping, running, and skipping. Using this function we demonstrate successful sim-to-real transfer of the learned gaits to the bipedal robot Cassie, as well as a generic policy that can transition between all of the two-beat gaits.

Authors (4)

Jonah Siekmann (4 papers)
Yesh Godse (2 papers)
Alan Fern (60 papers)
Jonathan Hurst (15 papers)

Summary

The paper introduces a novel framework using periodic reward composition to enable sim-to-real transfer of common bipedal gaits.
It replaces traditional time indices with cycle time and phase indicator functions to capture the cyclic nature of locomotion without reference trajectories.
Experimental results on the Cassie robot demonstrate stable gait transitions and robust adaptation in dynamic, unstructured environments.

Overview of Sim-to-Real Learning of All Common Bipedal Gaits via Periodic Reward Composition

This paper by Siekmann et al. proposes a reinforcement learning (RL) framework for learning bipedal locomotion gaits in simulation and transferring them to a real robot. The authors address a limitation of previous methods that rely heavily on reference trajectories to define specific gait patterns. Such trajectories reduce ambiguity, but high-quality references can be hard to obtain and they narrowly constrain the space of learned motion. Reference-free reward functions, at the other extreme, are often underspecified, which leads to large variance in policy behavior, or are so heavily shaped by trial and error that they apply only to a specific gait.

The authors introduce a method based on periodic reward composition, which describes gaits using simple probabilistic periodic costs on basic quantities such as foot forces and velocities. The periodic nature of bipedal locomotion is captured by expressing time as a phase within a repeating cycle, which allows all common bipedal gaits (standing, walking, hopping, running, and skipping) to be defined without reliance on predefined motion paths. Because the reward constrains only the periodic structure of each gait, the learned motion is otherwise free to adapt to the real environment.

Key Methodological Contributions

Reward Framework: The reward specification framework is novel in its use of periodic probabilistic costs applied to fundamental motion metrics such as foot forces and velocities. This unique composition is instantiated to achieve learning in a full spectrum of bipedal gaits.
Cycle Time Usage: The authors replace monotonically increasing time indices with cycle time variables. This gives the reward a periodic structure suited to cyclic gait patterns such as walking and hopping.
Phase Indicator Functions: The start and end of each phase are modeled with von Mises distributions, so phase membership is probabilistic rather than a hard switch. This accommodates timing uncertainty and environmental disturbances in gait pattern learning.
Cycle Offset Parameters: The parameters $\theta_\text{left}$ and $\theta_\text{right}$ shift the phase timing of the left and right legs relative to the shared cycle, so that gaits such as walking, hopping, and skipping can be selected by changing the offsets rather than rewriting the reward.
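The four ideas above can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `smoothed_indicator` and `periodic_reward`, the concentration `kappa`, the convention that swing occupies the first `swing_ratio` of the cycle, and the use of pre-normalised force and speed magnitudes are all hypothetical. As a simplification, the whole phase is jittered by a single von Mises noise term instead of drawing the phase start and end independently.

```python
import math

def smoothed_indicator(phi, start, end, kappa=50.0, n=720):
    """Expected value of 1[start <= phase < end] when the cycle phase phi
    (in [0, 1)) is jittered by zero-mean von Mises noise.
    Assumes 0 <= start < end <= 1. Computed by midpoint integration."""
    inside = 0.0
    total = 0.0
    for k in range(n):
        angle = -math.pi + (k + 0.5) * 2.0 * math.pi / n  # noise angle in (-pi, pi)
        # Unnormalised von Mises density with mean 0; normalised by `total` below.
        w = math.exp(kappa * (math.cos(angle) - 1.0))
        shifted = (phi - angle / (2.0 * math.pi)) % 1.0
        if start <= shifted < end:
            inside += w
        total += w
    return inside / total

def periodic_reward(phi, foot_force, foot_speed, swing_ratio=0.5, offset=0.0, kappa=50.0):
    """Per-foot reward: penalise foot force during swing and foot speed during stance.
    `offset` plays the role of a cycle offset such as theta_left or theta_right.
    `foot_force` and `foot_speed` are assumed to be non-negative, pre-normalised magnitudes."""
    p = (phi + offset) % 1.0                    # cycle time shifted by the foot's offset
    swing = smoothed_indicator(p, 0.0, swing_ratio, kappa)
    stance = 1.0 - swing
    return -swing * foot_force - stance * foot_speed

# A foot pressing on the ground mid-swing is penalised; the same force mid-stance is not.
print(periodic_reward(0.25, foot_force=1.0, foot_speed=0.0))
print(periodic_reward(0.75, foot_force=1.0, foot_speed=0.0))
```

Summing `periodic_reward` over both feet, each with its own offset, gives one possible composite gait reward; a smaller `kappa` widens the transition between swing and stance.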

Experimental Setup and Results

Policies are trained in simulation and then deployed on the bipedal robot Cassie. The framework supports learning individual gaits as well as a single policy that transitions between all of the two-beat gaits, with behaviors such as hopping onto elevated surfaces and running across uneven terrain. The reported results indicate stable gait transitions and robust behavior on hardware, including in unstructured outdoor environments.
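One plausible way to picture a gait transition is as a gradual change in the gait parameters supplied to the reward and the policy. The sketch below is illustrative only: the `GaitParams` type, the preset values, and the linear `blend` are assumptions made for this example and are not the settings or the transition mechanism reported in the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GaitParams:
    swing_ratio: float   # fraction of the cycle spent in swing (hypothetical convention)
    theta_left: float    # cycle offset of the left foot, in [0, 1)
    theta_right: float   # cycle offset of the right foot, in [0, 1)

# Illustrative presets only; these numbers are NOT taken from the paper.
PRESETS = {
    "standing": GaitParams(0.0, 0.0, 0.0),   # never in swing
    "walking":  GaitParams(0.4, 0.0, 0.5),   # feet half a cycle apart
    "running":  GaitParams(0.6, 0.0, 0.5),   # longer swing, feet half a cycle apart
    "hopping":  GaitParams(0.5, 0.0, 0.0),   # feet in phase
}

def foot_phases(t, period, params):
    """Map wall-clock time t to a cycle time in [0, 1) for each foot."""
    phi = (t % period) / period
    return ((phi + params.theta_left) % 1.0, (phi + params.theta_right) % 1.0)

def blend(a, b, alpha):
    """Linearly interpolate between two gaits; alpha=0 gives a, alpha=1 gives b."""
    def lerp(x, y):
        return (1.0 - alpha) * x + alpha * y
    return GaitParams(
        lerp(a.swing_ratio, b.swing_ratio),
        lerp(a.theta_left, b.theta_left),
        lerp(a.theta_right, b.theta_right),
    )

# Halfway between walking and hopping, the right foot lags by a quarter cycle.
halfway = blend(PRESETS["walking"], PRESETS["hopping"], 0.5)
print(halfway)
print(foot_phases(0.25, 1.0, PRESETS["walking"]))
```

Keeping the gait description in a small parameter record like this is what makes a single conditioned policy conceivable: the same reward code can be reused while only the parameters change.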

Implications for Future AI Developments

The proposed framework points toward bipedal gait synthesis that does not depend on rigid reference trajectories. Because gaits are selected through a small set of interpretable parameters, it suggests a route to more adaptable controllers for robots that must handle dynamic and unpredictable terrain. Future directions may include extending the framework to quadrupedal locomotion or refining it to incorporate aperiodic and adaptive motion responses.

Conclusion

This paper contributes a reward-specification approach for RL-based robotic locomotion that describes periodic gait patterns without reference trajectories and is robust to real-world variation. The learned gaits transfer from simulation to the Cassie robot, and a single policy can transition between the two-beat gaits, which is a concrete step toward legged robots that can execute a range of motion tasks.
