Iterative Reinforcement Learning-Based Design of Dynamic Locomotion Skills for Cassie
The paper "Iterative Reinforcement Learning-Based Design of Dynamic Locomotion Skills for Cassie" addresses a significant challenge in applying reinforcement learning to legged robotics: supporting iterative design while preserving stable policy improvement. The authors propose a framework that combines deep reinforcement learning (DRL) with deterministic action-stochastic state (DASS) tuples, enabling policy adaptation and transfer from simulation to a physical robot without dynamics randomization.
Overview and Contributions
Building effective control policies for bipedal robots poses unique challenges due to their dynamic instability and structural complexity. The Cassie bipedal robot serves as a practical platform for demonstrating the authors' solutions. The core innovation is the iterative refinement of policies through a combination of reinforcement learning and supervised imitation of earlier policies, directly addressing the limitations of prior approaches to policy adjustment and adaptation.
The primary contributions of this work are:
- DASS Technique for Policy Distillation and Compression: The authors introduce a mechanism to reconstruct a policy from a small number of samples using DASS tuples, which pair stochastically sampled states with the deterministic actions the policy takes in them. This allows multiple policies to be distilled into a single policy without significant performance degradation (see the first sketch after this list).
- Iterative Policy Design Framework: Rather than fixing the task up front, the framework allows the reward function to be redefined at each design iteration. Mixing policy-gradient updates with distillation updates from DASS samples of earlier policies retains effective behaviors while letting the policy evolve (see the second sketch after this list).
- Successful Sim-to-Real Transfer of Locomotion Policies: Policies trained in simulation transferred directly to the Cassie robot without dynamics randomization. This marks an improvement over prevalent methodologies that require extensive randomization for sim-to-real transfer, particularly for human-scale bipeds like Cassie.
- Rich Empirical Evaluation: Demonstrations of various walking styles and speeds, all executed on the physical robot, indicate robustness and adaptability. The experiments underline the capacity of learned policies to handle unanticipated scenarios and disturbances in real-world environments.
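To make the DASS idea concrete, here is a minimal Python sketch of collecting DASS tuples from an expert policy and distilling them into a new network. Everything in it is illustrative rather than taken from the paper: the toy linear dynamics, network sizes, noise scale, and names (`expert`, `student`, `collect_dass_tuples`) are assumptions; the paper works with Cassie's actual state and action spaces in simulation.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2  # illustrative stand-ins for Cassie's dimensions

def make_policy():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, ACTION_DIM))

expert = make_policy()   # stands in for a trained policy whose behavior we keep
student = make_policy()  # policy being reconstructed from DASS tuples

def collect_dass_tuples(policy, n_steps, noise_std=0.1):
    """Roll out the deterministic policy while the *state* evolves
    stochastically, recording (stochastic state, deterministic action)
    pairs -- the DASS pairing."""
    tuples = []
    s = torch.zeros(STATE_DIM)
    for _ in range(n_steps):
        with torch.no_grad():
            a = policy(s)            # deterministic (mean) action
        tuples.append((s.clone(), a))
        # Toy linear dynamics with injected noise; a stand-in for the
        # stochastic rollouts of the real simulator.
        s = 0.9 * s + 0.1 * a.mean() + noise_std * torch.randn(STATE_DIM)
    return tuples

def distill(student, tuples, epochs=200, lr=1e-3):
    """Supervised regression of the student onto the expert's DASS tuples."""
    states = torch.stack([s for s, _ in tuples])
    actions = torch.stack([a for _, a in tuples])
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        loss = ((student(states) - actions) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return student

student = distill(student, collect_dass_tuples(expert, n_steps=512))
```

The key point is that the recorded states are stochastic while the recorded actions are the policy's deterministic outputs, so the regression targets compactly summarize the expert's behavior in the neighborhood of its own trajectories.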
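The iterative design loop can be sketched in the same spirit. Below, a single update mixes a policy-gradient term computed on fresh rollouts under the (possibly redefined) reward with a distillation term on DASS tuples from a previous policy. The simple REINFORCE-style surrogate, the fixed Gaussian log-probability, and the `lam` weighting are stand-ins assumed here for brevity; the paper's learner is an actor-critic policy-gradient method operating on Cassie's simulator.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2       # illustrative dimensions
policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                       nn.Linear(64, ACTION_DIM))
log_std = torch.zeros(ACTION_DIM)  # fixed exploration noise (assumed)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def mixed_update(states, actions, advantages, dass_states, dass_actions,
                 lam=0.5):
    """One gradient step on: policy-gradient loss + lam * distillation loss."""
    mean = policy(states)
    # Gaussian log-probability of the sampled actions (up to a constant).
    logp = -0.5 * (((actions - mean) / log_std.exp()) ** 2).sum(-1)
    pg_loss = -(logp * advantages).mean()        # improve on the new reward
    distill_loss = ((policy(dass_states) - dass_actions) ** 2).mean()
    loss = pg_loss + lam * distill_loss          # retain prior behaviors
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Illustrative call with random stand-in data:
B = 64
mixed_update(torch.randn(B, STATE_DIM), torch.randn(B, ACTION_DIM),
             torch.randn(B), torch.randn(B, STATE_DIM),
             torch.randn(B, ACTION_DIM))
```

Raising `lam` biases the update toward preserving behaviors from earlier design iterations; lowering it lets the redefined reward dominate.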
Implications and Future Directions
The implications of this work are multifaceted. Practically, the demonstrated ability to iteratively refine policies in simulation and transfer them to a physical robot can shorten development timelines for robotic gaits and movement patterns. Theoretically, integrating supervised policy distillation with reinforcement learning lets models retain learned behaviors while incorporating new strategies, a significant step in both robotics and artificial intelligence.
Future developments could extend this framework to richer perceptual inputs, such as vision, enabling adaptive navigation and interaction in unstructured environments. The blend of deterministic actions and stochastic states put forth by DASS could also inspire architectures that balance policy flexibility with deterministic guidance in other robotic applications.
In conclusion, the paper represents a substantial contribution to the application of reinforcement learning to dynamic bipedal robotics, presenting a promising model for both simulation-based exploration and real-world deployment of legged locomotion skills.