Task Space Action Learning for Bipedal Locomotion
The paper "Learning Task Space Actions for Bipedal Locomotion" by Duan et al. proposes an innovative methodology for enhancing the capability of reinforcement learning (RL) to develop effective control policies for bipedal robots. Traditional RL methods focused predominantly on learning joint-level dynamics, which are often suboptimal due to the misalignment between primary locomotive control interests and joint space actions. This paper outlines a novel approach that integrates RL with a model-based inverse dynamics controller, fostering learning within the task space of the robot's feet setpoints.
The key advantage of task space actions is more sample-efficient learning, even though the underlying bipedal mechanics remain complex. By outputting actions at the task space level, the RL policy can hand them to an inverse dynamics controller, which translates the desired task space dynamics into joint-level commands. This shift moves exploration into a more meaningful action space, allowing trained policies to realize the desired task space dynamics efficiently.
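To make the task-space-to-joint mapping concrete, here is a minimal Jacobian-based inverse dynamics sketch in Python. It is not the paper's controller: the damped pseudoinverse and the assumption that a rigid-body dynamics library supplies the mass matrix `M`, bias forces `h`, and foot Jacobian `J` are illustrative choices.

```python
import numpy as np

def task_space_to_joint_torques(M, h, J, dJ_qd, x_acc_des):
    """Map a desired task space (foot) acceleration to joint torques.

    M         : joint space mass matrix (n x n)
    h         : Coriolis/centrifugal plus gravity terms (n,)
    J         : task Jacobian of the foot (m x n)
    dJ_qd     : Jacobian time-derivative times joint velocity, J_dot @ qd (m,)
    x_acc_des : desired foot acceleration in task space (m,)
    """
    # A damped pseudoinverse keeps the mapping well-behaved near
    # kinematic singularities (the damping value is illustrative).
    J_pinv = J.T @ np.linalg.inv(J @ J.T + 1e-6 * np.eye(J.shape[0]))

    # Resolve the task space acceleration into joint space:
    # qdd_des = J^+ (x_acc_des - J_dot @ qd)
    qdd_des = J_pinv @ (x_acc_des - dJ_qd)

    # Standard inverse dynamics: tau = M @ qdd + h.
    return M @ qdd_des + h
```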
Methodology and Results
The authors use the bipedal robot Cassie to demonstrate the viability of their method in both simulation and hardware. The core of the approach is a task space policy that interacts with a model-based inverse dynamics controller to generate joint-level commands. The policy is trained with Proximal Policy Optimization (PPO) and draws on both the robot's dynamic model and task space references generated by a spring-mass model.
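The sketch below shows what one control step in this architecture might look like, reusing the inverse dynamics function above. The residual action parameterization, the PD gains, and the `dynamics` tuple interface are assumptions made for illustration, not the paper's exact design.

```python
import numpy as np

def control_step(policy_action, x_ref, x, xd, dynamics, kp=400.0, kd=40.0):
    """One illustrative step of the task space control pipeline.

    policy_action : foot setpoint adjustment from the RL policy (m,)
    x_ref         : foot setpoint from the spring-mass reference (m,)
    x, xd         : measured foot position and velocity (m,)
    dynamics      : (M, h, J, dJ_qd) evaluated at the current robot state
    kp, kd        : task space PD gains (illustrative values)
    """
    # The policy acts in task space, here as a residual on the
    # spring-mass reference setpoint (an assumed parameterization).
    x_des = x_ref + policy_action

    # A task space PD law turns the setpoint into a desired acceleration.
    x_acc_des = kp * (x_des - x) - kd * xd

    # The model-based inverse dynamics stage maps it to joint torques.
    M, h, J, dJ_qd = dynamics
    return task_space_to_joint_torques(M, h, J, dJ_qd, x_acc_des)
```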
The empirical results show that learning task space actions substantially improves sample efficiency. In simulation, policies trained with task space actions converged to a functional bipedal locomotion policy much faster, cutting wall-clock training time from 32 hours to 8 hours relative to a joint space learning approach. The task space policies also produced ground reaction forces that closely tracked the desired reference profiles while remaining adaptable to disturbances.
Implications and Future Directions
The implications of this research are of considerable interest to the field of robotic locomotion. The successful transfer of learned policies from simulation to real hardware underscores the promise of building task space structure into learning algorithms. The approach not only simplifies the learning process but also bridges existing model-based controllers and RL through the common language of task space actions.
There is substantial room to extend task space learning, particularly by incorporating more advanced model-based control frameworks, such as those built on centroidal dynamics. Doing so could improve the precision and robustness of bipedal locomotion policies, enabling them to handle a wider range of dynamic scenarios and more complex locomotion behaviors.
In conclusion, the paper by Duan et al. takes an important step toward a systematic and effective methodology for training bipedal robots. By aligning the learning process with task space dynamics, it lays a foundation for future work on refining RL-based locomotion techniques in both simulation and real-world environments.