- The paper introduces a reinforcement learning framework that trains neural network policies for agile, energy-efficient locomotion on quadrupedal robots.
- It combines high-fidelity simulation with learned actuator dynamics, achieving an average linear velocity tracking error of 0.143 m/s while lowering torque and power consumption.
- The approach enables high-speed locomotion up to 1.5 m/s and autonomous recovery from falls, surpassing previous model-based methods.
Learning Agile and Dynamic Motor Skills for Legged Robots
The paper Learning Agile and Dynamic Motor Skills for Legged Robots by Jemin Hwangbo et al. addresses a critical challenge in robotics: developing dynamic and agile locomotion capabilities for legged robots. The research leverages reinforcement learning (RL) to train neural network policies in simulation and transfer them to a physical quadrupedal platform, the ANYmal robot.
Overview
Legged robots offer significant advantages over wheeled or tracked robots, especially in complex, unstructured environments. However, designing efficient control algorithms for these systems is fraught with challenges due to their high-dimensional, non-smooth dynamics and the need for meticulous tuning of control parameters. Traditional methods, including modular controllers and trajectory optimization, fall short in addressing these complexities, often requiring extensive manual design and tuning efforts for each maneuver or environment change.
The authors propose an approach that combines model-based simulation with data-driven reinforcement learning to autonomously learn and deploy motor control policies. By training in a high-fidelity simulation environment and running the learned policies directly on the physical robot, they bridge the gap between simulation and real-world deployment.
Methodology
The methodology involves a hybrid simulation environment that merges analytical models of rigid-body dynamics with learned models of actuator dynamics. Key steps in their approach include:
- Physical Parameter Identification: Estimation of robot parameters and uncertainties.
- Actuator Network Training: Learning a deep neural network (the actuator net) that models the complex dynamics of the actuators, including software delays and mechanical characteristics (see the sketch after this list).
- Policy Training: Using the trained actuator net and high-fidelity simulation, reinforcement learning is employed to train the control policy.
- Deployment: Direct deployment of the RL-trained policy on the physical ANYmal robot.
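To make the actuator-net step above concrete, the following PyTorch sketch shows one way such a model could look: a small MLP that maps a short history of joint position errors and joint velocities to a predicted joint torque. The history length, layer widths, and activation choice are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ActuatorNet(nn.Module):
    """Learned actuator model sketch: maps a short history of joint
    position errors and joint velocities to a predicted joint torque.
    Sizes and activations here are assumptions, not the paper's values."""

    def __init__(self, history_len: int = 3, hidden: int = 32):
        super().__init__()
        in_dim = 2 * history_len  # position-error history + velocity history
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Softsign(),
            nn.Linear(hidden, hidden), nn.Softsign(),
            nn.Linear(hidden, 1),  # predicted torque for one joint
        )

    def forward(self, pos_err_hist: torch.Tensor, vel_hist: torch.Tensor) -> torch.Tensor:
        # Each input has shape (batch, history_len); the batch can be "all joints".
        return self.net(torch.cat([pos_err_hist, vel_hist], dim=-1))

# Example: predict torques for 12 joints, each with a 3-step history.
net = ActuatorNet()
pos_err = torch.randn(12, 3)  # position error at t, t-dt, t-2*dt
vel = torch.randn(12, 3)      # joint velocity at the same time steps
tau_hat = net(pos_err, vel)   # shape (12, 1)
```

In the paper's pipeline, a network of this kind is trained on data logged from the real actuators and then used in place of an analytical actuator model inside the rigid-body simulation.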
The control policy is represented by a multi-layer perceptron that maps the robot's state history to joint position targets, which the robot's joint-level controllers then track; this keeps the learned commands realistic and executable on hardware.
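A minimal sketch of such a policy and the position-target interface it drives is shown below; the observation size, hidden widths, and PD gains are placeholders, not the values used on ANYmal.

```python
import torch
import torch.nn as nn

class LocomotionPolicy(nn.Module):
    """Policy sketch: an MLP from an observation vector (body state,
    joint history, velocity command) to 12 joint position targets.
    Dimensions are illustrative assumptions."""

    def __init__(self, obs_dim: int = 60, hidden: int = 256, num_joints: int = 12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, num_joints),  # joint position targets
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def pd_torque(q_target, q, qd, kp=50.0, kd=0.5):
    """Joint-level PD law turning position targets into torques.
    Gains are placeholders, not the robot's actual controller gains."""
    return kp * (q_target - q) - kd * qd
```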
Results
Command-Conditioned Locomotion
The learned controller enables ANYmal to follow high-level body velocity commands with high precision and energy efficiency. In tests, it significantly outperformed the best existing model-based controller for ANYmal, demonstrating:
- An average linear velocity error of 0.143 m/s, substantially lower than the model-based controller’s error (a sketch of this tracking-error metric follows the list).
- Reduced torque and mechanical power consumption by 29.7% and 19.8%, respectively.
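For reference, a tracking-error metric of the kind quoted above can be computed from logged command and velocity traces as in the following sketch; the array layout and the use of a mean Euclidean norm are assumptions, not the paper's exact evaluation procedure.

```python
import numpy as np

def mean_linear_velocity_error(commanded: np.ndarray, measured: np.ndarray) -> float:
    """Mean Euclidean distance between commanded and measured base linear
    velocities over a trajectory; both arrays have shape (T, 2), e.g.
    forward and lateral velocity in m/s (an assumed convention)."""
    return float(np.linalg.norm(commanded - measured, axis=1).mean())

# Example with synthetic data: a constant 1 m/s forward command.
cmd = np.tile([1.0, 0.0], (1000, 1))
meas = cmd + 0.1 * np.random.randn(1000, 2)   # noisy measured velocity
print(f"mean tracking error: {mean_linear_velocity_error(cmd, meas):.3f} m/s")
```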
High-Speed Locomotion
The high-speed locomotion policy pushed the boundaries of ANYmal’s performance, reaching speeds of up to 1.5 m/s and surpassing the previous speed record by 25%. The policy exploited the hardware’s full potential while staying within the robot’s maximum joint torque and velocity limits, as illustrated in the sketch below.
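One common way to encourage a learned policy to respect such limits is to add a penalty to the reward whenever commanded torques or joint velocities exceed them. The sketch below illustrates the idea; the limit values and the quadratic penalty form are assumptions, not the paper's published cost terms.

```python
import numpy as np

def limit_penalty(tau: np.ndarray, qd: np.ndarray,
                  tau_max: float = 40.0, qd_max: float = 12.0, w: float = 1.0) -> float:
    """Quadratic penalty on joint torques and velocities that exceed
    assumed actuator limits (tau_max in N*m, qd_max in rad/s); the
    limit values here are placeholders, not ANYmal's specifications."""
    over_tau = np.clip(np.abs(tau) - tau_max, 0.0, None)
    over_qd = np.clip(np.abs(qd) - qd_max, 0.0, None)
    return -w * float(np.sum(over_tau ** 2) + np.sum(over_qd ** 2))
```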
Recovery from Falls
Perhaps most impressively, the paper describes the development of a recovery policy enabling ANYmal to autonomously recover from a fall. The learned policy successfully handled complex initial configurations and dynamic motions, such as flipping from an upside-down position. This capability has not been achieved by previous methods for robots of comparable complexity.
Implications
The implications of this research are far-reaching. By automating the learning of complex motor skills, the approach reduces the need for extensive manual tuning and domain-specific model design. This significantly shortens the development time for new maneuvers and makes the deployment of agile, versatile legged robots more feasible for real-world applications, from search and rescue operations to planetary exploration.
Future Directions
Future developments could focus on generalizing the learned policies to more diverse environments and tasks, potentially incorporating hierarchical policy structures to handle multiple tasks within a single framework. Additionally, extending the methodology to other robot platforms and actuator types will test its generalizability and robustness further.
The paper stands as a substantial contribution to the field of robotics, advancing the state-of-the-art in autonomous control policy development for legged robots using reinforcement learning. The combination of simulation fidelity, learned actuator dynamics, and efficient policy deployment presents a promising avenue for future research and application.