
Sim-to-Real: Learning Agile Locomotion For Quadruped Robots (1804.10332v2)

Published 27 Apr 2018 in cs.RO and cs.AI

Abstract: Designing agile locomotion for quadruped robots often requires extensive expertise and tedious manual tuning. In this paper, we present a system to automate this process by leveraging deep reinforcement learning techniques. Our system can learn quadruped locomotion from scratch using simple reward signals. In addition, users can provide an open loop reference to guide the learning process when more control over the learned gait is needed. The control policies are learned in a physics simulator and then deployed on real robots. In robotics, policies trained in simulation often do not transfer to the real world. We narrow this reality gap by improving the physics simulator and learning robust policies. We improve the simulation using system identification, developing an accurate actuator model and simulating latency. We learn robust controllers by randomizing the physical environments, adding perturbations and designing a compact observation space. We evaluate our system on two agile locomotion gaits: trotting and galloping. After learning in simulation, a quadruped robot can successfully perform both gaits in the real world.

Authors (8)
  1. Jie Tan (85 papers)
  2. Tingnan Zhang (53 papers)
  3. Erwin Coumans (17 papers)
  4. Atil Iscen (18 papers)
  5. Yunfei Bai (21 papers)
  6. Danijar Hafner (32 papers)
  7. Steven Bohez (18 papers)
  8. Vincent Vanhoucke (29 papers)
Citations (746)

Summary

Sim-to-Real: Learning Agile Locomotion For Quadruped Robots

In the paper "Sim-to-Real: Learning Agile Locomotion For Quadruped Robots," Jie Tan et al. address the challenge of automating the design of agile locomotion for quadruped robots using deep reinforcement learning (deep RL), a task that traditionally demands extensive expertise and manual tuning. The paper aims to simplify and streamline this process by learning locomotion policies in simulation and transferring those policies to real-world robots.

The primary contributions of this research are threefold:

  1. The presentation of a complete system that learns agile locomotion policies in a simulated environment and transfers these policies to physical robots.
  2. The implementation of various methods to narrow the simulation-to-reality (sim-to-real) gap, enabling the successful deployment of simulated policies on real robots.
  3. The demonstration of the effectiveness of these methodologies on two specific locomotion gaits: trotting and galloping.

System Framework

The proposed system can learn quadruped locomotion from scratch with simple reward signals. Users also have the option to guide the learning process through an open-loop reference, which helps to control the learned gait. The system leverages Proximal Policy Optimization (PPO) for robust and stable policy updates, conducted in a physics simulator (PyBullet).
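
As a concrete illustration, the sketch below (not the authors' code) trains such a policy with the stable-baselines3 PPO implementation on the MinitaurBulletEnv environment that ships with PyBullet. It assumes an older gym API, under which importing pybullet_envs registers the environment, and the hyperparameters are illustrative rather than those reported in the paper.

```python
import gym
import pybullet_envs  # registers MinitaurBulletEnv-v0 on import
from stable_baselines3 import PPO

env = gym.make("MinitaurBulletEnv-v0")

# Small MLP policy trained with PPO; hyperparameters are illustrative,
# not the ones reported in the paper.
model = PPO("MlpPolicy", env, n_steps=2048, batch_size=256, verbose=1)
model.learn(total_timesteps=5_000_000)
model.save("minitaur_ppo")
```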

The learned policies combine an open-loop reference component with feedback-based adjustments, giving users a spectrum of control over the learned gait. Through system identification and enhanced actuator and latency models, the researchers improve the fidelity of the physics simulator. This narrows the reality gap, a significant bottleneck in transferring policies from simulated environments to physical systems.
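
A minimal sketch of this hybrid action computation, with a hypothetical policy callable and a sinusoidal trot-like reference standing in for the user-provided trajectory:

```python
import numpy as np

def hybrid_action(policy, observation, t, amplitude=0.3, frequency=2.0):
    """Sum of a periodic open-loop reference and a learned feedback
    correction. The sinusoidal trot-like reference (diagonal leg pairs
    in anti-phase) is a stand-in for the user-provided trajectory."""
    phase = 2.0 * np.pi * frequency * t
    reference = amplitude * np.array(
        [np.sin(phase), np.sin(phase + np.pi)] * 4  # 8 motors
    )
    feedback = policy(observation)  # learned residual term
    return reference + feedback
```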

Methodologies to Narrow the Reality Gap

System Identification and Enhanced Simulation

A detailed system identification process is employed to create an accurate Unified Robot Description Format (URDF) file for the simulated robot, Minitaur. Important physical attributes like mass, dimensions, actuator dynamics, and motor friction are precisely measured and incorporated into the simulator.
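
In PyBullet, applying such measured values might look like the sketch below; the URDF path and all numbers are placeholders rather than the paper's identified parameters.

```python
import pybullet as p

p.connect(p.DIRECT)
robot = p.loadURDF("minitaur.urdf")  # path is illustrative

# Override simulator defaults with values measured on the physical
# robot; the numbers below are placeholders, not the identified ones.
measured_base_mass = 4.6       # kg, hypothetical measurement
measured_joint_damping = 0.02  # hypothetical

p.changeDynamics(robot, -1, mass=measured_base_mass)
for joint in range(p.getNumJoints(robot)):
    p.changeDynamics(robot, joint, jointDamping=measured_joint_damping)
```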

To further reduce the discrepancy between the simulated and real environment, the researchers introduce a new actuator model aligned with the dynamics of DC motors. They also incorporate latency modeling, acknowledging the time delays inherent in sensory feedback and command execution in physical systems. These improvements significantly enhance the reliability and safety of deploying simulated policies in the real world.
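
The sketch below illustrates both ideas under stated assumptions: an ideal DC motor law, torque = Kt * (V - Ke * omega) / R, and a buffer that replays observations a fixed delay old. All constants are placeholders, not the paper's identified values.

```python
import collections

class DCMotorModel:
    """Ideal DC motor: torque = Kt * (V - Ke * omega) / R.
    Constants are placeholders, not the paper's identified values."""
    def __init__(self, kt=0.0954, ke=0.0954, resistance=0.186):
        self.kt, self.ke, self.r = kt, ke, resistance

    def torque(self, voltage, angular_velocity):
        current = (voltage - self.ke * angular_velocity) / self.r
        return self.kt * current

class LatencySimulator:
    """Replays observations that are `delay` seconds old, mimicking the
    sensing/actuation latency measured on the real robot."""
    def __init__(self, delay=0.02):
        self.delay = delay
        self.history = collections.deque()  # (timestamp, observation)

    def observe(self, t, obs):
        self.history.append((t, obs))
        # Drop entries no longer needed, then return the newest
        # observation that is at least `delay` seconds old.
        while len(self.history) > 1 and self.history[1][0] <= t - self.delay:
            self.history.popleft()
        return self.history[0][1]
```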

Robust Control Policies

Robustness in control policies is achieved by randomizing key physical parameters (e.g., mass, motor strength, and control latency) during training. The system also injects random perturbations to the robot's base, encouraging the learning process to develop resilience to disturbances and model inaccuracies. The resulting control policies perform consistently across varied simulations and real-world scenarios.
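
A simplified sketch of per-episode randomization and base perturbation; the environment attribute names, ranges, and force magnitudes are illustrative, not the paper's.

```python
import numpy as np
import pybullet as p

def randomize_physics(env, rng):
    """Resample physical parameters at the start of each episode.
    Attribute names and ranges are illustrative, not the paper's."""
    env.base_mass = env.nominal_mass * rng.uniform(0.8, 1.2)
    env.motor_strength = rng.uniform(0.8, 1.2)  # torque scaling
    env.latency = rng.uniform(0.0, 0.04)        # seconds

def random_push(robot_id, rng):
    """Apply a random horizontal force to the robot base."""
    fx, fy = rng.uniform(-130.0, 130.0, size=2)  # N, illustrative range
    p.applyExternalForce(robot_id, -1, [fx, fy, 0.0],
                         [0.0, 0.0, 0.0], p.LINK_FRAME)
```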

The observation space design also plays a crucial role. By adopting a compact observation space, the researchers avoid overfitting to simulated environments, thereby rendering the learned policies more transferable to the physical robot.
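
A sketch of this compact, 12-dimensional observation, following the paper's description (base roll and pitch, their angular rates, and the eight motor angles):

```python
import numpy as np

def compact_observation(roll, pitch, roll_rate, pitch_rate, motor_angles):
    """12-dimensional observation: base roll and pitch, their angular
    rates, and the eight motor angles. Base position, yaw, and linear
    velocity are deliberately excluded, since they are hard to measure
    on the real robot and invite overfitting to the simulator."""
    return np.concatenate(
        ([roll, pitch, roll_rate, pitch_rate], motor_angles)
    )  # shape: (12,)
```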

Experimental Evaluation

The researchers detail two primary experiments: learning to gallop and learning to trot.

  • Galloping: When training policies for galloping without any human-provided references, the system successfully learned this high-speed gait, achieving speeds of 1.34 m/s in simulation and 1.18 m/s in the real world.
  • Trotting: For the trotting gait, a predefined open-loop signal was provided to guide the learning process. This hybrid approach produced a stable trotting gait at 0.60 m/s in the real world.

These results are significant when compared to handcrafted gaits developed by experts: the learned gaits perform comparably in speed while being markedly more energy efficient, reducing power consumption by 35% for galloping and 23% for trotting.

Discussion and Implications

The rigorous testing and comprehensive evaluations underscore the efficacy of the proposed methodologies in bridging the sim-to-real gap. The improved simulation fidelity and robust policy-learning techniques ensure that the learned policies sustain high performance and adaptability when deployed on physical systems.

This work sets a strong foundation for future research focusing on dynamic locomotion tasks and interaction with complex terrains. Integrating vision into the sensory input to enhance environmental awareness and navigation capabilities is a logical next step. Additionally, learning policies that dynamically adjust speed and direction in response to the environment could further advance the field of quadruped robot locomotion.

In conclusion, the paper introduces a systematic and efficient approach to learning agile locomotion for quadruped robots, demonstrating the potential for autonomous systems to achieve high performance through deep RL while minimizing reliance on manual design and tuning.
