Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
The paper "Sim-to-Real Transfer of Robotic Control with Dynamics Randomization" by Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel presents a robust methodology for transferring policies trained in simulation to real-world robotic systems. The authors address the prevalent challenge known as the "reality gap," where discrepancies between simulated and real-world environments hinder the direct application of simulation-trained policies in physical systems.
Overview and Methodology
The central contribution of this paper is the use of dynamics randomization to train adaptive policies in simulation that generalize to the dynamics of the real world. By randomizing the simulator's dynamics parameters during training, the policy is exposed to a broad range of variations resembling those it may encounter when deployed on physical hardware.
The approach focuses on a non-prehensile manipulation task: pushing a puck to a target position with a 7-DOF robotic arm. Policies were trained with deep reinforcement learning (deep RL) in simulated environments built on the MuJoCo physics engine. Physical properties, including link masses, joint damping, puck friction, table height, and controller gains, were randomized within specified ranges during training; the timestep between actions and the observation noise were randomized as well, to model real-world uncertainties.
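Below is a minimal sketch of such a per-episode randomization scheme, written against the open-source mujoco Python bindings. The class, the scaling ranges, and the noise level are illustrative assumptions for exposition, not the paper's exact values or implementation.

```python
import numpy as np
import mujoco


class DynamicsRandomizer:
    """Per-episode dynamics randomization for a MuJoCo model (illustrative)."""

    def __init__(self, model: mujoco.MjModel, rng: np.random.Generator):
        self.model = model
        self.rng = rng
        # Remember nominal values so every episode rescales from the same baseline.
        self.mass0 = model.body_mass.copy()
        self.damping0 = model.dof_damping.copy()
        self.friction0 = model.geom_friction.copy()
        self.gains0 = model.actuator_gainprm.copy()

    def _log_uniform(self, low, high, size=None):
        # Multiplicative scales sampled uniformly in log space, so that
        # e.g. x0.5 and x2.0 are equally likely.
        return np.exp(self.rng.uniform(np.log(low), np.log(high), size))

    def randomize(self):
        m = self.model
        m.body_mass[:] = self.mass0 * self._log_uniform(0.25, 4.0, self.mass0.shape)
        m.dof_damping[:] = self.damping0 * self._log_uniform(0.2, 5.0, self.damping0.shape)
        # Column 0 of geom_friction is the sliding-friction coefficient.
        m.geom_friction[:, 0] = self.friction0[:, 0] * self._log_uniform(
            0.5, 2.0, self.friction0.shape[0]
        )
        m.actuator_gainprm[:] = self.gains0 * self._log_uniform(0.8, 1.25, self.gains0.shape)
        # Table height could be randomized similarly, e.g. via m.body_pos[table_id, 2].


def noisy_observation(obs: np.ndarray, rng: np.random.Generator, std: float = 0.01):
    """Additive Gaussian noise on observations to mimic sensor error (std is illustrative)."""
    return obs + rng.normal(0.0, std, size=obs.shape)
```

Calling randomize() at the start of every episode presents the agent with a fresh realization of the dynamics, so the policy cannot overfit to any single simulator configuration and must instead infer the current dynamics from its interaction history.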
Policy and Network Design
The paper compares multiple neural network architectures for policy representation:
- LSTM-Based Recurrent Policies: These policies use Long Short-Term Memory (LSTM) networks to maintain an internal state that summarizes the interaction history, enabling the policy to implicitly infer and adapt to varying environmental dynamics (see the sketch after this list).
- Feedforward Networks (FF): Basic non-recurrent policies that react purely based on the current observation without considering the history of states and actions.
- Feedforward Networks with History (FF + Hist): These policies are provided with a history of previous states and actions, enhancing their ability to infer underlying dynamics compared to basic feedforward networks.
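To make the architectural distinction concrete, the following is a minimal PyTorch sketch of the feedforward and recurrent variants. The class names, layer sizes, and activations are illustrative assumptions; the paper's actual networks and training algorithm differ in detail.

```python
import torch
import torch.nn as nn


class FeedforwardPolicy(nn.Module):
    """Reactive policy: maps the current observation directly to an action."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        # obs: (batch, obs_dim) -> action in [-1, 1]^act_dim
        return self.net(obs)


class RecurrentPolicy(nn.Module):
    """LSTM policy: the hidden state summarizes past observations, letting the
    network implicitly identify the episode's dynamics."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim); state carries memory across calls.
        out, state = self.lstm(obs_seq, state)
        return torch.tanh(self.head(out)), state
```

The FF + Hist variant can reuse the feedforward network by concatenating the last k observation-action pairs into a single input vector, trading the LSTM's learned memory for a fixed-length window.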
The recurrent policies, particularly the LSTM variants, demonstrated superior performance and robustness both in simulation and when deployed on the real robotic system. Their ability to handle variability in dynamics without an explicit system-identification step marks a significant advance in policy generalization.
Empirical Results
The empirical results provide rigorous comparisons between the policy architectures. The LSTM-based policies achieved a success rate of 0.91 in simulation and 0.89 in real-world trials, showing only a minimal performance drop across the transfer. In contrast, feedforward policies trained without dynamics randomization failed to generalize, achieving only a 0.51 success rate in simulation and no successful trials in the real world.
Ablation studies further highlighted the contribution of each randomized parameter. Removing randomization of the action timestep or of the observation noise significantly degraded performance, underscoring their critical role in preparing the policies for real-world deployment.
Implications and Future Work
The findings in this paper suggest that dynamics randomization is an effective strategy for mitigating the reality gap in robotic control tasks. The successful transfer of policies trained exclusively in simulation to a physical robot without additional real-world calibration represents a considerable step forward. This methodology could significantly reduce development costs and time, leveraging the efficiency of simulated training environments.
Future research could extend this framework to a broader array of robotic tasks, incorporating diverse sensory modalities such as vision and tactile feedback. Additionally, optimizing the balance between simulation fidelity and randomization could further enhance the robustness and applicability of trained policies.
Dynamics randomization, coupled with sophisticated policy architectures like LSTM, demonstrates a compelling pathway for robust sim-to-real transfer in robotics. This approach promises to expand the practical deployment of advanced robotic capabilities across various domains.