- The paper demonstrates that reinforcement learning and extensive simulation calibration enable a Shadow Dexterous Hand to achieve a median of 13 consecutive successful rotations on an in-hand block-reorientation task.
- The paper leverages a recurrent neural network with PPO and comprehensive domain randomization to ensure robust transfer of policies from simulation to the real world.
- The paper integrates vision-based pose estimation via CNNs, pointing the way toward more general and sophisticated robotic dexterity in practical applications.
Learning Dexterous In-Hand Manipulation
Introduction
The challenge of robotic dexterous in-hand manipulation remains a significant hurdle in autonomous robotics, largely because of the complex control required of general-purpose manipulators. This paper from OpenAI introduces a reinforcement learning (RL) approach for training policies that enable a five-fingered humanoid hand, the Shadow Dexterous Hand, to perform vision-based object reorientation. The policies, trained entirely in simulation, transfer successfully to the physical robot, highlighting the efficacy of domain randomization and extensive simulation calibration.
System Overview
The framework developed in this work comprises a control policy for the manipulation task and a vision-based pose estimator. The control policy is a recurrent neural network trained with the Proximal Policy Optimization (PPO) algorithm in an asynchronous, large-scale distributed setup. Key to the policy's success is training across a wide distribution of simulated environments with randomized physical parameters and noise models, which yields robustness when the policy is transferred to the real world.
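For concreteness, the clipped surrogate objective at the core of PPO can be sketched as below. This is a minimal PyTorch illustration, not the paper's actual distributed training code; the function name and tensor arguments are assumptions.

```python
# Minimal sketch of PPO's clipped surrogate objective (Schulman et al., 2017).
# Inputs are per-sample log-probabilities under the new and old policies and
# advantage estimates; all names here are illustrative.
import torch

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    ratio = torch.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the clipped surrogate, so we return its negation as a loss.
    return -torch.min(unclipped, clipped).mean()
```

The clipping keeps each update close to the data-collecting policy, which is what makes large-scale asynchronous rollouts practical.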
Control Policy Training
The control policy operates in simulated environments where physical parameters such as friction coefficients, object dimensions, and masses are randomized. The policy itself is an LSTM-based recurrent network trained with deep reinforcement learning, allowing it to handle the temporal dependencies and dynamic variability inherent in dexterous manipulation. Its memory lets it implicitly identify the dynamics of the environment it is acting in and adapt its behavior at run time, which proves crucial for tasks involving intricate physical interactions.
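A rough sketch of what per-episode dynamics randomization can look like is shown below. The wrapper, the attribute names (MuJoCo-style `geom_friction` and `body_mass`), and the randomization ranges are all hypothetical illustrations, not the paper's training code.

```python
# Hypothetical per-episode dynamics randomization around a MuJoCo-style env.
import numpy as np

class RandomizedDynamics:
    def __init__(self, env, rng=None):
        self.env = env
        self.rng = rng or np.random.default_rng()
        # Keep the nominal (calibrated) parameters to randomize around.
        self.base_friction = env.model.geom_friction.copy()
        self.base_mass = env.model.body_mass.copy()

    def reset(self):
        # Resample physical parameters at the start of every episode, so the
        # LSTM policy must infer them from interaction rather than memorize them.
        self.env.model.geom_friction[:] = self.base_friction * self.rng.uniform(0.7, 1.3)
        self.env.model.body_mass[:] = self.base_mass * self.rng.uniform(0.5, 1.5)
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Additive observation noise stands in for real sensor error.
        return obs + self.rng.normal(0.0, 0.01, size=obs.shape), reward, done, info
```

Because every episode presents different dynamics, behaviors that exploit quirks of any single simulation setting are selected against.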
Vision-Based Pose Estimation
Complementing the control policy, the vision-based pose estimator uses a convolutional neural network (CNN) to predict object poses from images captured by three RGB cameras. Because object appearance and lighting are randomized in the simulated environment, the pose estimator can be trained solely on synthetic data generated with the Unity and MuJoCo simulators. The estimator supplies the object pose to the control policy, enabling effective manipulation without reliance on physical markers.
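As a concrete illustration, a three-camera CNN pose regressor might look like the toy sketch below; the architecture, layer sizes, and position-plus-quaternion output head are assumptions for exposition, far smaller than the network described in the paper.

```python
# Toy three-camera CNN pose regressor; sizes are illustrative only.
import torch
import torch.nn as nn

class PoseEstimator(nn.Module):
    def __init__(self):
        super().__init__()
        # A shared convolutional trunk is applied to each RGB camera view.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Per-camera features are fused, then regressed to position + orientation.
        self.head = nn.Sequential(
            nn.Linear(3 * 64, 128), nn.ReLU(),
            nn.Linear(128, 7),  # 3 position dims + 4 quaternion dims
        )

    def forward(self, cams):  # cams: (batch, 3 cameras, 3 channels, H, W)
        feats = [self.trunk(cams[:, i]) for i in range(3)]
        out = self.head(torch.cat(feats, dim=1))
        pos, quat = out[:, :3], out[:, 3:]
        # Normalize so the predicted orientation is a valid unit quaternion.
        return pos, quat / quat.norm(dim=1, keepdim=True)
```

Training such a network entirely on randomized synthetic renders is what lets it generalize to real camera images.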
Results
The trained policies demonstrate a notable degree of dexterity in manipulating objects, with emergent behaviors including finger gaiting, multi-finger coordination, and strategic use of gravity, resembling human manipulation skills. The success of these policies in the physical world underscores the effectiveness of domain randomization and rigorous simulation calibration.
The performance metrics show that the control policies achieve a median of 13 consecutive successful rotations when manipulating a block, indicating a substantial reduction in the reality gap. Vision-based pose estimation performs marginally worse than motion-capture-based state estimation, yet it remains practically effective.
Ablation Studies
A comprehensive ablation study elucidates the contribution of each randomization. Policies trained without randomizations, or with a reduced set of them, transfer poorly, underlining the necessity of thorough domain randomization protocols. These findings are consistent with prior work and extend it with careful empirical validation.
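One way to organize such an ablation is as a set of independently toggleable randomization groups, as in the hypothetical configuration sketch below; the group names are illustrative and do not come from the paper.

```python
# Hypothetical randomization groups for an ablation over sim-to-real transfer.
from dataclasses import dataclass

@dataclass
class RandomizationConfig:
    physics: bool = True            # friction, mass, damping, ...
    observation_noise: bool = True  # additive noise on sensed state
    action_delay: bool = True       # latency and timing jitter
    unmodeled_effects: bool = True  # e.g. random perturbation forces

ablations = {
    "full": RandomizationConfig(),
    "no_physics": RandomizationConfig(physics=False),
    "none": RandomizationConfig(False, False, False, False),
}

for name, cfg in ablations.items():
    # Each configuration would parameterize a separate training run whose
    # real-world (or held-out-simulation) transfer is then compared.
    print(name, cfg)
```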
Implications and Future Work
This paper provides a concrete demonstration that advanced RL methods, coupled with extensive simulation and domain randomization, can produce complex robotic manipulation skills that transfer to real-world settings. The demonstrated use of a high-DOF humanoid hand for in-hand manipulation opens avenues for deploying similar techniques in varied robotics applications, from industrial automation to service robotics.
Future work may focus on enhancing robustness through continued refinement of domain randomization, exploring multi-object manipulation, and further integrating tactile sensing to augment the sensory feedback. Additionally, the widespread applicability of these methods to various robotic platforms could pave the way for generalized skill transfer across different robotic morphologies and tasks.
Conclusion
The research presented in this paper marks a significant stride toward sophisticated, real-world robot manipulation learned through simulated training. By harnessing reinforcement learning, domain randomization, and large-scale distributed training, the authors demonstrate that intricate in-hand manipulation tasks, once considered beyond the capability of traditional control methods, are now achievable, laying a robust foundation for future advances in robotic dexterity.