- The paper shows that joint training of pushing and grasping using self-supervised deep reinforcement learning boosts grasp success in cluttered scenes.
- The methodology uses fully convolutional networks within a Q-learning framework to map visual observations to pixel-wise action utilities (Q-values).
- Experimental results show improved action efficiency and robust generalization to novel objects in both simulation and real-world tests.
Overview of the Paper
The paper, "Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning," explores the interplay between non-prehensile (pushing) and prehensile (grasping) robotic manipulation using deep reinforcement learning. The research demonstrates that the complex synergies between pushing and grasping can be learned within a self-supervised framework, without human annotation. This is particularly relevant in cluttered scenarios, where coordinating the two action types is critical for efficient manipulation.
Methodology
The authors train two fully convolutional networks (FCNs), one per action primitive, within a Q-learning framework. Each network maps a visual observation of the scene to a dense map of Q-values, one per pixel, estimating the utility of executing its primitive at the corresponding location. The pushing and grasping policies are learned jointly through trial and error, with successful grasps providing the primary reward signal.
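To make the dense prediction concrete, here is a minimal PyTorch sketch of a pixel-wise Q-network. It is illustrative only: the paper's FCNs are built on DenseNet towers pretrained on ImageNet, while the class name (PixelwiseQNet) and the layer sizes below are our own simplifications.

```python
import torch
import torch.nn as nn

class PixelwiseQNet(nn.Module):
    """Maps an RGB-D heightmap to a dense map of Q-values.

    Simplified stand-in for the paper's FCNs, which use DenseNet
    towers pretrained on ImageNet; layer sizes here are illustrative.
    """
    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.q_head = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),  # one Q-value per spatial cell
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )

    def forward(self, heightmap: torch.Tensor) -> torch.Tensor:
        # heightmap: (B, 4, H, W) -> Q-map: (B, 1, H, W)
        return self.q_head(self.backbone(heightmap))

# One network per primitive; the pixel with the highest Q-value across
# both maps determines which action is executed, and where.
push_net, grasp_net = PixelwiseQNet(), PixelwiseQNet()
```

Because the output shares the input's spatial layout, action selection reduces to an argmax over the two Q-maps, taken across a set of rotated copies of the input that encode push and grasp orientations.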
Key aspects of the method include:
- Joint Training: Both networks map the visual observation to pixel-wise Q-values, which estimate the expected future reward of executing a push or a grasp at each image location; training the two jointly lets pushes be valued by how much they improve future grasps.
- Self-supervised Learning: Training requires no external supervision or human labels; rewards are computed automatically from action outcomes, such as whether the gripper reports a successful grasp (a minimal sketch of the reward and the Q-learning target follows this list).
- Visual Representation: By parameterizing actions directly over image pixels, the method learns affordance-like action-value maps from vision, streamlining the perception-to-action pipeline.
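The following sketch shows how such self-supervised rewards and the one-step Q-learning target could be computed. The specific values follow the paper (reward 1 for a successful grasp, 0.5 for a push that produces a detectable change in the scene, and a future discount of 0.5); the function names and the exact plumbing are assumptions.

```python
import torch

GAMMA = 0.5  # the paper uses a future discount of 0.5

def self_supervised_reward(grasp_succeeded: bool, scene_changed: bool) -> float:
    """Rewards derived from outcomes alone, with no human labels.

    Values follow the paper: 1 for a successful grasp (detected via the
    gripper's state), 0.5 for a push that produces a detectable change
    in the observed heightmap, 0 otherwise.
    """
    if grasp_succeeded:
        return 1.0
    if scene_changed:
        return 0.5
    return 0.0

def q_target(reward: float, next_push_q: torch.Tensor,
             next_grasp_q: torch.Tensor) -> torch.Tensor:
    """One-step Q-learning target: r + gamma * max over both primitives'
    pixel-wise Q-maps at the next state."""
    next_best = torch.max(torch.stack([next_push_q.max(), next_grasp_q.max()]))
    return reward + GAMMA * next_best
```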
Experiments and Results
The method is evaluated in both simulation and real-world scenarios, against baselines that use either a grasping-only policy or separately trained (rather than jointly trained) pushing and grasping policies. The jointly trained policies achieved superior performance on the following metrics:
- Grasp Success Rate: The proposed method achieved higher grasp success rates than baseline methods, especially in cluttered environments.
- Action Efficiency: The method cleared scenes with fewer total actions per object, indicating that it learned to push only when doing so made subsequent grasps easier (a sketch of this metric appears after this list).
- Generalization: Evaluated on novel objects held out from training, the system successfully manipulated shapes it had never seen.
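As a concrete reading of these two metrics, the sketch below computes them from a hypothetical episode log. The Episode structure and the sample numbers are ours, not results reported in the paper; only the metric definitions (grasps succeeded per attempt, and objects cleared per action executed) reflect the evaluation described above.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    """Log of one test run; field names are illustrative."""
    num_objects: int        # objects placed in the scene
    num_actions: int        # pushes + grasps executed to clear it
    grasp_attempts: int
    successful_grasps: int

def grasp_success_rate(ep: Episode) -> float:
    """Fraction of grasp attempts that succeed."""
    return ep.successful_grasps / max(ep.grasp_attempts, 1)

def action_efficiency(ep: Episode) -> float:
    """Ideal-to-actual action ratio: 1.0 means every object was cleared
    with a single action. Pushes count toward the total, so a policy
    must push sparingly to score well."""
    return ep.num_objects / max(ep.num_actions, 1)

ep = Episode(num_objects=10, num_actions=14,
             grasp_attempts=11, successful_grasps=10)
print(f"grasp success: {grasp_success_rate(ep):.0%}, "
      f"action efficiency: {action_efficiency(ep):.0%}")
```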
Implications and Future Work
The findings hold practical implications for the development of autonomous robotic systems capable of operating in dynamic and cluttered environments. By showcasing that pushing can enhance grasping success, the research opens avenues for more complex manipulation tasks in real-world applications such as warehouse automation and domestic robotics.
The paper leaves open several avenues for further exploration:
- Extended Action Sets: Future research may expand the approach to include a broader set of manipulation actions, such as rolling or toppling.
- Improved Model Expressiveness: Exploring architectures or action representations that support more dynamic and expressive motions, beyond planar pushes and top-down grasps, could broaden the method's reach.
- Increased Generalization: Further testing and training on a wider array of object shapes and sizes could improve the system's generalizability.
In summary, this research contributes robust insights into the synergies of robotic manipulation actions, with the developed methods demonstrating strong potential to enhance efficiency and adaptability in autonomous systems.