Grasping in the Wild: Learning 6DoF Closed-Loop Grasping from Low-Cost Demonstrations (1912.04344v2)

Published 9 Dec 2019 in cs.CV and cs.RO

Abstract: Intelligent manipulation benefits from the capacity to flexibly control an end-effector with high degrees of freedom (DoF) and dynamically react to the environment. However, due to the challenges of collecting effective training data and learning efficiently, most grasping algorithms today are limited to top-down movements and open-loop execution. In this work, we propose a new low-cost hardware interface for collecting grasping demonstrations by people in diverse environments. Leveraging this data, we show that it is possible to train a robust end-to-end 6DoF closed-loop grasping model with reinforcement learning that transfers to real robots. A key aspect of our grasping model is that it uses "action-view" based rendering to simulate future states with respect to different possible actions. By evaluating these states using a learned value function (Q-function), our method is able to better select corresponding actions that maximize total rewards (i.e., grasping success). Our final grasping system is able to achieve reliable 6DoF closed-loop grasping of novel objects across various scene configurations, as well as dynamic scenes with moving objects.

Authors (4)
  1. Shuran Song (110 papers)
  2. Andy Zeng (54 papers)
  3. Johnny Lee (12 papers)
  4. Thomas Funkhouser (66 papers)
Citations (192)

Summary

Overview of Learning 6DoF Closed-Loop Grasping

This paper presents an approach to robotic grasping that learns six-degree-of-freedom (6DoF) closed-loop grasping from low-cost human demonstrations. By pairing a purpose-built hardware interface for collecting demonstration data with a reinforcement learning framework, the authors advance the capabilities of robotic grasping in unstructured environments.

The core objective of intelligent manipulation is to enable robots to control end-effectors flexibly in 3D space while reacting dynamically to environmental changes. Traditional grasping models, however, mostly rely on top-down movements and open-loop execution, which limits their applicability in more complex settings. To address these limitations, the paper proposes a low-cost interface that lets people collect diverse grasping demonstrations, providing robust training data for a 6DoF closed-loop grasping model.

Methodology

The authors designed a handheld device, akin to a robot's end-effector, equipped with an RGB-D camera to gather data. This setup allows participants to perform everyday tasks in a variety of environments, thereby overcoming the cost and accessibility issues typically associated with robotic data collection. The device captures grasping trajectories, from which a visual tracking algorithm recovers 6DoF paths.
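To make the data pipeline concrete, the sketch below shows one plausible way a single demonstration from such a handheld device could be represented: synchronized RGB-D frames, the 6DoF camera poses recovered by visual tracking, the gripper trigger state, and the grasp outcome. The field names and structure are illustrative assumptions, not the authors' actual data format.

```python
# Hypothetical record for one handheld-device demonstration (illustrative only).
from dataclasses import dataclass
import numpy as np

@dataclass
class GraspDemonstration:
    rgb: np.ndarray           # (T, H, W, 3) color frames from the handheld camera
    depth: np.ndarray         # (T, H, W) aligned depth frames, in meters
    poses: np.ndarray         # (T, 4, 4) camera-to-world transforms from visual tracking
    gripper_open: np.ndarray  # (T,) binary open/close state of the gripper trigger
    success: bool             # whether the demonstrated grasp succeeded

    def relative_motion(self, t: int) -> np.ndarray:
        """6DoF motion between consecutive frames, usable as an action label."""
        return np.linalg.inv(self.poses[t]) @ self.poses[t + 1]
```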

The paper employs reinforcement learning to train an end-to-end 6DoF closed-loop grasping model. A significant innovation is the use of "action-view" rendering, whereby the system simulates future states based on potential actions. The Q-function evaluates these states to select actions that maximize grasping success.
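The following is a minimal sketch of the action-view idea, not the authors' implementation: for each candidate 6DoF motion, render the view the camera would observe after executing that motion, score the rendered view with the learned Q-function, and execute the highest-scoring action. Here `render_view` and `q_network` are assumed stand-ins for the rendering component and the trained value network.

```python
# Minimal sketch of action selection via "action-view" rendering (assumed interfaces).
import numpy as np

def select_action(observation, candidate_motions, render_view, q_network):
    """Pick the candidate 6DoF motion whose simulated future view scores highest.

    observation       : current RGB-D observation (e.g., a fused point cloud)
    candidate_motions : list of 4x4 transforms describing possible end-effector motions
    render_view       : fn(observation, motion) -> image the camera would see after motion
    q_network         : fn(image) -> scalar estimate of eventual grasp success
    """
    q_values = []
    for motion in candidate_motions:
        future_view = render_view(observation, motion)  # simulate the post-action viewpoint
        q_values.append(q_network(future_view))         # value of that imagined state
    best = int(np.argmax(q_values))
    return candidate_motions[best], q_values[best]
```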

Key Results and Contributions

The authors report a 92% grasp success rate in static scenes and 88% in dynamic scenes where objects move during execution. This marks a substantial advance over traditional methods constrained to open-loop execution or limited-DoF interaction.

The integration of action-view based rendering allows for more informed decision-making, reducing the exploration space necessary for learning efficient grasping strategies. Training on human demonstration data significantly enhances learning efficiency—evidenced by higher grasp success rates when compared to models trained solely through self-supervised trial and error.
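One common way to realize this benefit, sketched below under assumptions about the training setup (the paper may use a different recipe), is to mix demonstration transitions with self-supervised robot trials in each training batch so the value function sees successful grasps from the start rather than relying on rare random successes.

```python
# Illustrative sketch of mixing demonstration data with self-supervised trials.
import random

def build_training_batch(demo_transitions, robot_transitions, batch_size=32, demo_fraction=0.5):
    """Mix human-demonstration transitions with robot trial-and-error transitions.

    Each transition is a (state, action, reward, next_state) tuple; `demo_fraction`
    controls how much of every batch is drawn from the demonstration set.
    """
    n_demo = int(batch_size * demo_fraction)
    n_robot = batch_size - n_demo
    batch = random.sample(demo_transitions, min(n_demo, len(demo_transitions)))
    batch += random.sample(robot_transitions, min(n_robot, len(robot_transitions)))
    random.shuffle(batch)
    return batch
```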

Implications and Future Work

Practically, this research paves the way for more autonomous robotic systems capable of handling complex tasks across diverse settings—critical for applications ranging from industrial automation to service robotics. Theoretically, it enriches the understanding of closed-loop control systems, particularly in contexts necessitating rapid adaptation to dynamic environments.

Looking forward, advancements could include incorporating physical interaction models into the predictive framework to simulate contact mechanics during grasping. Moreover, the generalization potential of action-view rendering to other robotics domains such as navigation warrants exploration.

In conclusion, this paper presents significant advancements in robotic grasping technologies, facilitating deeper integration of AI into automated systems.