- The paper introduces a system that integrates multi-affordance grasping with cross-domain image matching, eliminating the need for retraining on novel objects.
- It employs fully convolutional networks to compute dense affordance maps that enable optimal selection between suction and parallel-jaw grasps in real time.
- Experimental results, including success at the 2017 Amazon Robotics Challenge, demonstrate high grasp success rates and exceptional recognition accuracy in cluttered environments.
Robotic Pick-and-Place of Novel Objects in Clutter
The paper by Zeng et al. presents a robotic pick-and-place system that handles both known and novel objects in cluttered environments. The system works out-of-the-box, without retraining for novel objects, by combining multi-affordance grasping with a cross-domain image matching framework.
System Overview
Two primary components define the system: a multi-affordance grasping framework and a cross-domain image matching strategy for recognition.
- Grasping Component: Utilizes fully convolutional networks (FCNs) to compute dense pixel-wise probability maps of affordances for four grasping primitives. From these maps, the robotic arm infers the most suitable grasping technique—suction or parallel-jaw—from real-time visual data, and it remains robust even under heavy clutter.
- Recognition Component: Leverages cross-domain image matching to accurately recognize grasped objects by comparing their observed images to pre-existing product images. This technique circumvents the need for new data collection, facilitating seamless integration of novel objects into the operational workflow.
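The affordance-based selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the primitive names, map shapes, and the `best_grasp` helper are assumptions made for the example, and the FCN outputs are stubbed with a toy array.

```python
import numpy as np

# Hypothetical sketch: given dense per-pixel affordance maps (one per
# grasping primitive, as the FCNs would produce), pick the primitive and
# pixel with the highest predicted grasp-success probability.
PRIMITIVES = ["suction-down", "suction-side", "grasp-down", "flush-grasp"]

def best_grasp(affordance_maps):
    """affordance_maps: float array of shape (4, H, W), values in [0, 1].

    Returns the chosen primitive name, the (row, col) pixel at which to
    execute it, and the predicted affordance score at that pixel.
    """
    idx = np.unravel_index(np.argmax(affordance_maps), affordance_maps.shape)
    primitive, row, col = idx
    return PRIMITIVES[primitive], (row, col), affordance_maps[idx]

# Toy example: four 8x8 maps with one strong suction affordance.
maps = np.zeros((4, 8, 8))
maps[0, 3, 5] = 0.9   # suction-down looks best at pixel (3, 5)
maps[2, 6, 1] = 0.4   # a weaker parallel-jaw candidate elsewhere
name, pixel, score = best_grasp(maps)
```

Because all primitives share one dense scoring space, choosing between suction and parallel-jaw grasps reduces to a single argmax over the stacked maps.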
Experimental Results
Comprehensive experiments demonstrate the system's efficacy:
- Grasp Success: High success rates across a diverse array of objects in clutter, with the system reliably selecting the appropriate grasping strategy for each object.
- Recognition Accuracy: Exceptional accuracy in identifying both known and novel objects, supported by a dual-stream convolutional network that aligns observed images with product images.
The system's successful deployment during the 2017 Amazon Robotics Challenge further validates its capacity for real-world applications, achieving the highest performance in the stowing task.
Theoretical and Practical Implications
From a theoretical standpoint, this work enriches the field of robotic perception and manipulation, particularly in environments with substantial complexity. The system's ability to handle novel objects without retraining represents a significant advancement, suggesting a scalable approach for future applications in dynamic environments.
Practically, the implications of this research extend to various sectors, including warehouse automation and service robotics, where efficiency and adaptability in object handling are crucial. By obviating the need for task-specific training data, the system paves the way for broader applications and more agile robotic solutions.
Future Directions
Future work on such robotic systems could explore several directions:
- Enhancement of Feedback Mechanisms: Incorporating closed-loop grasping techniques to reduce error rates further and improve stability during object manipulation.
- Reinforcement Learning: Investigating reinforcement learning strategies to evolve more complex picking sequences that address preparatory and indirect actions, such as object rearrangement.
- Integration of Advanced Sensors: Adopting tactile sensors could provide richer feedback during grasping, further refining object handling.
This paper directs attention to refining robotic capabilities in real-world environments, emphasizing adaptability and scalability. It opens avenues for addressing complex object handling tasks, presenting a robust foundation for future innovations in robotic manipulation and recognition systems.