Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation
Overview
The paper introduces a novel approach for category-level 6D object pose and size estimation from RGB-D images, addressing the challenge of handling unseen object instances without relying on their exact CAD models. The proposed framework, based on the concept of Normalized Object Coordinate Space (NOCS), enables a unified and consistent representation of 6D pose across object instances within a category. This is particularly significant for applications that must interact with previously unseen objects, such as robotics and augmented reality.
Methodology
NOCS Representation:
The core innovation, NOCS, provides a canonical frame within a unit cube for each object category, ensuring consistent orientation and scale across instances. This uniform space facilitates direct correspondence between observed pixels and the object’s 3D coordinates, allowing for robust 6D pose estimation.
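The canonicalization described above can be sketched as a simple normalization: an object's point cloud is centered and uniformly scaled so that its tight bounding box fits inside the unit cube. This is a minimal illustration of the idea, not the authors' exact preprocessing code; the function name and the choice of the bounding-box diagonal as the scale factor are assumptions for this sketch.

```python
import numpy as np

def to_nocs(points):
    """Map an object point cloud into a NOCS-style canonical frame.

    The cloud is centered and uniformly scaled so its tight bounding
    box fits inside the unit cube, giving coordinates in [0, 1]^3.
    """
    mins, maxs = points.min(axis=0), points.max(axis=0)
    center = (mins + maxs) / 2.0
    # Scale by the bounding-box diagonal so every instance has a
    # consistent, dimensionless size in the canonical space.
    scale = np.linalg.norm(maxs - mins)
    return (points - center) / scale + 0.5
```

Because the diagonal of the normalized bounding box is exactly 1, instances of different metric sizes occupy the same canonical space, which is what lets a single network regress NOCS coordinates across an entire category.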
Neural Network Design:
The authors augment a Mask R-CNN framework to predict not only instance masks and class labels but also NOCS maps. This yields pixel-level NOCS coordinates, which are then aligned with the corresponding depth measurements to recover each object's metric 6D pose and size.
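The alignment step above amounts to fitting a similarity transform (scale, rotation, translation) between the predicted NOCS coordinates and the camera-space points back-projected from depth. The sketch below uses the classic Umeyama least-squares solution; the paper's pipeline additionally wraps such a fit in outlier-robust estimation, which is omitted here, and the function name is mine.

```python
import numpy as np

def umeyama(src, dst):
    """Least-squares similarity transform mapping src -> dst.

    src: (N, 3) predicted NOCS coordinates.
    dst: (N, 3) camera-space points from the depth map.
    Returns s, R (3x3), t (3,) such that dst ~= s * R @ src + t.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Correct for a possible reflection so R is a proper rotation.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

The recovered scale s directly gives the object's metric size relative to its unit-cube canonical frame, while R and t give its 6D pose in the camera frame.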
Data Generation and Training:
A major contribution is the Context-Aware MixEd ReAlity (CAMERA) approach, a data generation technique that composites synthetic objects onto real backgrounds in a context-sensitive manner. This substantially enriches the training data and helps the model cope with real-world variation. The authors complement this synthetic data with real-world datasets for training and evaluation.
Results and Implications
The experimental results show that the method achieves state-of-the-art performance on standard 6D pose benchmarks and handles real-world scenes effectively. Notably, it reports a mean average precision (mAP) of 83.9% for 3D object detection on the synthetic data and remains competitive on real-world data (76.4% mAP at a 3D IoU threshold of 50%).
The implications of this research are manifold:
- Theoretical Impact: By introducing NOCS, the paper addresses the long-standing challenge of category-level pose estimation without dependency on specific CAD models, bridging a gap between instance-level and category-level approaches.
- Practical Applications: The method is especially beneficial for robotic manipulation and AR systems, where interaction with novel objects is common. The ability to estimate pose and size for previously unseen instances makes such systems more adaptable.
Future Directions
Several avenues for future research are suggested, including end-to-end learning directly from RGB data, which might obviate the need for depth input. Further refinements in handling object symmetries and an expanded real-world dataset are also likely to improve pose estimation accuracy.
In summary, the paper provides a significant step forward in 6D pose and size estimation, offering both practical solutions for real-world applications and a foundation for future advancements in object detection and manipulation technologies.