- The paper introduces Gen6D, which estimates accurate 6-DoF object poses from RGB images alone, requiring only a set of posed reference images of the target object rather than a pre-built 3D model, depth maps, or masks.
- Its three-component approach integrates an object detector, viewpoint selector, and volume-based pose refiner to enhance accuracy and generalizability.
- Experiments on the model-free MOPED and GenMOP datasets show state-of-the-art performance, results on LINEMOD are comparable to instance-specific methods, and the approach handles objects unseen during training.
Overview of Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images
The paper introduces Gen6D, a pose estimation method that determines the 6-DoF (six degrees of freedom) pose of objects from RGB images without requiring an object's 3D model, depth maps, or object masks; at test time it needs only a query image and a set of posed RGB reference images of the target object. This removes the strict preconditions of existing approaches, which depend on high-quality 3D models or additional sensory data, and makes the method applicable to novel objects and diverse environments.
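To make the problem setup concrete, the minimal NumPy sketch below (with hypothetical names, not code from the paper) shows what a 6-DoF pose is operationally: a rotation R and translation t that map object-frame points into the camera frame, after which the intrinsics K project them into the image.

```python
import numpy as np

def project_points(points_obj, R, t, K):
    """Project 3D object-frame points into the image given a 6-DoF pose (R, t).

    points_obj: (N, 3) points in the object coordinate frame.
    R: (3, 3) rotation matrix, t: (3,) translation (object -> camera).
    K: (3, 3) camera intrinsic matrix.
    Returns (N, 2) pixel coordinates.
    """
    points_cam = points_obj @ R.T + t   # rigid transform into the camera frame
    uvw = points_cam @ K.T              # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]     # perspective divide

# Toy usage: identity rotation, object 0.5 m in front of the camera.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 0.5])
corners = np.array([[-0.05, -0.05, 0.0], [0.05, 0.05, 0.0]])  # two object points (meters)
print(project_points(corners, R, t, K))
```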
Methodology
Gen6D is composed of three core components:
- Object Detector: This module locates the object in the query image by correlating features of the reference images with the query feature map, yielding the 2D object center and scale from which an approximate translation is estimated; the correlation keeps the focus on the object and suppresses background clutter (see the correlation sketch after this list).
- Viewpoint Selector: This module compares the detected query region against the reference images at the feature level and selects the reference with the most similar viewpoint; that reference's known pose, adjusted by a predicted in-plane rotation, provides the initial pose estimate. The learned comparison remains robust to cluttered backgrounds and to cases where no reference exactly matches the query viewpoint (see the similarity-scoring sketch below).
- Pose Refiner: A volume-based refinement network iteratively improves the initial estimate using a 3D feature volume constructed from the 2D features of nearby reference images. Unlike traditional refiners that render an object model for comparison with the query, this design requires no 3D model (see the feature-volume sketch below).
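For the detector, the sketch below is not the authors' network but a minimal PyTorch illustration of correlation-based localization, assuming pre-computed backbone features: the reference feature maps act as templates that are cross-correlated with the query feature map, and the peak of the averaged response indicates the object center. The function name, feature shapes, and averaging strategy are illustrative choices.

```python
import torch
import torch.nn.functional as F

def correlate_detect(query_feat, ref_feats):
    """Locate the object in a query feature map by template correlation.

    query_feat: (C, H, W) feature map of the query image.
    ref_feats:  (M, C, h, w) feature maps of M reference crops (the "templates").
    Returns the (row, col) of the strongest averaged correlation response.
    """
    # L2-normalize channel vectors so the correlation behaves like cosine similarity.
    q = F.normalize(query_feat.unsqueeze(0), dim=1)   # (1, C, H, W)
    k = F.normalize(ref_feats, dim=1)                 # (M, C, h, w)
    # conv2d with the templates as kernels = cross-correlation with each reference.
    score = F.conv2d(q, k, padding=(k.shape[-2] // 2, k.shape[-1] // 2))  # (1, M, H, W)
    score = score.mean(dim=1, keepdim=True)           # average the M response maps
    flat_idx = score.flatten().argmax()
    w = score.shape[-1]
    return divmod(flat_idx.item(), w)                 # (row, col) of the peak

# Toy usage with random features standing in for a real backbone's output.
query_feat = torch.randn(64, 60, 80)
ref_feats = torch.randn(16, 64, 15, 15)
print(correlate_detect(query_feat, ref_feats))
```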
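The viewpoint selector can be pictured as scoring the detected query crop against each reference embedding and inheriting the best match's known pose. The sketch below uses plain cosine similarity as a stand-in for Gen6D's learned similarity network; `select_viewpoint` and its inputs are hypothetical.

```python
import torch
import torch.nn.functional as F

def select_viewpoint(query_emb, ref_embs, ref_poses):
    """Pick the reference image whose viewpoint best matches the query.

    query_emb: (D,) embedding of the detected query crop.
    ref_embs:  (M, D) embeddings of the reference images.
    ref_poses: list of M (R, t) pairs, the known poses of the references.
    Returns the best-matching reference's pose and its similarity score.
    """
    sims = F.cosine_similarity(query_emb.unsqueeze(0), ref_embs, dim=1)  # (M,)
    best = sims.argmax().item()
    return ref_poses[best], sims[best].item()
```

In the paper the selector is a trained network that also regresses an in-plane rotation to refine the inherited orientation; the cosine scoring here is only an illustration of the selection step.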
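The refiner's key data structure is a 3D feature volume built by projecting a grid of candidate 3D points into the reference images and sampling their 2D features. The sketch below illustrates only that construction step under simplifying assumptions (mean pooling across references, a fixed-radius cube around the object origin); the 3D network that consumes the volume to regress a pose update, and the iterative refinement loop, are omitted, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def build_feature_volume(ref_feats, ref_poses, Ks, grid_size=16, radius=0.1):
    """Unproject 2D reference features into a 3D feature volume around the object.

    ref_feats: (M, C, H, W) feature maps of the reference images.
    ref_poses: (M, 3, 4) [R|t] matrices mapping object coords to each reference camera.
    Ks:        (M, 3, 3) intrinsics of the reference images.
    Returns a (C, G, G, G) volume of mean-pooled features, G = grid_size.
    """
    M, C, H, W = ref_feats.shape
    # Regular 3D grid of points around the object origin (object coordinate frame).
    lin = torch.linspace(-radius, radius, grid_size)
    zz, yy, xx = torch.meshgrid(lin, lin, lin, indexing="ij")
    pts = torch.stack([xx, yy, zz], dim=-1).reshape(-1, 3)               # (G^3, 3)

    volume = torch.zeros(C, pts.shape[0])
    for m in range(M):
        R, t = ref_poses[m, :, :3], ref_poses[m, :, 3]
        cam = pts @ R.T + t                                              # into camera frame
        uvw = cam @ Ks[m].T
        uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)                    # pixel coords
        # Convert to [-1, 1] for grid_sample, then bilinearly sample 2D features.
        norm = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], -1) * 2 - 1
        sampled = F.grid_sample(ref_feats[m:m + 1], norm.view(1, 1, -1, 2),
                                align_corners=True)                      # (1, C, 1, G^3)
        volume += sampled[0, :, 0]
    volume /= M
    return volume.view(C, grid_size, grid_size, grid_size)
```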
Experimental Results
Gen6D was evaluated on two model-free datasets, MOPED and the newly introduced GenMOP, as well as on LINEMOD. It achieves state-of-the-art results among generalizable, model-free methods on MOPED and GenMOP and is comparable to instance-specific methods on LINEMOD, while generalizing to objects unseen during training, which underscores its practical potential.
Implications
Practical Implications: Gen6D is useful in augmented reality, robotics, and virtual reality, where flexible, model-free pose estimation is needed for real-world interaction and detailed object models or additional sensing such as depth are often unavailable.
Theoretical Implications: The paper contributes to the broader understanding of 6-DoF pose estimation, particularly emphasizing generalizability. Gen6D’s approach demonstrates the feasibility of using RGB images alone, steering future research towards more generalized solutions.
Future Directions
Future developments may focus on improving Gen6D's robustness to occlusion and reducing its reliance on reference images that evenly cover viewpoints around the object. Further research could also explore adaptive techniques that preserve accuracy with fewer reference images while keeping computational cost low, as well as integration into broader perception and interaction pipelines to widen applicability.
This paper's contribution lies in its demonstration that model-free, accurate pose estimation is feasible with basic RGB data. Such advancements portend significant shifts in how AI systems process visual input for interaction with the three-dimensional world.