Novel Object 6D Pose Estimation with a Single Reference View (2503.05578v1)

Published 7 Mar 2025 in cs.CV and cs.RO

Abstract: Existing novel object 6D pose estimation methods typically rely on CAD models or dense reference views, which are both difficult to acquire. Using only a single reference view is more scalable, but challenging due to large pose discrepancies and limited geometric and spatial information. To address these issues, we propose a Single-Reference-based novel object 6D (SinRef-6D) pose estimation method. Our key idea is to iteratively establish point-wise alignment in the camera coordinate system based on state space models (SSMs). Specifically, iterative camera-space point-wise alignment can effectively handle large pose discrepancies, while our proposed RGB and Points SSMs can capture long-range dependencies and spatial information from a single view, offering linear complexity and superior spatial modeling capability. Once pre-trained on synthetic data, SinRef-6D can estimate the 6D pose of a novel object using only a single reference view, without requiring retraining or a CAD model. Extensive experiments on six popular datasets and real-world robotic scenes demonstrate that we achieve on-par performance with CAD-based and dense reference view-based methods, despite operating in the more challenging single reference setting. Code will be released at https://github.com/CNJianLiu/SinRef-6D.

Summary

Overview of Single Reference View-based 6D Pose Estimation

The paper "Novel Object 6D Pose Estimation with a Single Reference View" introduces SinRef-6D, a method for estimating the 6D pose of novel objects from a single reference view. It targets a scalability problem in existing methods, which rely on CAD models or dense reference views. Both are often impractical to obtain because of the specialized equipment or extensive manual effort involved in creating them. SinRef-6D instead iteratively establishes point-wise alignment between the reference and query views in the camera coordinate system.

Core Methodology

SinRef-6D establishes point-wise alignment between the reference and query views using features extracted by State Space Models (SSMs). The design targets the two main difficulties of the single-reference setting: large pose discrepancies and limited geometric and spatial information. The workflow includes:

Initialization: The novel object is segmented from the background of the RGB-D input using Mask R-CNN or CNOS. The segmented depth map is then back-projected into a point cloud.
Points Focalization: The reference and query point clouds are transformed into the camera coordinate system, facilitating accurate iterative alignment. Initial poses are fine-tuned iteratively, enhancing spatial alignment and reducing discrepancies.
Feature Extraction with SSMs: Two purpose-built SSMs, the Points SSM and the RGB SSM, extract features from the point clouds and RGB images respectively. They capture long-range dependencies and spatial information with linear complexity, preserving the geometric information needed for accurate alignment.
Iterative Point-wise Alignment and Pose Solving: Point-wise alignment is established with GeoTransformer and refined iteratively to improve geometric consistency between the two views. The final pose is then solved by weighted singular value decomposition (WSVD).
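
The two purely geometric steps in this workflow, depth back-projection and weighted-SVD pose solving, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a pinhole camera with intrinsics fx, fy, cx, cy, and the function names backproject and weighted_svd_pose are hypothetical. The weights w stand in for whatever per-correspondence confidence the alignment stage provides.

```python
import numpy as np


def backproject(depth, mask, fx, fy, cx, cy):
    """Lift masked depth pixels to an (N, 3) point cloud in camera coordinates.

    Assumes a pinhole camera model; depth and mask are (H, W) arrays.
    """
    valid = mask.astype(bool) & (depth > 0)   # object pixels with a depth reading
    v, u = np.nonzero(valid)                  # v = row index, u = column index
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def weighted_svd_pose(src, dst, w):
    """Rigid transform (R, t) minimising sum_i w_i * ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) corresponding points; w: (N,) non-negative weights.
    This is the standard weighted Kabsch solution.
    """
    w = w / w.sum()
    src_c = (w[:, None] * src).sum(axis=0)    # weighted centroids
    dst_c = (w[:, None] * dst).sum(axis=0)
    # 3x3 weighted cross-covariance between the centred point sets
    H = (src - src_c).T @ (w[:, None] * (dst - dst_c))
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

In an iterative scheme of the kind described above, one would apply the estimated (R, t) to the reference points, re-estimate correspondences, and call the solver again. The loop structure and stopping rule are left out here because the summary does not specify them.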

Numerical Performance and Comparisons

SinRef-6D was evaluated on six datasets (LineMod, LM-O, TUD-L, IC-BIN, HB, and YCB-V) as well as real-world robotic scenes. Despite operating in the more challenging single-reference setting, it achieves results competitive with CAD-based and dense-reference-view-based methods and outperforms some of them. It also showed accuracy comparable to instance-based strategies and remained effective across varying object complexity and scene clutter.

Practical and Theoretical Implications

Because SinRef-6D needs neither a CAD model nor retraining, it can be applied to a new object from a single reference view with minimal manual input. This makes it well suited to augmented reality and robotic manipulation, and it lowers the infrastructure cost of large-scale deployment.

Future Developments

Future work may focus on improving the robustness of SinRef-6D, particularly for reflective or transparent objects and for objects with challenging geometries. Extending the SSMs to capture more intricate spatial dependencies and integrating stronger segmentation techniques could further improve accuracy across diverse use cases. More broadly, the work points toward AI systems that can understand and interact with unfamiliar objects in their environment efficiently.
