MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images (2403.01517v2)
Abstract: Recent learning methods for object pose estimation require resource-intensive training for each individual object instance or category, hampering their scalability in real applications when confronted with previously unseen objects. In this paper, we propose MatchU, a Fuse-Describe-Match strategy for 6D pose estimation from RGB-D images. MatchU is a generic approach that fuses 2D texture and 3D geometric cues for 6D pose prediction of unseen objects. We rely on learning geometric 3D descriptors that are rotation-invariant by design. By encoding pose-agnostic geometry, the learned descriptors naturally generalize to unseen objects and capture symmetries. To tackle the ambiguous associations that arise when using 3D geometry alone, we fuse additional RGB information into our descriptor. This is achieved through a novel attention-based mechanism that fuses cross-modal information, together with a matching loss that leverages the latent space learned from RGB data to guide the descriptor learning process. Extensive experiments reveal both the generalizability of the RGB-D fusion strategy and the efficacy of the descriptor. Benefiting from these designs, MatchU surpasses all existing methods by a significant margin in terms of both accuracy and speed, without requiring expensive re-training or rendering.
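To make the phrase "rotation-invariant by design" concrete, the following is a minimal sketch using a classical hand-crafted point pair feature (one distance and three angles between two oriented points). It is only an illustration of the invariance property and is not the learned MatchU descriptor; the function names `pair_features` and `random_rotation` are hypothetical and chosen for this example.

```python
import numpy as np


def pair_features(p1, n1, p2, n2):
    """Point pair feature for two oriented points (position p, unit normal n).

    Returns [distance, angle(n1, d), angle(n2, d), angle(n1, n2)], where d is
    the unit vector from p1 to p2. All four values depend only on relative
    geometry, so they do not change under a rigid motion of the pair.
    """
    d = p2 - p1
    dist = np.linalg.norm(d)
    d_unit = d / dist

    def angle(a, b):
        # Clip guards against values slightly outside [-1, 1] from rounding.
        return np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))

    return np.array([dist, angle(n1, d_unit), angle(n2, d_unit), angle(n1, n2)])


def random_rotation(rng):
    """Draw a proper rotation matrix (orthogonal, determinant +1) via QR."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one column to turn a reflection into a rotation
    return q


rng = np.random.default_rng(0)

# Two arbitrary oriented points (assumed data, for illustration only).
p1, p2 = rng.normal(size=3), rng.normal(size=3)
n1 = rng.normal(size=3)
n1 /= np.linalg.norm(n1)
n2 = rng.normal(size=3)
n2 /= np.linalg.norm(n2)

# Apply an arbitrary rigid motion: points rotate and translate, normals only rotate.
R = random_rotation(rng)
t = rng.normal(size=3)

f_before = pair_features(p1, n1, p2, n2)
f_after = pair_features(R @ p1 + t, R @ n1, R @ p2 + t, R @ n2)

features_match = bool(np.allclose(f_before, f_after, atol=1e-6))
print("invariant under rigid motion:", features_match)
```

Because such purely geometric features are identical for every pose of the object, they cannot by themselves separate surface regions that look alike geometrically, which is the ambiguity the abstract addresses by fusing RGB information into the descriptor.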