DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses (2403.13683v1)

Published 20 Mar 2024 in cs.CV and cs.RO

Abstract: Determining the relative pose of an object between two images is pivotal to the success of generalizable object pose estimation. Existing approaches typically approximate the continuous pose representation with a large number of discrete pose hypotheses, which incurs a computationally expensive process of scoring each hypothesis at test time. By contrast, we present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass. To this end, we map the two input RGB images, reference and query, to their respective voxelized 3D representations. We then pass the resulting voxels through a pose estimation module, where the voxels are aligned and the pose is computed in an end-to-end fashion by solving a least-squares problem. To enhance robustness, we introduce a weighted closest voxel algorithm capable of mitigating the impact of noisy voxels. We conduct extensive experiments on the CO3D, LINEMOD, and Objaverse datasets, demonstrating that our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods. Our code is released at: https://github.com/sailor-z/DVMNet/.
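The abstract says the pose is obtained by solving a least-squares problem over aligned voxels, with weights that reduce the influence of noisy voxels. One standard closed-form way to solve a weighted least-squares rigid alignment between two sets of corresponding 3D points is the weighted Kabsch (orthogonal Procrustes) solution via SVD. The sketch below illustrates that general technique only; it is not taken from the DVMNet code, and the function name `weighted_rigid_align`, the assumption that correspondences are already known, and the way the weights are supplied are all hypothetical choices made for illustration.

```python
import numpy as np


def weighted_rigid_align(src, dst, weights):
    """Closed-form minimizer of sum_i w_i * ||R @ src[i] + t - dst[i]||^2.

    src, dst: (N, 3) arrays of corresponding 3D points (assumed given).
    weights:  (N,) non-negative confidences; low values down-weight noisy points.
    Returns a rotation matrix R (3x3) and a translation t (3,).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to one

    # Weighted centroids of both point sets.
    src_mean = w @ src
    dst_mean = w @ dst

    # Weighted cross-covariance of the centered points.
    src_c = src - src_mean
    dst_c = dst - dst_mean
    H = (src_c * w[:, None]).T @ dst_c

    # SVD yields the optimal rotation; the sign fix rules out reflections.
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    t = dst_mean - R @ src_mean
    return R, t
```

Because every step (centroids, cross-covariance, SVD) is differentiable almost everywhere, a solver of this form can sit inside a network trained end to end, which is consistent with the single-pass design the abstract describes. Setting a point's weight to zero removes it from the solution entirely, which is the simplest picture of how weighting can limit the effect of an outlier.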
