6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation (2401.00029v3)

Published 29 Dec 2023 in cs.CV

Abstract: Estimating the 6D object pose from a single RGB image often involves noise and indeterminacy due to challenges such as occlusions and cluttered backgrounds. Meanwhile, diffusion models have shown appealing performance in generating high-quality images from random noise with high indeterminacy through step-by-step denoising. Inspired by this denoising capability, we propose a novel diffusion-based framework (6D-Diff) that handles the noise and indeterminacy in object pose estimation to improve performance. In our framework, to establish accurate 2D-3D correspondences, we formulate 2D keypoint detection as a reverse diffusion (denoising) process. To facilitate this denoising process, we design a Mixture-of-Cauchy-based forward diffusion process and condition the reverse process on the object features. Extensive experiments on the LM-O and YCB-V datasets demonstrate the effectiveness of our framework.
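
To make the Mixture-of-Cauchy idea concrete, the following is a minimal sketch of drawing noisy 2D keypoint locations from a mixture of Cauchy distributions. It is not the authors' implementation: the function names, the parameter layout, and the choice to sample the x and y coordinates independently with a shared scale are all assumptions made for illustration, and the abstract does not specify how the mixture is parameterized or fitted.

```python
import math
import random


def sample_cauchy(loc, scale, rng):
    """Draw one sample from Cauchy(loc, scale) using the inverse CDF."""
    u = rng.random()
    return loc + scale * math.tan(math.pi * (u - 0.5))


def sample_moc_keypoint(weights, locs, scales, rng):
    """Draw one 2D point from a mixture of Cauchy components.

    weights: mixing weights of the components (hypothetical layout)
    locs:    list of (x, y) component locations
    scales:  list of positive scales, one per component
             (assumption: x and y share a scale and are independent)
    """
    # Pick a component according to the mixing weights.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Sample each coordinate from that component's Cauchy distribution.
    x = sample_cauchy(locs[k][0], scales[k], rng)
    y = sample_cauchy(locs[k][1], scales[k], rng)
    return (x, y)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical components around plausible keypoint locations.
    weights = [0.7, 0.3]
    locs = [(120.0, 80.0), (135.0, 95.0)]
    scales = [2.0, 5.0]
    for _ in range(3):
        print(sample_moc_keypoint(weights, locs, scales, rng))
```

The heavy tails of the Cauchy distribution mean that occasional samples land far from the component locations, which is one plausible reason to prefer such a mixture over a Gaussian when modeling keypoint noise with outliers.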
