
DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation (2402.12647v2)

Published 20 Feb 2024 in cs.CV and cs.RO

Abstract: This paper addresses the challenging problem of category-level pose estimation. Current state-of-the-art methods struggle with symmetric objects and generalize poorly to new environments when trained solely on synthetic data. In this work, we address these challenges by proposing a probabilistic model that relies on diffusion to estimate dense canonical maps, which are crucial for recovering partial object shapes and for establishing the correspondences essential for pose estimation. Furthermore, we introduce critical components that enhance performance by combining the strengths of diffusion models with multi-modal input representations. We demonstrate the effectiveness of our method by testing it on a range of real datasets. Despite being trained solely on our generated synthetic data, our approach achieves state-of-the-art performance and unprecedented generalization, outperforming baselines, even those trained specifically on the target domain.
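The dense canonical maps mentioned in the abstract give per-pixel correspondences between predicted object-space (NOCS) coordinates and observed camera-space points; in NOCS-style pipelines the 7-DoF pose (rotation, translation, scale) is then typically recovered with a least-squares similarity alignment such as the Umeyama algorithm. The sketch below illustrates that alignment step only, not the paper's diffusion model; all names are illustrative.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Least-squares similarity transform mapping src -> dst.

    src: (N, 3) canonical-space points (e.g. predicted NOCS coordinates).
    dst: (N, 3) observed camera-space points (e.g. back-projected depth).
    Returns (s, R, t) such that dst ~= s * R @ src + t.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    # 3x3 cross-covariance between centered point sets
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1  # correct for reflection, keep a proper rotation
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

In practice this solver is usually wrapped in RANSAC over the dense correspondences, since predicted canonical maps contain outliers.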
