
Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation (2403.12728v1)

Published 19 Mar 2024 in cs.CV

Abstract: Fully-supervised category-level pose estimation aims to determine the 6-DoF poses of unseen instances from known categories, requiring expensive manual labeling. Recently, various self-supervised category-level pose estimation methods have been proposed to reduce the need for annotated datasets. However, most methods rely on synthetic data or 3D CAD models for self-supervised training, and they are typically limited to single-object pose problems, without considering multi-object tasks or shape reconstruction. To overcome these challenges and limitations, we introduce a diffusion-driven self-supervised network for multi-object shape reconstruction and categorical pose estimation, leveraging only shape priors. Specifically, to capture SE(3)-equivariant pose features and 3D scale-invariant shape information, we present a Prior-Aware Pyramid 3D Point Transformer in our network. This module adopts a point convolutional layer with radial kernels for pose-aware learning and a 3D scale-invariant graph convolution layer for object-level shape representation. Furthermore, we introduce a pretrain-to-refine self-supervised training paradigm to train our network. It enables the proposed network to capture the associations between shape priors and observations, addressing the challenge of intra-class shape variations via the diffusion mechanism. Extensive experiments conducted on four public datasets and a self-built dataset demonstrate that our method significantly outperforms state-of-the-art self-supervised category-level baselines and even surpasses some fully-supervised instance-level and category-level methods.
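The "diffusion mechanism" the abstract invokes is the denoising diffusion probabilistic model (DDPM) framework of Ho et al. (2020). As a reading aid, below is a minimal sketch of the closed-form forward (noising) process applied to a toy point cloud standing in for a shape prior; the step count, noise schedule, array shapes, and function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Minimal DDPM forward-process sketch (assumed setup, not the paper's code).
T = 1000                              # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)    # linear noise schedule beta_1..beta_T
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)       # abar_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((1024, 3))   # hypothetical 1024-point shape prior
xt, eps = q_sample(x0, t=500, rng=rng)

# A denoising network eps_theta(x_t, t) would be trained with the simple
# objective ||eps - eps_theta(x_t, t)||^2 and, at inference, would reverse
# the chain step by step -- here, plausibly deforming the categorical shape
# prior toward the observed instance to absorb intra-class shape variation.
```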

Authors (6)
  1. Jingtao Sun (3 papers)
  2. Yaonan Wang (51 papers)
  3. Mingtao Feng (23 papers)
  4. Chao Ding (45 papers)
  5. Mike Zheng Shou (165 papers)
  6. Ajmal Saeed Mian (8 papers)
