
Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation (2403.19527v1)

Published 28 Mar 2024 in cs.CV

Abstract: Category-level 6D object pose estimation aims to estimate the rotation, translation and size of unseen instances within specific categories. In this area, dense correspondence-based methods have achieved leading performance. However, they do not explicitly consider the local and global geometric information of different instances, which leads to poor generalization to unseen instances with significant shape variations. To address this problem, we propose a novel Instance-Adaptive and Geometric-Aware Keypoint Learning method for category-level 6D object pose estimation (AG-Pose), which includes two key designs: (1) an Instance-Adaptive Keypoint Detection module, which adaptively detects a set of sparse keypoints for each instance to represent its geometric structure; and (2) a Geometric-Aware Feature Aggregation module, which efficiently integrates local and global geometric information into keypoint features. Together, these two modules establish robust keypoint-level correspondences for unseen instances, thus enhancing the generalization ability of the model. Experimental results on the CAMERA25 and REAL275 datasets show that the proposed AG-Pose outperforms state-of-the-art methods by a large margin without category-specific shape priors.
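
The abstract does not say how a pose is recovered once keypoint-level correspondences are available. As a minimal sketch, assuming each detected keypoint in the camera frame has been matched to a coordinate in a canonical object frame, a least-squares similarity alignment (Umeyama, 1991) yields a scale, rotation and translation. This is an illustration of the general idea only, not AG-Pose's actual pose solver; the function name `umeyama_similarity` and its argument names are hypothetical.

```python
import numpy as np

def umeyama_similarity(canonical_kpts, observed_kpts):
    """Least-squares (s, R, t) such that observed ≈ s * R @ canonical + t.

    Both inputs are (N, 3) arrays of matched keypoints (hypothetical names).
    """
    src = np.asarray(canonical_kpts, dtype=float)
    dst = np.asarray(observed_kpts, dtype=float)
    n = src.shape[0]

    # Center both point sets.
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between observed and canonical keypoints.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

With noise-free correspondences the alignment recovers the underlying transform exactly; with noisy or partly wrong correspondences it returns the least-squares fit, which is why robust correspondences matter for the final pose.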

Authors (4)
  1. Xiao Lin (181 papers)
  2. Wenfei Yang (19 papers)
  3. Yuan Gao (336 papers)
  4. Tianzhu Zhang (61 papers)
