KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation (2307.11543v3)
Abstract: Object pose estimation is a fundamental computer vision task used in many robotics and augmented reality applications. Many established approaches rely on predicting 2D-3D keypoint correspondences using RANSAC (Random Sample Consensus) and estimating the object pose with the PnP (Perspective-n-Point) algorithm. Since RANSAC is non-differentiable, the correspondences cannot be learned directly in an end-to-end fashion. In this paper, we address the stereo image-based object pose estimation problem by i) introducing a differentiable RANSAC layer into a well-known monocular pose estimation network; ii) exploiting an uncertainty-driven multi-view PnP solver that fuses information from multiple views. We evaluate our approach on a challenging public stereo object pose estimation dataset and on a custom-built dataset we call the Transparent Tableware Dataset (TTD), achieving state-of-the-art results compared with other recent approaches. Furthermore, our ablation study shows that the differentiable RANSAC layer contributes significantly to the accuracy of the proposed method. Along with this paper, we release the code of our method and the TTD dataset.
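The abstract names the differentiable RANSAC layer without detailing how the non-differentiable steps are relaxed. As a minimal PyTorch sketch of the general idea behind DSAC-style differentiable RANSAC (hard inlier counting replaced by a sigmoid, hypothesis selection replaced by a softmax), the snippet below scores candidate 2D keypoint hypotheses against per-pixel votes. The function names, tensor shapes, and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def soft_inlier_score(hypotheses, votes, tau=2.0, beta=5.0):
    """Differentiable (soft) inlier count for 2D keypoint hypotheses.

    hypotheses: (H, 2) candidate keypoint locations in pixels
    votes:      (N, 2) keypoint locations voted by individual pixels
    tau:        inlier distance threshold in pixels (assumed value)
    beta:       sharpness of the sigmoid relaxation (assumed value)
    Returns:    (H,) soft inlier counts, differentiable w.r.t. the votes
    """
    # Distance of every vote from every hypothesis: (H, N)
    dists = torch.linalg.norm(hypotheses[:, None, :] - votes[None, :, :], dim=-1)
    # Relax the hard test 1[dist < tau] into a sigmoid so gradients flow
    return torch.sigmoid(beta * (tau - dists)).sum(dim=1)

def expected_keypoint(hypotheses, votes):
    """Soft argmax over hypotheses: keeps hypothesis selection differentiable."""
    weights = torch.softmax(soft_inlier_score(hypotheses, votes), dim=0)
    return (weights[:, None] * hypotheses).sum(dim=0)

# Toy usage: three hypotheses scored against 100 noisy votes around (50, 80)
votes = torch.tensor([50.0, 80.0]) + torch.randn(100, 2)
hyps = torch.tensor([[50.2, 79.8], [10.0, 10.0], [55.0, 85.0]])
print(expected_keypoint(hyps, votes))  # close to (50, 80)
```

In this relaxation, the per-hypothesis score (and hence the chosen keypoint) remains a smooth function of the network's votes, which is what allows the voting head to be trained end-to-end through the RANSAC step.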