
KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation (2307.11543v3)

Published 21 Jul 2023 in cs.CV and cs.RO

Abstract: Object pose estimation is a fundamental computer vision task exploited in several robotics and augmented reality applications. Many established approaches rely on predicting 2D-3D keypoint correspondences using RANSAC (Random Sample Consensus) and estimating the object pose using the PnP (Perspective-n-Point) algorithm. Because RANSAC is non-differentiable, the correspondences cannot be learned directly in an end-to-end fashion. In this paper, we address the stereo image-based object pose estimation problem by i) introducing a differentiable RANSAC layer into a well-known monocular pose estimation network; ii) exploiting an uncertainty-driven multi-view PnP solver that can fuse information from multiple views. We evaluate our approach on a challenging public stereo object pose estimation dataset and on a custom-built dataset we call the Transparent Tableware Dataset (TTD), yielding state-of-the-art results against other recent approaches. Furthermore, in our ablation study, we show that the differentiable RANSAC layer plays a significant role in the accuracy of the proposed method. We release the code of our method and the TTD dataset with this paper.

