Quantity-Aware Coarse-to-Fine Correspondence for Image-to-Point Cloud Registration (2307.07142v2)

Published 14 Jul 2023 in cs.CV

Abstract: Image-to-point cloud registration aims to determine the relative camera pose between an RGB image and a reference point cloud, serving as a general solution for locating 3D objects from 2D observations. Matching individual points with pixels is inherently ambiguous due to the modality gap. To address this challenge, we propose a framework that captures quantity-aware correspondences between local point sets and pixel patches and refines the results at both the point and pixel levels. The framework aligns the high-level semantics of point sets and pixel patches to improve matching accuracy. On the coarse scale, the set-to-patch correspondence should reflect the quantity of 3D points involved; to capture this, a novel supervision strategy adaptively quantifies the degree of correlation as a continuous value. On the finer scale, point-to-pixel correspondences are refined within a smaller search space through a scheme that incorporates both resampling and quantity-aware priors. In particular, a confidence sorting strategy proportionally selects the better correspondences at the final stage. Leveraging these high-quality correspondences, the relative pose is recovered with an efficient Perspective-n-Point (PnP) solver inside a random sample consensus (RANSAC) loop. Extensive experiments on the KITTI Odometry and nuScenes datasets demonstrate the superiority of our method over state-of-the-art methods.
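
The final stage the abstract describes (proportional selection of correspondences by confidence, followed by an efficient PnP solver inside RANSAC) can be sketched as below. This is a minimal illustration, not the authors' implementation: the names `points_3d`, `pixels_2d`, `scores`, and `keep_ratio`, along with the 3-pixel reprojection threshold, are assumptions, and OpenCV's `solvePnPRansac` with the EPnP flag stands in for the solver.

```python
# Hypothetical sketch of confidence-sorted selection + PnP-within-RANSAC.
# Variable names and thresholds are illustrative, not taken from the paper.
import numpy as np
import cv2

def estimate_pose(points_3d, pixels_2d, scores, K, keep_ratio=0.5):
    """points_3d: (N, 3) 3D points; pixels_2d: (N, 2) matched pixels;
    scores: (N,) correspondence confidences; K: (3, 3) camera intrinsics."""
    # Proportional selection: sort by confidence, keep the top fraction
    # (PnP needs at least 4 correspondences).
    order = np.argsort(-scores)
    keep = order[: max(4, int(keep_ratio * len(order)))]

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d[keep].astype(np.float64),
        pixels_2d[keep].astype(np.float64),
        K.astype(np.float64),
        distCoeffs=None,
        flags=cv2.SOLVEPNP_EPNP,   # efficient O(n) PnP solver
        reprojectionError=3.0,      # inlier threshold in pixels (assumed)
        iterationsCount=1000,
    )
    if not ok:
        raise RuntimeError("PnP-RANSAC failed to recover a pose")
    R, _ = cv2.Rodrigues(rvec)      # rotation vector -> 3x3 rotation matrix
    return R, tvec, inliers
```

Because RANSAC's success probability degrades quickly with the outlier ratio, pre-filtering by confidence before the solver, as the abstract proposes, shrinks the hypothesis space and lets the PnP loop converge with far fewer iterations.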

Authors (4)
  1. Gongxin Yao (8 papers)
  2. Yixin Xuan (5 papers)
  3. Yiwei Chen (19 papers)
  4. Yu Pan (154 papers)