ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition (2403.18762v1)

Published 27 Mar 2024 in cs.CV, cs.AI, and cs.RO

Abstract: Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition, which retrieves matches for images from a point-cloud database, remains a challenging problem. Current cross-modal methods transform images into 3D points using depth estimation for modality conversion, which is usually computationally intensive and requires expensive labeled data for depth supervision. In this work, we introduce a fast and lightweight framework that encodes images and point clouds into place-distinctive descriptors. We propose an effective Field of View (FoV) transformation module to convert point clouds into a modality analogous to images. This module eliminates the need for depth estimation and helps subsequent modules achieve real-time performance. We further design a non-negative factorization-based encoder to extract mutually consistent semantic features from point clouds and images. This encoder yields more distinctive global descriptors for retrieval. Experimental results on the KITTI dataset show that our proposed methods achieve state-of-the-art performance while running in real time. Additional evaluation on the HAOMO dataset, which covers a 17 km trajectory, further demonstrates the method's practical generalization capability. We have released the implementation of our methods as open source at: https://github.com/haomo-ai/ModaLink.git.
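
The two components the abstract describes lend themselves to a compact illustration. Below is a minimal, hypothetical sketch, not the authors' released implementation (which is at the linked repository): (1) an FoV-style transformation that projects a LiDAR point cloud into a camera's field of view to produce an image-like depth map, avoiding learned depth estimation, and (2) a non-negative matrix factorization step that compresses a feature map into a compact global descriptor, standing in for the paper's learned NMF-based encoder. The function names, image size, intrinsic matrix, and number of NMF components are illustrative assumptions.

```python
# Hypothetical sketch of the two ideas in the abstract; the authors' actual
# implementation lives at https://github.com/haomo-ai/ModaLink.git.
import numpy as np
from sklearn.decomposition import NMF

def fov_depth_image(points, K, height=128, width=512):
    """Project 3D points (N, 3), given in the camera frame, through a
    pinhole intrinsic matrix K and keep the nearest depth per pixel.
    This yields an image-like view of the point cloud without any
    learned depth estimation."""
    pts = points[points[:, 2] > 0.1]           # keep points in front of the camera
    uvw = (K @ pts.T).T                        # perspective projection
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    img = np.full((height, width), np.inf)
    np.minimum.at(img, (v[ok], u[ok]), pts[ok, 2])  # z-buffer: nearest point wins
    img[np.isinf(img)] = 0.0                   # empty pixels -> 0
    return img

def nmf_descriptor(feature_map, k=8):
    """Factorize a non-negative feature map of shape (C, H*W) into k
    components and pool into a fixed-size, L2-normalized descriptor."""
    W = NMF(n_components=k, init="nndsvda", max_iter=300).fit_transform(
        np.maximum(feature_map, 0.0))          # enforce non-negativity
    desc = W.mean(axis=0)                      # pool over channels
    return desc / (np.linalg.norm(desc) + 1e-12)

# Toy usage with random points and an assumed intrinsic matrix.
rng = np.random.default_rng(0)
cloud = rng.uniform([-10.0, -2.0, 1.0], [10.0, 2.0, 40.0], size=(5000, 3))
K = np.array([[256.0, 0.0, 256.0],
              [0.0, 256.0, 64.0],
              [0.0, 0.0, 1.0]])
depth_img = fov_depth_image(cloud, K)          # (128, 512) image-like depth map
feat = rng.random((32, 1024))                  # stand-in (C, H*W) CNN feature map
desc = nmf_descriptor(feat)                    # (8,) global descriptor
print(depth_img.shape, desc.shape)
```

With descriptors in hand, cross-modal retrieval reduces to nearest-neighbor search over L2-normalized vectors, e.g. `np.argmax(db @ query)` for inner-product similarity over a point-cloud descriptor database.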

