VOLoc: Visual Place Recognition by Querying Compressed Lidar Map (2402.15961v1)
Abstract: The availability of city-scale Lidar maps enables the potential of city-scale place recognition using mobile cameras. However, the city-scale Lidar maps generally need to be compressed for storage efficiency, which increases the difficulty of direct visual place recognition in compressed Lidar maps. This paper proposes VOLoc, an accurate and efficient visual place recognition method that exploits geometric similarity to directly query the compressed Lidar map via the real-time captured image sequence. In the offline phase, VOLoc compresses the Lidar maps using a \emph{Geometry-Preserving Compressor} (GPC), in which the compression is reversible, a crucial requirement for the downstream 6DoF pose estimation. In the online phase, VOLoc proposes an online Geometric Recovery Module (GRM), which is composed of online Visual Odometry (VO) and a point cloud optimization module, such that the local scene structure around the camera is online recovered to build the \emph{Querying Point Cloud} (QPC). Then the QPC is compressed by the same GPC, and is aggregated into a global descriptor by an attention-based aggregation module, to query the compressed Lidar map in the vector space. A transfer learning mechanism is also proposed to improve the accuracy and the generality of the aggregation network. Extensive evaluations show that VOLoc provides localization accuracy even better than the Lidar-to-Lidar place recognition, setting up a new record for utilizing the compressed Lidar map by low-end mobile cameras. The code are publicly available at https://github.com/Master-cai/VOLoc.
- P.-E. Sarlin, C. Cadena, R. Siegwart, and M. Dymczyk, “From coarse to fine: Robust hierarchical localization at large scale,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 12 716–12 725.
- A. Torii, R. Arandjelovic, J. Sivic, M. Okutomi, and T. Pajdla, “24/7 place recognition by view synthesis,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1808–1817.
- E. Brachmann and C. Rother, “Learning less is more - 6d camera localization via 3d surface regression,” in 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018. Computer Vision Foundation / IEEE Computer Society, 2018, pp. 4654–4662. [Online]. Available: http://openaccess.thecvf.com/content_cvpr_2018/html/Brachmann_Learning_Less_Is_CVPR_2018_paper.html
- R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “Netvlad: Cnn architecture for weakly supervised place recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 5297–5307.
- T. Xie, K. Dai, K. Wang, R. Li, J. Wang, X. Tang, and L. Zhao, “A deep feature aggregation network for accurate indoor camera localization,” IEEE Robotics Autom. Lett., vol. 7, no. 2, pp. 3687–3694, 2022. [Online]. Available: https://doi.org/10.1109/LRA.2022.3146946
- T. Sattler, B. Leibe, and L. Kobbelt, “Efficient & effective prioritized matching for large-scale image-based localization,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 9, pp. 1744–1756, 2016.
- C. Toft, W. Maddern, A. Torii, L. Hammarstrand, E. Stenborg, D. Safari, M. Okutomi, M. Pollefeys, J. Sivic, T. Pajdla et al., “Long-term visual localization revisited,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 4, pp. 2074–2088, 2020.
- H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11 621–11 631.
- P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. M. Ettinger, M. Krivokon, A. Gao, A. Joshi, Y. Zhang, J. Shlens, Z. Chen, and D. Anguelov, “Scalability in perception for autonomous driving: Waymo open dataset,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2443–2451, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:209140225
- D. Cattaneo, M. Vaghi, S. Fontana, A. L. Ballardini, and D. G. Sorrenti, “Global visual localization in lidar-maps through shared 2d-3d embedding space,” in 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 4365–4371.
- L. Yang, R. Shrestha, W. Li, S. Liu, G. Zhang, Z. Cui, and P. Tan, “Scenesqueezer: Learning to compress scene for camera relocalization,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022. IEEE, 2022, pp. 8249–8258. [Online]. Available: https://doi.org/10.1109/CVPR52688.2022.00808
- Y. Li, S. Zheng, Z. Yu, B. Yu, S. Cao, L. Luo, and H. Shen, “I2p-rec: Recognizing images on large-scale point cloud maps through bird’s eye view projections,” ArXiv, vol. abs/2303.01043, 2023. [Online]. Available: https://api.semanticscholar.org/CorpusID:257280108
- Q. Sun, H. Liu, J. He, Z. Fan, and X. Du, “DAGC: employing dual attention and graph convolution for point cloud based place recognition,” in Proceedings of the 2020 on International Conference on Multimedia Retrieval, ICMR 2020, Dublin, Ireland, June 8-11, 2020, C. Gurrin, B. Þ. Jónsson, N. Kando, K. Schöffmann, Y. P. Chen, and N. E. O’Connor, Eds. ACM, 2020, pp. 224–232. [Online]. Available: https://doi.org/10.1145/3372278.3390693
- M. A. Uy and G. H. Lee, “Pointnetvlad: Deep point cloud based retrieval for large-scale place recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4470–4479.
- X. Wei, I. A. Barsan, S. Wang, J. Martinez, and R. Urtasun, “Learning to localize through compressed binary maps,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. Computer Vision Foundation / IEEE, 2019, pp. 10 316–10 324. [Online]. Available: http://openaccess.thecvf.com/content_CVPR_2019/html/Wei_Learning_to_Localize_Through_Compressed_Binary_Maps_CVPR_2019_paper.html
- L. Wiesmann, R. Marcuzzi, C. Stachniss, and J. Behley, “Retriever: Point cloud retrieval in compressed 3d maps,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 10 925–10 932.
- J. Engel, V. Koltun, and D. Cremers, “Direct sparse odometry,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 3, pp. 611–625, 2017.
- C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. Montiel, and J. D. Tardós, “Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam,” IEEE Transactions on Robotics, vol. 37, no. 6, pp. 1874–1890, 2021.
- T. Qin, P. Li, and S. Shen, “Vins-mono: A robust and versatile monocular visual-inertial state estimator,” IEEE Transactions on Robotics, vol. 34, no. 4, pp. 1004–1020, 2018.
- A. Jaegle, F. Gimeno, A. Brock, O. Vinyals, A. Zisserman, and J. Carreira, “Perceiver: General perception with iterative attention,” in International conference on machine learning. PMLR, 2021, pp. 4651–4664.
- A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The kitti dataset,” The International Journal of Robotics Research, vol. 32, pp. 1231 – 1237, 2013.
- D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol. 60, pp. 91–110, 2004.
- D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superpoint: Self-supervised interest point detection and description,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2018, pp. 224–236.
- J. Sivic and A. Zisserman, “Video google: A text retrieval approach to object matching in videos,” in Computer Vision, IEEE International Conference on, vol. 3. IEEE Computer Society, 2003, pp. 1470–1470.
- H. Jégou, M. Douze, C. Schmid, and P. Pérez, “Aggregating local descriptors into a compact image representation,” in 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, 2010, pp. 3304–3311.
- G. Berton, C. Masone, and B. Caputo, “Rethinking visual geo-localization for large-scale applications,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4878–4888.
- Y. Zhu, J. Wang, L. Xie, and L. Zheng, “Attention-based pyramid aggregation network for visual place recognition,” in Proceedings of the 26th ACM international conference on Multimedia, 2018, pp. 99–107.
- Y. Wang, H. Chen, J. Wang, and Y. Zhu, “Dmpcanet: A low dimensional aggregation network for visual place recognition,” in Proceedings of the 2022 International Conference on Multimedia Retrieval, 2022, pp. 24–28.
- K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
- S. Hausler, S. Garg, M. Xu, M. Milford, and T. Fischer, “Patch-netvlad: Multi-scale fusion of locally-global descriptors for place recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 14 141–14 152.
- H. Jin Kim, E. Dunn, and J.-M. Frahm, “Learned contextual feature reweighting for image geo-localization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2136–2145.
- G. Peng, Y. Yue, J. Zhang, Z. Wu, X. Tang, and D. Wang, “Semantic reinforced attention learning for visual place recognition,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 13 415–13 422.
- G. Kim and A. Kim, “Scan context: Egocentric spatial descriptor for place recognition within 3d point cloud map,” 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4802–4809, 2018. [Online]. Available: https://api.semanticscholar.org/CorpusID:57755993
- G. Kim, S. Choi, and A. Kim, “Scan context++: Structural place recognition robust to rotation and lateral variations in urban environments,” IEEE Transactions on Robotics, vol. 38, pp. 1856–1874, 2021. [Online]. Available: https://api.semanticscholar.org/CorpusID:238198272
- X. Xu, H. Yin, Z. Chen, Y. Li, Y. Wang, and R. Xiong, “Disco: Differentiable scan context with orientation,” IEEE Robotics and Automation Letters, vol. 6, pp. 2791–2798, 2020. [Online]. Available: https://api.semanticscholar.org/CorpusID:224814531
- L. Luo, S. Zheng, Y. Li, Y. H. Fan, B. Yu, S. Cao, and H. Shen, “Bevplace: Learning lidar-based place recognition using bird’s eye view images,” ArXiv, vol. abs/2302.14325, 2023. [Online]. Available: https://api.semanticscholar.org/CorpusID:257232932
- C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 652–660.
- Z. Liu, S. Zhou, C. Suo, P. Yin, W. Chen, H. Wang, H. Li, and Y.-H. Liu, “Lpd-net: 3d point cloud learning for large-scale place recognition and environment analysis,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 2831–2840.
- W. Zhang and C. Xiao, “Pcan: 3d attention map learning using contextual information for point cloud based retrieval,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 12 436–12 445.
- Q. Sun, H. Liu, J. He, Z. Fan, and X. Du, “Dagc: Employing dual attention and graph convolution for point cloud based place recognition,” in Proceedings of the 2020 International Conference on Multimedia Retrieval, 2020, pp. 224–232.
- J. Komorowski, “Minkloc3d: Point cloud based large-scale place recognition,” in 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 1789–1798.
- ——, “Improving point cloud based place recognition with ranking-based loss and large batch training,” in 26th International Conference on Pattern Recognition, ICPR 2022, Montreal, QC, Canada, August 21-25, 2022. IEEE, 2022, pp. 3699–3705. [Online]. Available: https://doi.org/10.1109/ICPR56361.2022.9956458
- J. Ma, J. Zhang, J. Xu, R. Ai, W. Gu, and X. Chen, “Overlaptransformer: An efficient and yaw-angle-invariant transformer network for lidar-based place recognition,” IEEE Robotics and Automation Letters, vol. 7, pp. 6958–6965, 2022. [Online]. Available: https://api.semanticscholar.org/CorpusID:249223069
- R. W. Wolcott and R. M. Eustice, “Visual localization within lidar maps for automated urban driving,” in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2014, pp. 176–183.
- D. Cattaneo, M. Vaghi, A. L. Ballardini, S. Fontana, D. G. Sorrenti, and W. Burgard, “Cmrnet: Camera to lidar-map registration,” in 2019 IEEE intelligent transportation systems conference (ITSC). IEEE, 2019, pp. 1283–1289.
- T. Caselitz, B. Steder, M. Ruhnke, and W. Burgard, “Monocular camera localization in 3d lidar maps,” in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 1926–1931.
- Q. Li, J. Zhu, J. Liu, R. Cao, H. Fu, J. M. Garibaldi, Q. Li, B. Liu, and G. Qiu, “3d map-guided single indoor image localization refinement,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 161, pp. 13–26, 2020.
- M. Feng, S. Hu, M. H. Ang, and G. H. Lee, “2d3d-matchnet: Learning to match keypoints across 2d image and 3d point cloud,” in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 4790–4796.
- Y. Zhong, “Intrinsic shape signatures: A shape descriptor for 3d object recognition,” in 2009 IEEE 12th international conference on computer vision workshops, ICCV workshops. IEEE, 2009, pp. 689–696.
- Q.-H. Pham, M. A. Uy, B.-S. Hua, D. T. Nguyen, G. Roig, and S.-K. Yeung, “Lcd: Learned cross-domain descriptors for 2d-3d matching,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, 2020, pp. 11 856–11 864.
- L. Wiesmann, A. Milioto, X. Chen, C. Stachniss, and J. Behley, “Deep compression for dense point cloud maps,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2060–2067, 2021.
- H. Thomas, C. Qi, J.-E. Deschaud, B. Marcotegui, F. Goulette, and L. J. Guibas, “Kpconv: Flexible and deformable convolution for point clouds,” 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6410–6419, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:121328056
- W. Maddern, G. Pascoe, C. Linegar, and P. Newman, “1 year, 1000 km: The oxford robotcar dataset,” The International Journal of Robotics Research, vol. 36, no. 1, pp. 3–15, 2017.
- Xudong Cai (13 papers)
- Yongcai Wang (28 papers)
- Zhe Huang (57 papers)
- Yu Shao (10 papers)
- Deying Li (25 papers)