
VOLoc: Visual Place Recognition by Querying Compressed Lidar Map (2402.15961v1)

Published 25 Feb 2024 in cs.CV

Abstract: The availability of city-scale Lidar maps enables city-scale place recognition with mobile cameras. However, city-scale Lidar maps generally need to be compressed for storage efficiency, which makes direct visual place recognition in compressed Lidar maps difficult. This paper proposes VOLoc, an accurate and efficient visual place recognition method that exploits geometric similarity to directly query the compressed Lidar map with a real-time captured image sequence. In the offline phase, VOLoc compresses the Lidar maps using a Geometry-Preserving Compressor (GPC), whose compression is reversible, a crucial requirement for downstream 6DoF pose estimation. In the online phase, VOLoc's Geometric Recovery Module (GRM), composed of online Visual Odometry (VO) and a point cloud optimization module, recovers the local scene structure around the camera online to build the Querying Point Cloud (QPC). The QPC is then compressed by the same GPC and aggregated into a global descriptor by an attention-based aggregation module, which queries the compressed Lidar map in the vector space. A transfer learning mechanism is also proposed to improve the accuracy and generality of the aggregation network. Extensive evaluations show that VOLoc provides localization accuracy even better than Lidar-to-Lidar place recognition, setting a new record for utilizing compressed Lidar maps with low-end mobile cameras. The code is publicly available at https://github.com/Master-cai/VOLoc.

Authors (5)
  1. Xudong Cai (13 papers)
  2. Yongcai Wang (28 papers)
  3. Zhe Huang (57 papers)
  4. Yu Shao (10 papers)
  5. Deying Li (25 papers)
Citations (4)

Summary

  • The paper introduces VOLoc, a method that efficiently queries compressed Lidar maps using geometric similarity and an attention-based aggregation module.
  • It demonstrates reduced storage requirements and real-time capability by operating directly on compressed maps without decompression.
  • The study leverages transfer learning on large Lidar datasets to improve the aggregation network's accuracy and generality, outperforming conventional VPR methods on the KITTI dataset.

Visual Place Recognition in Compressed Lidar Maps with VOLoc

Introduction

Visual Place Recognition (VPR) is pivotal for applications such as autonomous driving, augmented reality, and robotic navigation. Traditional VPR methods, which rely primarily on image-to-image querying, often suffer from low accuracy under environmental changes in lighting or season. With advances in Lidar technology, researchers have explored image-to-Lidar and Lidar-to-Lidar place recognition to overcome these challenges. One significant obstacle, however, is the vast storage required for city-scale Lidar maps, which necessitates compression that, in turn, complicates direct place recognition. Addressing this, the paper introduces VOLoc, a method that efficiently queries compressed Lidar maps using images by exploiting geometric similarity.

VOLoc Framework

VOLoc stands out for its ability to operate directly on compressed Lidar maps without decompression. The framework comprises two phases:

  • Offline Phase: VOLoc compresses Lidar maps with a Geometry-Preserving Compressor (GPC), which reduces storage by clustering and downsampling while preserving the maps' geometric structure. Crucially, the compression is reversible, a prerequisite for accurate downstream 6DoF pose estimation (a toy sketch of such a reversible compressor follows this list).
  • Online Phase: The Geometric Recovery Module (GRM), which couples online Visual Odometry (VO) with a point cloud optimization step, reconstructs the local scene geometry around the camera in real time as a Querying Point Cloud (QPC); a rough sketch of this accumulation also follows the list. The QPC is compressed with the same GPC, converted into a global descriptor by an attention-based aggregation module, and used to query the compressed Lidar map in descriptor space.
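To make the reversibility requirement concrete, here is a minimal sketch of a geometry-preserving, reversible compressor: points are clustered into voxels, each voxel keeps its centroid as the compressed representative, and per-point residuals are retained so the original cloud can be reconstructed exactly. This illustrates the idea only; VOLoc's actual GPC uses its own clustering and downsampling pipeline, and the `voxel_size` knob here is a made-up parameter.

```python
import numpy as np

def gpc_compress(points, voxel_size=0.5):
    """Toy reversible compression: cluster points into voxels, keep one
    centroid per voxel, and store residuals for exact reconstruction."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(keys, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.ravel()                  # guard against NumPy shape quirks
    centroids = np.zeros((counts.size, 3))
    np.add.at(centroids, inverse, points)      # sum member points per voxel
    centroids /= counts[:, None]               # -> per-voxel centroid
    residuals = points - centroids[inverse]    # stored so the step is reversible
    return centroids, inverse, residuals       # centroids alone = lossy map

def gpc_decompress(centroids, inverse, residuals):
    """Exact reconstruction from the compressed representation."""
    return centroids[inverse] + residuals
```

In a real compressor the residuals would be quantized or entropy-coded; the point of the sketch is only that keeping them makes the downsampling invertible, which is what the downstream 6DoF pose estimation needs.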
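Likewise, a rough sketch of how a Querying Point Cloud might be assembled from VO output: per-keyframe 3D points (e.g. triangulated by a VO front end such as ORB-SLAM3 or VINS-Mono) are transformed into a common world frame with the estimated camera poses and lightly deduplicated. The GRM's point cloud optimization step is deliberately omitted, and the function and parameter names are hypothetical.

```python
import numpy as np

def build_qpc(vo_poses, keyframe_points, voxel_size=0.2):
    """Accumulate per-keyframe VO points into one local cloud.
    vo_poses: list of (4,4) camera-to-world transforms from the VO front end.
    keyframe_points: list of (N_i,3) point arrays in each camera's frame."""
    world_pts = []
    for T_wc, pts_c in zip(vo_poses, keyframe_points):
        homo = np.hstack([pts_c, np.ones((len(pts_c), 1))])  # homogeneous coords
        world_pts.append((homo @ T_wc.T)[:, :3])             # p_w = T_wc @ p_c
    cloud = np.vstack(world_pts)
    # Light voxel de-duplication; the GRM's optimization step is omitted here.
    keys = np.floor(cloud / voxel_size).astype(np.int64)
    _, first = np.unique(keys, axis=0, return_index=True)
    return cloud[np.sort(first)]
```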

Notably, VOLoc incorporates transfer learning to enhance the aggregation network's accuracy and generality: the network is pre-trained on a large Lidar point cloud dataset and fine-tuned on VO-generated point clouds.
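As a rough illustration of what attention-based aggregation means here, the sketch below pools per-point features into a single L2-normalized global descriptor using learned attention weights. It is a generic single-head attention-pooling stand-in with assumed dimensions (`feat_dim`, `desc_dim`) and layer names; the paper's aggregation module is attention-based, but its exact architecture is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPool(nn.Module):
    """Generic attention pooling: per-point features -> one global descriptor."""
    def __init__(self, feat_dim=256, desc_dim=256):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)    # one attention logit per point
        self.proj = nn.Linear(feat_dim, desc_dim)

    def forward(self, feats):                  # feats: (B, N, feat_dim)
        w = torch.softmax(self.score(feats), dim=1)    # weights over the N points
        pooled = (w * feats).sum(dim=1)                # (B, feat_dim)
        return F.normalize(self.proj(pooled), dim=-1)  # unit-norm descriptor
```

For instance, `AttentionPool()(torch.randn(2, 4096, 256))` yields a `(2, 256)` batch of unit-norm descriptors suitable for nearest-neighbor search in descriptor space.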

Empirical Evaluation

Extensive evaluations assess VOLoc against existing VPR methods. On the KITTI dataset, VOLoc matches or outperforms state-of-the-art Lidar-to-Lidar place recognition methods in localization accuracy, while using notably smaller query sizes and map storage, underscoring its practicality for devices with limited storage or bandwidth.
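For context, retrieval accuracy in VPR is commonly reported as Recall@K: a query counts as localized if any of its K nearest map descriptors lies within a distance threshold of the ground-truth position. The sketch below is a minimal version of that metric under common conventions (cosine similarity on L2-normalized descriptors, a 25 m success radius); the function name and thresholds are illustrative, not necessarily the paper's exact protocol.

```python
import numpy as np

def recall_at_k(q_desc, m_desc, q_pos, m_pos, k=1, dist_thresh=25.0):
    """Toy Recall@K for place recognition (assumes L2-normalized descriptors)."""
    sims = q_desc @ m_desc.T                    # (Q, M) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]     # K best map candidates per query
    hits = sum(
        bool((np.linalg.norm(m_pos[cand] - q_pos[i], axis=1) <= dist_thresh).any())
        for i, cand in enumerate(topk)
    )
    return hits / len(q_desc)
```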

Outcomes and Insights

The investigation into VOLoc reveals several key insights:

  • Geometric Similarity: Leveraging geometric similarity enables VOLoc to bridge the gap between images and compressed Lidar maps effectively.
  • Storage Efficiency: By operating on compressed Lidar maps, VOLoc offers a solution that significantly reduces storage requirements.
  • Real-time Capability: Despite the online geometric recovery and compression steps, VOLoc supports real-time place recognition, making it viable for dynamic applications such as autonomous driving and mobile navigation.
  • Transfer Learning: The transfer learning scheme strengthens the network's ability to capture geometric features, thereby improving localization accuracy.

Future Directions

The promising results of VOLoc pave the way for further exploration in VPR. Immediate extensions could investigate applicability to single-image queries, potentially expanding VOLoc's utility. The paper also underscores the importance of optimized geometric recovery and compression, suggesting research focused on enhancing these components for greater efficiency and accuracy.

Conclusion

VOLoc marks a significant advance in visual place recognition, introducing an effective method for querying compressed Lidar maps with real-time captured images. Through judicious use of geometric similarity, reversible compression, and attention-based aggregation, VOLoc sets a new benchmark for memory-efficient, accurate place recognition. Its framework provides a robust starting point for future VPR methods that must handle the complexities of real-world navigation and mapping.
