LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for Place Recognition (2311.03198v2)

Published 6 Nov 2023 in cs.CV and cs.RO

Abstract: Place recognition is one of the most crucial modules for autonomous vehicles to identify places that were previously visited in GPS-invalid environments. Sensor fusion is considered an effective method to overcome the weaknesses of individual sensors. In recent years, multimodal place recognition fusing information from multiple sensors has gathered increasing attention. However, most existing multimodal place recognition methods only use limited field-of-view camera images, which leads to an imbalance between features from different modalities and limits the effectiveness of sensor fusion. In this paper, we present a novel neural network named LCPR for robust multimodal place recognition, which fuses LiDAR point clouds with multi-view RGB images to generate discriminative and yaw-rotation invariant representations of the environment. A multi-scale attention-based fusion module is proposed to fully exploit the panoramic views from different modalities of the environment and their correlations. We evaluate our method on the nuScenes dataset, and the experimental results show that our method can effectively utilize multi-view camera and LiDAR data to improve the place recognition performance while maintaining strong robustness to viewpoint changes. Our open-source code and pre-trained models are available at https://github.com/ZhouZijie77/LCPR .
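The abstract describes two ideas worth making concrete: (1) place recognition as descriptor retrieval, where a query scan's embedding is matched against a database of previously visited places, and (2) yaw-rotation invariance of the descriptor, which for panoramic feature maps is commonly achieved by pooling away the width (azimuth) axis, since a yaw rotation of the vehicle corresponds to a circular shift along that axis. The sketch below is illustrative only and is not LCPR's actual architecture (which uses a learned multi-scale attention fusion module); the function names and the simple max-pooling are assumptions chosen to demonstrate the invariance property.

```python
import numpy as np

def yaw_invariant_descriptor(feat: np.ndarray) -> np.ndarray:
    """Collapse the azimuth (width) axis of a panoramic feature map.

    feat: (C, H, W) feature map over a 360-degree panorama.
    A yaw rotation circularly shifts columns along W, so any pooling
    that collapses W (here: max) yields a shift-invariant descriptor.
    """
    pooled = feat.max(axis=2)                 # (C, H), invariant to np.roll along W
    v = pooled.reshape(-1)
    return v / (np.linalg.norm(v) + 1e-12)    # L2-normalize for cosine retrieval

def retrieve(query: np.ndarray, database: np.ndarray) -> int:
    """Return the index of the most similar database descriptor.

    database: (N, D) matrix of L2-normalized place descriptors.
    With unit-norm vectors, the dot product equals cosine similarity.
    """
    return int(np.argmax(database @ query))
```

Because both query and database descriptors are unit-normalized, nearest-neighbor search reduces to a matrix-vector product; at scale this is typically handed to an ANN library such as FAISS rather than brute-forced.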
