Unifying Local and Global Multimodal Features for Place Recognition in Aliased and Low-Texture Environments (2403.13395v1)
Abstract: Perceptual aliasing and weak textures pose significant challenges to place recognition, hindering the performance of Simultaneous Localization and Mapping (SLAM) systems. This paper presents a novel model, called UMF (Unifying Local and Global Multimodal Features), that 1) leverages multi-modality through cross-attention blocks between vision and LiDAR features, and 2) includes a re-ranking stage that re-orders, based on local feature matching, the top-k candidates retrieved with the global representation. Our experiments, particularly on sequences captured in a planetary-analog environment, show that UMF significantly outperforms previous baselines in such challenging aliased environments. Since our work aims to enhance the reliability of SLAM in all situations, we also evaluate its performance on the widely used RobotCar dataset for broader applicability. Code and models are available at https://github.com/DLR-RM/UMF
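The abstract describes two mechanisms: cross-attention fusion of vision and LiDAR features into a shared representation, and a retrieve-then-re-rank pipeline in which the top-k candidates found with a global descriptor are re-ordered by local feature matching. The sketch below illustrates both steps in PyTorch; the module names, token dimensions, pooling choice, and the mutual-nearest-neighbour matching score are illustrative assumptions, not the released UMF implementation (see the linked repository for the authors' code).

```python
# Minimal sketch, assuming ViT-style image tokens and point-cloud feature
# tokens of a common dimension. Everything here is a stand-in for the ideas
# named in the abstract, not the actual UMF architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalFusion(nn.Module):
    """Fuse vision and LiDAR tokens with two cross-attention blocks."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.img_to_pc = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pc_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_tokens: torch.Tensor, pc_tokens: torch.Tensor):
        # Image tokens attend to LiDAR tokens and vice versa.
        img_fused, _ = self.img_to_pc(img_tokens, pc_tokens, pc_tokens)
        pc_fused, _ = self.pc_to_img(pc_tokens, img_tokens, img_tokens)
        # Local features: the fused tokens; global descriptor: mean-pooled
        # and L2-normalised (pooling choice is an assumption).
        local = torch.cat([img_fused, pc_fused], dim=1)        # (B, N, D)
        global_desc = F.normalize(local.mean(dim=1), dim=-1)   # (B, D)
        return local, global_desc


def rerank_topk(query_local, db_locals, db_globals, query_global, k: int = 5):
    """Retrieve top-k by global similarity, then re-order by local matching."""
    # Stage 1: global retrieval (cosine similarity on normalised descriptors).
    sims = db_globals @ query_global                            # (M,)
    topk = torch.topk(sims, k=min(k, sims.numel())).indices
    # Stage 2: re-rank candidates by a simple mutual nearest-neighbour count
    # between local features (a stand-in for the paper's matching step).
    scores = []
    for idx in topk:
        cand = db_locals[idx]                                   # (N, D)
        sim = F.normalize(query_local, dim=-1) @ F.normalize(cand, dim=-1).T
        fwd = sim.argmax(dim=1)                                 # query -> candidate
        bwd = sim.argmax(dim=0)                                 # candidate -> query
        mutual = (bwd[fwd] == torch.arange(sim.shape[0])).sum()
        scores.append(mutual.item())
    order = torch.tensor(scores).argsort(descending=True)
    return topk[order]


if __name__ == "__main__":
    fusion = CrossModalFusion()
    img = torch.randn(1, 196, 256)   # e.g. image patch tokens
    pc = torch.randn(1, 128, 256)    # e.g. aggregated point-cloud features
    local, g = fusion(img, pc)
    # Toy database of 50 places, for illustration only.
    db_locals = [torch.randn(324, 256) for _ in range(50)]
    db_globals = F.normalize(torch.randn(50, 256), dim=-1)
    print(rerank_topk(local[0], db_locals, db_globals, g[0], k=5))
```

The two-stage design keeps retrieval cheap (one descriptor comparison per database entry) and spends the more expensive local matching only on the short candidate list, which is the standard motivation for retrieve-then-re-rank place recognition.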