BEVRender: Vision-based Cross-view Vehicle Registration in Off-road GNSS-denied Environment (2405.09001v2)
Abstract: We introduce BEVRender, a novel learning-based approach for localizing ground vehicles in Global Navigation Satellite System (GNSS)-denied off-road scenarios. These environments are typically challenging for conventional vision-based state estimation due to the lack of distinct visual landmarks and the instability of vehicle poses. To address this, BEVRender generates high-quality bird's-eye-view (BEV) images of the local terrain. These images are then aligned with a georeferenced aerial map through template matching to achieve accurate cross-view registration. Our approach overcomes the inherent drift of visual-inertial odometry systems and the substantial storage requirements of image-retrieval localization strategies, which limit their accuracy and scalability, respectively. Extensive experiments show that BEVRender outperforms existing GNSS-denied visual localization methods, with notable gains in both localization accuracy and update frequency.
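The registration step described above aligns a rendered BEV image with an aerial map by template matching. As a minimal illustrative sketch only, not the paper's implementation, the snippet below uses OpenCV's normalized cross-correlation over grayscale rasters; the file names and the assumption that both images share the same ground resolution and orientation are hypothetical.

```python
import cv2
import numpy as np

# Hypothetical inputs: a locally rendered BEV image (template) and a
# georeferenced aerial map crop, resampled to the same meters-per-pixel.
bev = cv2.imread("bev_render.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
aerial = cv2.imread("aerial_map.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Slide the BEV template over the aerial map; the peak of the normalized
# cross-correlation response gives the registered position.
response = cv2.matchTemplate(aerial, bev, cv2.TM_CCOEFF_NORMED)
_, score, _, top_left = cv2.minMaxLoc(response)

# Shift from the template's top-left corner to its center in map pixels;
# a geotransform (not shown) would map this pixel to world coordinates.
u = top_left[0] + bev.shape[1] // 2
v = top_left[1] + bev.shape[0] // 2
print(f"peak NCC score {score:.3f} at map pixel ({u}, {v})")
```

In practice a learned method such as BEVRender would match feature embeddings rather than raw pixels, but the sliding-window correlation above captures the basic cross-view registration mechanism.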