BEVRender: Vision-based Cross-view Vehicle Registration in Off-road GNSS-denied Environment (2405.09001v2)

Published 14 May 2024 in cs.RO

Abstract: We introduce BEVRender, a novel learning-based approach for the localization of ground vehicles in Global Navigation Satellite System (GNSS)-denied off-road scenarios. These environments are typically challenging for conventional vision-based state estimation due to the lack of distinct visual landmarks and the instability of vehicle poses. To address this, BEVRender generates high-quality bird's-eye-view (BEV) images of the local terrain, which are then aligned with a geo-referenced aerial map through template matching to achieve accurate cross-view registration. Our approach overcomes the inherent limitations of visual-inertial odometry systems and the substantial storage requirements of image-retrieval localization strategies, which are susceptible to drift and scalability issues, respectively. Extensive experiments validate BEVRender's advantage over existing GNSS-denied visual localization methods, demonstrating notable improvements in both localization accuracy and update frequency.
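The cross-view registration step described in the abstract can be pictured as sliding the rendered local BEV patch over the geo-referenced aerial map and keeping the best-matching offset. The sketch below illustrates that idea with OpenCV's normalized cross-correlation; the file names, map resolution, and choice of matcher are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch (not the authors' pipeline): register a rendered BEV patch
# against a geo-referenced aerial map via normalized cross-correlation
# template matching. File names and the map resolution below are assumed.
import cv2
import numpy as np

METERS_PER_PIXEL = 0.5  # assumed ground resolution of the aerial map

def register_bev(bev_patch: np.ndarray, aerial_map: np.ndarray):
    """Slide the local BEV patch over the aerial map and return the
    best-matching pixel offset together with its correlation score."""
    scores = cv2.matchTemplate(aerial_map, bev_patch, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_xy = cv2.minMaxLoc(scores)
    return best_xy, best_score

if __name__ == "__main__":
    aerial_map = cv2.imread("aerial_map.png", cv2.IMREAD_GRAYSCALE)
    bev_patch = cv2.imread("rendered_bev.png", cv2.IMREAD_GRAYSCALE)
    (px, py), score = register_bev(bev_patch, aerial_map)
    # Convert the patch's top-left pixel offset to metres in the map frame,
    # using the assumed resolution above.
    print(f"offset: ({px * METERS_PER_PIXEL:.1f} m, {py * METERS_PER_PIXEL:.1f} m), "
          f"NCC score: {score:.3f}")
```

In practice the learned BEV rendering is what makes such a simple matcher viable in off-road scenes; the matching itself only has to search over translation (and possibly heading) within a coarse prior region.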
