
U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization (2310.13766v3)

Published 20 Oct 2023 in cs.CV

Abstract: Efficient relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails. Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance and, in turn, can benefit the relocalization of the vehicle. However, one downside of BEV methods is the heavy computation required to leverage the geometric constraints. This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features. We show that this extension boosts the performance of U-BEV by up to 4.11 IoU. Additionally, we combine the encoded neural BEV with a differentiable template matcher to perform relocalization on neural SD-map data. The model is fully end-to-end trainable and outperforms transformer-based BEV methods of similar computational complexity by 1.7 to 2.8 mIoU and BEV-based relocalization by over 26% Recall Accuracy on the nuScenes dataset.
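
The segmentation results in the abstract are reported as IoU and mIoU. As a reminder of what those numbers measure, here is a minimal sketch of intersection-over-union and its per-class mean, written in plain Python. The function names `iou` and `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; the paper's actual evaluation protocol on nuScenes is not described in this abstract and may differ.

```python
from typing import List, Sequence


def iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """IoU for one class, given flat binary masks (1 = cell belongs to the class)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Undefined when the class appears in neither mask; signal that with NaN.
    return intersection / union if union else float("nan")


def mean_iou(pred_labels: Sequence[int], target_labels: Sequence[int],
             num_classes: int) -> float:
    """Mean of per-class IoU over flat label maps (illustrative convention:
    classes absent from both maps are skipped rather than counted as 0 or 1)."""
    scores: List[float] = []
    for c in range(num_classes):
        pred_mask = [int(x == c) for x in pred_labels]
        target_mask = [int(x == c) for x in target_labels]
        score = iou(pred_mask, target_mask)
        if score == score:  # False only for NaN
            scores.append(score)
    return sum(scores) / len(scores) if scores else float("nan")


# Hypothetical 6-cell BEV grid with 3 classes, flattened to a list.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
result = mean_iou(pred, target, num_classes=3)
```

In this toy grid the per-class IoUs are 1/3, 2/3, and 1/2, so `result` is 0.5. Benchmarks usually quote such values multiplied by 100, which is the scale on which the gains of 4.11 IoU and 1.7 to 2.8 mIoU above are stated.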
