
SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation (2404.02638v1)

Published 3 Apr 2024 in cs.CV

Abstract: This paper aims to achieve fine-grained building attribute segmentation in a cross-view scenario, i.e., using pairs of satellite and street-view images. The main challenge lies in overcoming the large perspective differences between street views and satellite views. In this work, we introduce SG-BEV, a novel approach for satellite-guided BEV fusion for cross-view semantic segmentation. To overcome the limitations of existing cross-view projection methods in capturing complete building facade features, we incorporate a Bird's Eye View (BEV) method to establish a spatially explicit mapping of street-view features. Moreover, we fully leverage the advantages of multiple perspectives by introducing a novel satellite-guided reprojection module, which mitigates the uneven feature distribution associated with traditional BEV methods. Our method demonstrates significant improvements on four cross-view datasets collected from multiple cities, including New York, San Francisco, and Boston. Averaged across these datasets, our method improves mIoU by 10.13% and 5.21% over state-of-the-art satellite-based and cross-view methods, respectively. The code and datasets of this work will be released at https://github.com/yejy53/SG-BEV.
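The headline results above are reported as mIoU (mean intersection over union). For each class c, IoU = TP / (TP + FP + FN), and mIoU averages IoU over classes. The following is a minimal sketch of the standard computation from a confusion matrix. It is not the authors' evaluation code. The ignore label 255 and the choice to skip classes absent from both prediction and ground truth are assumptions, because evaluation toolkits differ on both conventions.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Accumulate a num_classes x num_classes confusion matrix.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose ground truth equals ignore_index (assumed 255) are skipped.
    Predictions are assumed to lie in [0, num_classes).
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: excluded from the metric
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Classes that never occur in either prediction or ground truth
    (denominator 0) are left out of the average (an assumed convention).
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with predictions `[0, 0, 1, 1, 2, 2]` and ground truth `[0, 1, 1, 1, 2, 0]` over three classes, the per-class IoUs are 1/3, 2/3 and 1/2, so `mean_iou(confusion_matrix(...))` returns 0.5. Accumulating a confusion matrix across all images before taking the ratio, rather than averaging per-image IoUs, is the common dataset-level convention.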

