Prompt-Driven Building Footprint Extraction in Aerial Images with Offset-Building Model (2310.16717v3)
Abstract: Accurately extracting building footprints that are invisible in very-high-resolution (VHR) aerial images relies on segmenting roofs and estimating the roof-to-footprint offset. Existing state-of-the-art methods based on instance segmentation generalize poorly when scaled to large-scale data production and do not support low-cost interactive human annotation. Inspired by recent prompt paradigms, we design a promptable framework for roof and offset extraction that transforms end-to-end algorithms into promptable methods. Within this framework, we propose a novel Offset-Building Model (OBM). To rigorously evaluate the algorithm's capabilities, we introduce a prompt-based evaluation protocol, under which our model reduces offset errors by 16.6% and improves roof Intersection over Union (IoU) by 10.8% compared to other models. Leveraging common patterns in offset prediction, we propose Distance-NMS (DNMS) algorithms, which reduce offset vector loss by a further 6.5%. To validate generalization, we additionally test all models on a new dataset with over 7,000 manually annotated instance samples. Our algorithms and dataset are available at https://anonymous.4open.science/r/OBM-B3EC.
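The core idea in the abstract, recovering an invisible footprint by translating the visible roof mask by a predicted roof-to-footprint offset, can be sketched as follows. This is an illustrative reconstruction, not the paper's actual OBM implementation; the function names, the `(dy, dx)` offset convention, and the binary-mask representation are all assumptions for the example.

```python
import numpy as np

def footprint_from_roof(roof_mask: np.ndarray, offset: tuple) -> np.ndarray:
    """Translate a binary roof mask by a roof-to-footprint offset (dy, dx).

    This mimics the paper's premise: the footprint of an off-nadir building
    is (approximately) the roof region shifted by a single 2D offset vector.
    """
    dy, dx = offset
    h, w = roof_mask.shape
    footprint = np.zeros_like(roof_mask)
    ys, xs = np.nonzero(roof_mask)          # pixels belonging to the roof
    ys2, xs2 = ys + dy, xs + dx             # shift them by the offset
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    footprint[ys2[keep], xs2[keep]] = 1     # pixels shifted out of frame are dropped
    return footprint

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over Union of two binary masks (the roof metric reported)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / union if union else 0.0
```

For example, a roof mask occupying the top-left 2x2 block of a 5x5 image, shifted by offset `(2, 1)`, yields a footprint occupying rows 2-3 and columns 1-2, with zero IoU against the original roof since the regions no longer overlap.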