Terrain-Informed Self-Supervised Learning: Enhancing Building Footprint Extraction from LiDAR Data with Limited Annotations (2311.01188v2)
Abstract: Estimating building footprint maps from geospatial data is of paramount importance in urban planning, development, disaster management, and many other applications. Deep learning methods have gained prominence for building segmentation, offering the promise of precise footprint extraction without extensive post-processing. However, these methods face challenges in generalization and label efficiency, particularly in remote sensing, where obtaining accurate labels is both expensive and time-consuming. To address these challenges, we propose terrain-aware self-supervised learning tailored to remote sensing, using digital elevation models derived from LiDAR data. By learning to differentiate between bare Earth and superimposed structures, the network implicitly acquires domain-relevant features without the need for extensive pixel-level annotations. We evaluate the effectiveness of our approach on building segmentation test datasets with varying label fractions. Remarkably, with only 1% of the labels (25 labeled examples), our method improves over ImageNet pre-training, showing the advantage of leveraging unlabeled data for feature extraction in remote sensing. The improvement is most pronounced in few-shot scenarios, and the gap with ImageNet pre-training gradually closes as the label fraction increases. We further test on a dataset with substantial distribution shifts and labeling errors to demonstrate the generalizability of our approach. Compared to other baselines, including ImageNet pre-training and more complex architectures, our approach consistently performs better, demonstrating the efficiency and effectiveness of self-supervised terrain-aware feature learning.
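The abstract's core idea is separating bare Earth from superimposed structures using LiDAR-derived elevation models. A minimal sketch of one way such a terrain-aware supervisory signal could be derived, assuming co-registered DSM (surface) and DTM (bare-Earth terrain) rasters are available; the function name, the 2 m height threshold, and the thresholding itself are illustrative assumptions, not the paper's exact pretext task:

```python
import numpy as np

def structure_mask(dsm: np.ndarray, dtm: np.ndarray,
                   height_thresh: float = 2.0) -> np.ndarray:
    """Binary mask of above-ground structures from a normalized DSM.

    The normalized DSM (nDSM = DSM - DTM) gives height above bare Earth;
    pixels rising more than `height_thresh` metres are marked as structure.
    """
    ndsm = dsm - dtm
    return (ndsm > height_thresh).astype(np.uint8)

# Toy example: a 3x3 tile of flat terrain at 10 m with one 5 m structure.
dsm = np.array([[10.0, 10.0, 10.0],
                [10.0, 15.0, 10.0],
                [10.0, 10.0, 10.0]])
dtm = np.full((3, 3), 10.0)
mask = structure_mask(dsm, dtm)  # only the centre pixel is flagged
```

A mask like this could serve as a free, label-less target for pre-training a segmentation network, which is the kind of terrain-derived supervision the abstract describes.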
Authors: Anuja Vats, David Völgyes, Martijn Vermeer, Marius Pedersen, Kiran Raja, Daniele S. M. Fantin, Jacob Alexander Hay