Continental-Scale Building Detection from High Resolution Satellite Imagery (2107.12283v2)

Published 26 Jul 2021 in cs.CV

Abstract: Identifying the locations and footprints of buildings is vital for many practical and scientific purposes. Such information can be particularly useful in developing regions where alternative data sources may be scarce. In this work, we describe a model training pipeline for detecting buildings across the entire continent of Africa, using 50 cm satellite imagery. Starting with the U-Net model, widely used in satellite image analysis, we study variations in architecture, loss functions, regularization, pre-training, self-training and post-processing that increase instance segmentation performance. Experiments were carried out using a dataset of 100k satellite images across Africa containing 1.75M manually labelled building instances, and further datasets for pre-training and self-training. We report novel methods for improving performance of building detection with this type of model, including the use of mixup (mAP +0.12) and self-training with soft KL loss (mAP +0.06). The resulting pipeline obtains good results even on a wide variety of challenging rural and urban contexts, and was used to create the Open Buildings dataset of 516M Africa-wide detected footprints.

Citations (145)


Summary

The paper’s main contribution is a U-Net based training pipeline for building detection, used to produce 516M building footprints across Africa.
It reports that mixup regularization (mAP +0.12) and self-training with a soft KL loss (mAP +0.06) improve instance segmentation performance.
The models are trained on 100k labeled 50 cm satellite images containing 1.75M building instances, and the resulting data supports uses such as urban planning and crisis response in data-scarce regions.

Continental-Scale Building Detection from High-Resolution Satellite Imagery

The paper presents a model training pipeline for detecting building footprints across Africa from high-resolution 50 cm satellite imagery. Starting from U-Net, a model widely used for semantic segmentation, the authors study variations in architecture, loss functions, and regularization that improve instance segmentation performance. The pipeline has to cope with Africa's diverse terrains and building types, and it was used to create the Open Buildings dataset of 516 million detected footprints across the continent.

The paper explains why such datasets and detection models matter for applications ranging from urban planning to environmental science, especially in regions where alternative data sources are scarce. Building on the popular U-Net architecture, the authors compare several configurations, covering loss functions, regularization with mixup (mAP +0.12), and self-training with a soft KL loss (mAP +0.06). According to the summary, these techniques improve on previous approaches, which often covered a limited geographic area or used lower-resolution imagery.
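As a rough illustration of the two techniques named above, the following NumPy sketch blends two (image, mask) pairs in the mixup style and computes a per-pixel KL divergence between a teacher's soft predictions and a student's predictions. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the Beta parameter `alpha=0.2`, and the Bernoulli form of the KL term are all illustrative choices that the text above does not specify.

```python
import numpy as np


def sample_lambda(rng, alpha=0.2):
    """Draw a mixing weight in [0, 1] from Beta(alpha, alpha).

    alpha=0.2 is an assumed value, not one taken from the paper.
    """
    return float(rng.beta(alpha, alpha))


def mixup_pair(img_a, mask_a, img_b, mask_b, lam):
    """Blend two (image, label-mask) pairs with the same weight lam."""
    img = lam * img_a + (1.0 - lam) * img_b
    mask = lam * mask_a + (1.0 - lam) * mask_b  # labels become soft values
    return img, mask


def soft_kl_loss(teacher_prob, student_prob, eps=1e-7):
    """Mean per-pixel KL(teacher || student) for building probabilities.

    Each pixel is treated as a Bernoulli variable (building / not building);
    this form is an assumption made for illustration.
    """
    t = np.clip(teacher_prob, eps, 1.0 - eps)
    s = np.clip(student_prob, eps, 1.0 - eps)
    kl = t * np.log(t / s) + (1.0 - t) * np.log((1.0 - t) / (1.0 - s))
    return float(kl.mean())
```

In a self-training loop of this kind, `teacher_prob` would come from a previously trained model run on unlabeled imagery, and the student would be trained to match those soft outputs rather than thresholded labels.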

For training and evaluation, the paper uses large manually labeled datasets: 99,902 training images containing 1.67 million building instances, plus 1,920 test images. Because African settlements are heterogeneous, the researchers sample areas ranging from rural to urban so that the data covers difficult cases such as occluded or closely packed buildings.

The authors describe a Gaussian convolution approach to pixel weighting during segmentation training, which replaces explicit distance-based weighting; according to the summary, it is more efficient to compute and improves performance by concentrating the loss on instance boundaries. Self-training is also a key component: it improves precision at higher recall levels across varied contexts.
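The weighting idea can be sketched as follows: extract building edge pixels from a label mask, blur that edge map with a Gaussian, and use the result as a per-pixel loss weight. This is a hedged NumPy sketch of the general technique only. The names `edge_map` and `gaussian_edge_weights`, the 4-neighbourhood edge definition, and the values `sigma=2.0`, `scale=10.0`, and `base=1.0` are assumptions for illustration, not parameters from the paper.

```python
import numpy as np


def gaussian_kernel_1d(sigma, radius):
    """Normalized 1D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def edge_map(mask):
    """Building pixels with at least one non-building 4-neighbour."""
    m = mask.astype(bool)
    p = np.pad(m, 1, mode="constant", constant_values=False)
    # A pixel is interior when all four neighbours are building pixels.
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return m & ~interior


def gaussian_edge_weights(mask, sigma=2.0, scale=10.0, base=1.0):
    """Per-pixel loss weights: base + scale * (Gaussian-blurred edge map)."""
    edges = edge_map(mask).astype(np.float64)
    radius = int(3 * sigma)
    k = gaussian_kernel_1d(sigma, radius)
    # Zero-pad, then run a separable "valid" convolution along each axis
    # so the output has the same shape as the input mask.
    padded = np.pad(edges, radius, mode="constant")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
    return base + scale * blurred
```

The resulting weight map would multiply a per-pixel loss, so pixels near building boundaries contribute more than pixels far from any building.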

The authors also analyze pre-training on domain-specific datasets, asking whether such models can outperform those initialized from ImageNet weights. Pre-training on tasks such as predicting nighttime luminance yielded only marginal gains, and starting from ImageNet weights remained the most effective option.

In practical terms, the resulting footprint data can support applications such as infrastructure management and crisis response. The work also serves as an example of how continental-scale coverage can be achieved with deep learning segmentation methods.

Future work could integrate multimodal data, using additional satellite sources to improve detail. Exploring architectures beyond semantic segmentation models, such as those that predict instances directly, could yield further gains.

Overall, the research shows how careful methodology and data handling can apply machine learning to large-scale geospatial analysis, with direct relevance to global development challenges.
