High-Resolution Building and Road Detection from Sentinel-2 (2310.11622v3)

Published 17 Oct 2023 in cs.CV

Abstract: Mapping buildings and roads automatically with remote sensing typically requires high-resolution imagery, which is expensive to obtain and often sparsely available. In this work we demonstrate how multiple 10 m resolution Sentinel-2 images can be used to generate 50 cm resolution building and road segmentation masks. This is done by training a 'student' model with access to Sentinel-2 images to reproduce the predictions of a 'teacher' model which has access to corresponding high-resolution imagery. While the predictions do not have all the fine detail of the teacher model, we find that we are able to retain much of the performance: for building segmentation we achieve 79.0% mIoU, compared to the high-resolution teacher model accuracy of 85.5% mIoU. We also describe two related methods that work on Sentinel-2 imagery: one for counting individual buildings which achieves $R^2 = 0.91$ against true counts and one for predicting building height with 1.5 meter mean absolute error. This work opens up new possibilities for using freely available Sentinel-2 imagery for a range of tasks that previously could only be done with high-resolution satellite imagery.


Summary

The paper introduces a teacher-student framework that trains a student model using high-resolution pseudo-labels, achieving a building segmentation mIoU of 79.0%.
The paper leverages temporal stacking and precise registration to overcome alignment issues between low-resolution inputs and high-resolution labels.
The paper demonstrates that using freely available Sentinel-2 data significantly reduces costs while broadening geographic and temporal mapping coverage.

High-Resolution Building and Road Detection from Sentinel-2 Imagery

The paper "High-resolution building and road detection from Sentinel-2" presents a method for generating high-resolution maps of buildings and roads from the relatively low-resolution (10-meter) imagery provided by the Sentinel-2 satellite. The authors employ a teacher-student learning approach, in which a student model is trained to replicate the predictions of a high-resolution (50 cm) teacher model using only Sentinel-2 imagery as input. Because Sentinel-2 data is freely available, this approach reduces cost and allows broader geographic and temporal coverage than relying on high-resolution imagery alone.

Methodology

The core innovation is the teacher-student framework. A pretrained teacher model generates building and road segmentation masks from high-resolution imagery, and these masks serve as pseudo-labels for training the student model. The student, which sees only multiple Sentinel-2 images of the same location, learns to predict those masks. Notably, the end-to-end network does not include a separate super-resolution step; it directly outputs high-resolution predictions from low-resolution inputs using a convolutional encoder-decoder architecture.
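The training signal described above can be sketched as a per-pixel cross-entropy between the student's predicted probabilities and the teacher's output used as a soft pseudo-label. This is a minimal sketch under assumptions: the function name `distillation_bce`, the flat-list mask representation, the sample values, and the choice of binary cross-entropy are illustrative and are not confirmed as the paper's exact loss.

```python
import math

def distillation_bce(student_probs, teacher_probs, eps=1e-7):
    """Mean per-pixel binary cross-entropy of student predictions
    against teacher pseudo-labels (both given as flat lists in [0, 1])."""
    if len(student_probs) != len(teacher_probs):
        raise ValueError("student and teacher masks must have the same number of pixels")
    if not student_probs:
        raise ValueError("masks must contain at least one pixel")
    total = 0.0
    for p, t in zip(student_probs, teacher_probs):
        # Clamp the student probability so log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(student_probs)

# Hypothetical 4-pixel example: teacher confidences act as soft labels.
teacher = [0.9, 0.1, 0.8, 0.0]
good_student = [0.85, 0.15, 0.75, 0.05]
poor_student = [0.2, 0.9, 0.3, 0.7]

loss_good = distillation_bce(good_student, teacher)
loss_poor = distillation_bce(poor_student, teacher)
```

A student whose output is close to the teacher's mask gets a lower loss than one that disagrees with it, which is the property the training procedure relies on.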

Considerable attention is given to registration between the Sentinel-2 inputs and the high-resolution labels, since small positional offsets between low-resolution frames would otherwise degrade the supervision signal. The network also takes a temporal stack of images as input, exploiting the small shifts between acquisitions in the Sentinel-2 time series to recover detail finer than a single 10 m frame provides.
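To make the scale of the task concrete, the sketch below works out the input and output shapes for one training example. Going from 10 m to 50 cm pixels is a 20x increase per side, so each Sentinel-2 pixel corresponds to 400 output pixels. The function name `student_io_shapes`, the (frames, height, width, bands) layout, and the example patch size, frame count, and band count are assumptions chosen for illustration, not values taken from the paper.

```python
def student_io_shapes(num_frames, height, width, bands,
                      in_res_m=10.0, out_res_m=0.5):
    """Return (input_shape, output_shape, scale) for a temporal stack
    of low-resolution frames and the high-resolution mask it maps to."""
    ratio = in_res_m / out_res_m
    scale = round(ratio)
    if scale < 1 or abs(ratio - scale) > 1e-9:
        raise ValueError("input resolution must be an integer multiple of output resolution")
    input_shape = (num_frames, height, width, bands)
    output_shape = (height * scale, width * scale)
    return input_shape, output_shape, scale

# Hypothetical example: 8 frames of a 64x64 patch with 4 spectral bands.
in_shape, out_shape, scale = student_io_shapes(num_frames=8, height=64, width=64, bands=4)
pixels_per_input_pixel = scale * scale
```

With these assumed values, a 64x64 input patch maps to a 1280x1280 output mask.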

Numerical Results

Empirical evaluation shows strong performance: the student reaches a mean Intersection over Union (mIoU) of 79.0% for building segmentation, compared with 85.5% for the high-resolution teacher model. The authors also describe a method for counting individual buildings within a Sentinel-2 patch using centroid prediction, achieving a coefficient of determination of $R^2 = 0.91$ against true counts, and a method for predicting building height with a mean absolute error of 1.5 meters. These results indicate that much of the information needed for building-level mapping can be recovered from 10 m imagery.
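For readers unfamiliar with the two metrics quoted above, the following sketch shows one common way to compute them. It assumes mIoU is the average of the foreground and background IoU on a binary mask, and that $R^2$ is the coefficient of determination $1 - SS_{res}/SS_{tot}$; the paper may define either metric differently (for example, averaging over images, or using squared correlation). The function names and the toy data are illustrative only.

```python
def iou(pred, target):
    """Intersection over union of two flat binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0

def mean_iou(pred, target):
    """Average of foreground IoU and background IoU."""
    fg = iou(pred, target)
    bg = iou([1 - p for p in pred], [1 - t for t in target])
    return (fg + bg) / 2.0

def r_squared(true_vals, pred_vals):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(true_vals) / len(true_vals)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals))
    ss_tot = sum((t - mean_true) ** 2 for t in true_vals)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot

# Toy 4-pixel mask: one false-positive building pixel.
miou = mean_iou([1, 1, 0, 0], [1, 0, 0, 0])

# Toy building counts for three tiles.
r2 = r_squared([10, 20, 30], [12, 18, 33])
```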

Practical and Theoretical Implications

Practically, this approach democratizes access to high-resolution mapping datasets by leveraging freely accessible Sentinel-2 imagery. The implications for urban planning, disaster management, and infrastructure development are substantial as these resources can now be tapped into with reduced costs and larger spatial-temporal scope.

Theoretically, the work contributes to the broader discussion of learned resolution enhancement in remote sensing. It shows that CNN-based models can combine information across sequential images, supporting temporal stacking as an effective strategy for capturing static features such as buildings and roads from a time series of observations.

Future Directions

The paper identifies several areas for future work, including correcting the misalignment of tall buildings in urban areas caused by orthorectification errors and improving the robustness of input stacks to cloud cover. It also suggests exploring generative models such as GANs or diffusion models for super-resolution in remote sensing, given their early success in single-image settings.

Conclusion

The work presents a compelling case for synthesizing high-quality spatial data from affordable imagery sources, extending the toolkit available for both researchers and practitioners in the field. By bridging the gap in resolution through intelligent machine learning frameworks, the methodology holds promise for enhancing the granularity and timeliness of geographic data analyses without the dependency on costly high-resolution imagery.
