Embedding Earth: Self-supervised contrastive pre-training for dense land cover classification (2203.06041v1)

Published 11 Mar 2022 in cs.CV and cs.LG

Abstract: In training machine learning models for land cover semantic segmentation there is a stark contrast between the availability of satellite imagery to be used as inputs and ground truth data to enable supervised learning. While thousands of new satellite images become freely available on a daily basis, getting ground truth data is still very challenging, time consuming and costly. In this paper we present Embedding Earth, a self-supervised contrastive pre-training method for leveraging the large availability of satellite imagery to improve performance on downstream dense land cover classification tasks. Performing an extensive experimental evaluation spanning four countries and two continents, we use models pre-trained with our proposed method as initialization points for supervised land cover semantic segmentation and observe significant improvements of up to 25% absolute mIoU. In every case tested we outperform random initialization, especially so when ground truth data are scarce. Through a series of ablation studies we explore the qualities of the proposed approach and find that learnt features can generalize between disparate regions, opening up the possibility of using the proposed pre-training scheme as a replacement for random initialization in Earth observation tasks. Code will be uploaded soon at https://github.com/michaeltrs/DeepSatModels.



Summary

The paper presents a self-supervised contrastive pre-training method that reduces reliance on scarce ground truth data for land cover segmentation.
It combines sample-level and batch-level augmentations with separate query and key encoders, improving mean Intersection over Union (mIoU) by up to 25% absolute over random initialization across the regions tested.
The approach effectively generalizes to diverse geographic contexts, enabling scalable, data-efficient remote sensing applications in urban planning and climate monitoring.

Overview of "Embedding Earth: Self-supervised contrastive pre-training for dense land cover classification"

The paper addresses an imbalance in land cover semantic segmentation from satellite imagery: input data is abundant, while the ground truth annotations needed for supervised learning are scarce. Satellite archives, particularly those of the Sentinel missions, have reached petabyte scale, which motivates approaches that exploit this imagery without relying heavily on manually labeled datasets. The paper introduces Embedding Earth, a self-supervised contrastive pre-training method that uses this unlabeled imagery to improve performance on downstream dense land cover classification tasks.

Methodology

This research proposes a novel self-supervised learning framework that constructs dense instance discriminative representations for land cover segmentation. Unlike traditional supervised learning, which relies on extensive ground truth data, this method employs a self-supervised paradigm focusing on spatio-temporal relationships across extensive satellite imagery datasets. The approach involves:

Data Augmentation and View Extraction: Augmentations are split into sample-level and batch-level operations, which produce distinct views of the same input for contrastive learning.
Contrastive Encoders: Two encoder models, one for queries and one for keys, produce dense representations that are then passed through a projection head.
Correspondence Module: Pixel pairs are sampled so that their correspondence is preserved under the augmentations, which yields positive and negative pairs for training without explicit supervision.
Iterative Contrastive Loss: A contrastive loss pulls the representations of positive pixel pairs together and pushes negative pairs apart, with the aim that pixels of the same land cover end up close in representation space while dissimilar pixels are separated.
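
The interplay between the correspondence step and the contrastive loss can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single horizontal flip as the only augmentation, a standard InfoNCE loss with a temperature of 0.1, and hand-written 2-D "pixel embeddings" in place of encoder outputs. The names `flip_correspondence` and `info_nce` are hypothetical, and plain Python is used so the example has no dependencies.

```python
import math

def flip_correspondence(width, col):
    """Column that a pixel at `col` lands on after a horizontal flip (assumed augmentation)."""
    return width - 1 - col

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss. queries[i] and keys[i] are the positive pair;
    every keys[j] with j != i acts as a negative for queries[i]."""
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)

# Toy data: one image row of four pixels with made-up 2-D embeddings.
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
# Second view: the same row after a horizontal flip.
view_b = list(reversed(view_a))

# Undo the flip via the correspondence map so that index i refers to the same
# ground location in both views.
width = len(view_a)
aligned_b = [view_b[flip_correspondence(width, c)] for c in range(width)]

loss_aligned = info_nce(view_a, aligned_b)
loss_misaligned = info_nce(view_a, view_b)
print(f"aligned: {loss_aligned:.5f}  misaligned: {loss_misaligned:.5f}")
```

With the correspondence applied, each query is paired with the embedding of the same location and the loss is close to zero; without it, the "positive" pairs are unrelated pixels and the loss is large. This is why the views must be re-aligned before the loss is computed.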

Results

The method was evaluated on Sentinel-2 data from four countries: Germany, France, Ghana, and South Sudan. Models initialized from the pre-trained weights improved mean Intersection over Union (mIoU) by up to 25% absolute compared with random initialization. The gains were largest when ground truth annotations were scarce, which indicates that the method is most useful in label-poor settings. Ablation studies further showed that the learned features generalize across geographic regions, so the pre-training scheme is not tied to a particular dataset.
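
Because the headline figure is an absolute mIoU difference, it helps to see how mIoU is computed and how an absolute gain differs from a relative one. The sketch below uses made-up toy labels, not results from the paper; `mean_iou` is a hypothetical helper, and it assumes classes absent from both the labels and the predictions are skipped when averaging.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean Intersection over Union over the classes that occur in labels or predictions."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:  # skip classes absent from both (an assumption of this sketch)
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy per-pixel labels for three land cover classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
pred_random_init = [0, 1, 1, 1, 2, 0]   # hypothetical model trained from random init
pred_pretrained = [0, 0, 1, 1, 2, 0]    # hypothetical model trained from pre-trained init

miou_random = mean_iou(y_true, pred_random_init, num_classes=3)
miou_pretrained = mean_iou(y_true, pred_pretrained, num_classes=3)

absolute_gain = miou_pretrained - miou_random   # difference in mIoU points
relative_gain = absolute_gain / miou_random     # gain as a fraction of the baseline
print(f"mIoU random={miou_random:.3f} pretrained={miou_pretrained:.3f}")
print(f"absolute gain={absolute_gain:.3f} relative gain={relative_gain:.3f}")
```

In this toy case the absolute gain is about 0.22 mIoU while the relative gain is about 44%, so the two ways of reporting the same improvement differ by a factor of two.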

Implications and Future Directions

This research has practical and theoretical implications for Earth observation tasks. By reducing the dependence on exhaustive ground truth data, which is costly and time-consuming to obtain, the method could make model development less resource-intensive. Its demonstrated efficacy across diverse geographic and climatic conditions suggests it could serve as a standard pre-training protocol for remote sensing applications.

Future studies can expand on this foundation by exploring further applications to other Earth observation tasks, such as object detection and change detection, or integrating multi-modal data sources to enrich the learned representations. Moreover, fine-tuning this methodology with domain-specific augmentations or integrating finer temporal dynamics could provide additional performance gains. The implementation of such techniques in large-scale operational frameworks can facilitate automated and semi-automated systems critical in climate monitoring, urban planning, and agricultural analysis.



GitHub

GitHub - michaeltrs/DeepSatModels: Deep learning models for remote sensing applications (160 stars)