- The paper introduces the Context-Self Contrastive Loss (CSCL) framework to optimize class boundaries in crop segmentation tasks.
- The authors compile and publicly release the largest known Satellite Image Time Series (SITS) dataset annotated for crop types and parcel identities, advancing remote sensing research.
- Experiments show substantial gains in mIoU and F1 scores and demonstrate segmentation at a resolution higher than that of the input imagery, supporting more precise agricultural mapping.
Overview of Context-Self Contrastive Pre-Training for Crop Type Semantic Segmentation
The paper presents a novel approach to crop type semantic segmentation from satellite imagery, specifically tackling the poor performance that standard models exhibit at parcel boundaries in densely annotated scenes. The authors introduce a fully supervised pre-training scheme based on contrastive learning, termed the Context-Self Contrastive Loss (CSCL), designed to enhance the performance of convolutional neural networks (CNNs) in dense classification tasks.
Main Contributions
- Contrastive Learning Framework: The CSCL framework optimizes class boundaries by leveraging local neighborhood embeddings in the feature space. This is accomplished through a contrastive loss that pulls together the embeddings of neighboring pixels sharing the same class and pushes apart those of different classes, thereby improving model performance at semantic boundaries.
- Dataset Compilation: The authors provide a significant contribution by compiling the largest known Satellite Image Time Series (SITS) dataset annotated for crop types and parcel identities. This dataset, based on Sentinel-2 imagery, is made publicly available along with a data generation pipeline, which is crucial for fostering further research in the domain.
- Enhanced Resolution Segmentation: Utilizing the CSCL framework, the paper demonstrates improved semantic segmentation performance at a resolution exceeding that of the input images, thus achieving more granular crop class discrimination.
Methodology
The CSCL approach uses an encoder to map input images to a dense feature space and a similarity function to measure the similarity between each pixel embedding and those in its local neighborhood. Ground-truth labels are restructured into a class-agreement target that records, for every pixel, which of its neighbors share its class. The pre-training loss leverages these similarities, steering the network weights to better distinguish between crop types, particularly at parcel boundaries where traditional models often underperform.
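As a concrete illustration of this setup, the sketch below implements a local class-agreement contrastive objective in PyTorch. The 3x3 window, cosine similarity, temperature, and binary cross-entropy formulation are assumptions chosen for clarity, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F


def cscl_style_loss(features, labels, window=3, temperature=0.1):
    """features: (B, C, H, W) encoder embeddings; labels: (B, H, W) integer class ids."""
    B, C, H, W = features.shape
    K = window * window
    pad = window // 2

    # L2-normalize the embeddings so dot products act as cosine similarities.
    feats = F.normalize(features, dim=1)

    # Collect each pixel's K neighboring embeddings: (B, C*K, H*W) -> (B, C, K, H, W).
    neigh = F.unfold(feats, kernel_size=window, padding=pad).view(B, C, K, H, W)

    # Similarity between every pixel and each of its K neighbors: (B, K, H, W).
    sims = (feats.unsqueeze(2) * neigh).sum(dim=1) / temperature

    # Class-agreement target: 1 where a neighbor shares the center pixel's class, else 0.
    # (Border neighbors fall on zero padding here; a fuller implementation would mask them.)
    lbl = labels.unsqueeze(1).float()                                   # (B, 1, H, W)
    lbl_neigh = F.unfold(lbl, kernel_size=window, padding=pad).view(B, K, H, W)
    target = (lbl_neigh == lbl).float()

    # Pull same-class neighbors together, push different-class neighbors apart.
    return F.binary_cross_entropy_with_logits(sims, target)


# Example usage with random tensors standing in for a real encoder and labels:
feats = torch.randn(2, 64, 24, 24)          # embeddings from some encoder
labels = torch.randint(0, 10, (2, 24, 24))  # per-pixel crop-type labels
loss = cscl_style_loss(feats, labels)
```

The key point is that supervision comes from local label agreement rather than from per-pixel class prediction, which concentrates the learning signal around semantic boundaries.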
Through a series of ablation studies, the authors demonstrate the efficacy of key components of the CSCL method, such as local affinity matrices and relative positional encodings, in enhancing segmentation accuracy. The robustness of their approach is validated against a variety of parameter settings, further cementing the method's applicability in real-world scenarios.
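As an illustration of how relative positional information can enter such a local affinity computation, the module below adds a learned bias per neighbor offset to the (B, K, H, W) similarity tensor from the previous sketch. This is a generic construction shown for intuition, not necessarily the paper's exact encoding.

```python
import torch
import torch.nn as nn


class LocalRelativeBias(nn.Module):
    """Learned bias per relative offset inside the local window (illustrative only)."""

    def __init__(self, window=3):
        super().__init__()
        # One learnable scalar for each of the K = window * window relative positions.
        self.bias = nn.Parameter(torch.zeros(window * window))

    def forward(self, sims):
        # sims: (B, K, H, W) pixel-to-neighbor similarities, e.g. from the sketch above.
        return sims + self.bias.view(1, -1, 1, 1)
```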
Results and Implications
The experimental results indicate substantial improvements over baseline models, with notable increases in mean Intersection over Union (mIoU) and F1 scores across the evaluated datasets and particularly pronounced gains on boundary pixels. These improvements translate to more accurate and reliable crop maps, which are indispensable tools for agricultural monitoring and policy-making.
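For reference, both metrics can be read off a per-class confusion matrix; the NumPy snippet below is a generic illustration of mIoU and macro-F1, not the paper's evaluation code.

```python
import numpy as np


def confusion_matrix(pred, true, num_classes):
    """pred, true: flat integer arrays of predicted and reference class ids."""
    idx = true * num_classes + pred
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def miou_and_macro_f1(pred, true, num_classes):
    cm = confusion_matrix(pred, true, num_classes).astype(float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as class c, but reference says otherwise
    fn = cm.sum(axis=1) - tp   # reference class c, but predicted as something else
    iou = tp / np.maximum(tp + fp + fn, 1e-9)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1e-9)
    return iou.mean(), f1.mean()
```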
The implications of this work are multifaceted. Practically, the methodology enables more precise crop type classification, which is critical for applications such as monitoring agricultural subsidy compliance and implementing agricultural policy. Theoretically, the paper advances the understanding of contrastive pre-training in dense classification tasks, offering insights into methods that could be adapted to other domains requiring fine boundary delineation.
Future Directions
Looking ahead, the paper opens several avenues for future exploration. The integration of high-resolution data from heterogeneous sources, combined with the proposed method, could further enhance segmentation quality. Additionally, extending the CSCL framework to other types of earth observation data or to tasks involving more diverse scene complexities could broaden the scope and impact of the methodology.
In summary, this paper presents a comprehensive approach to improving the semantic segmentation of crop types using contrastive learning, with the potential to significantly benefit remote sensing and agricultural sectors. The release of the dataset and code further amplifies the opportunity for continued innovation in this field.