- The paper demonstrates that self-supervised pre-training with Contrastive Multiview Coding significantly improves remote sensing scene representation over traditional supervised methods.
- It reveals that leveraging domain-specific multispectral imaging yields performance gains of 2-4 percentage points even with reduced dataset sizes.
- The study highlights that reduced labeling requirements through self-supervised learning offer a practical alternative for high-resolution remote sensing tasks in resource-constrained settings.
Self-Supervised Learning for Remote Sensing Scene Representations
In the field of image classification, particularly within remote sensing applications, conventional supervised learning methods are limited by their need for large labeled datasets. This paper by Stojnić and Risojević explores the viability of self-supervised learning techniques to address this challenge and efficiently learn representations from remote sensing images, a domain where data labeling is notably labor-intensive.
The core proposition of the paper is that self-supervised pre-training, specifically with Contrastive Multiview Coding (CMC), outperforms traditional supervised pre-training on natural scene images. The authors systematically compare self-supervised methods across domains that differ in resolution and in spectral characteristics beyond the visible RGB spectrum. Their experiments show that self-supervised learning, when applied directly to remote sensing images, yields superior representations that improve classification accuracy on downstream tasks, even when trained on significantly fewer images.
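To make the pre-training objective concrete, the sketch below shows the symmetric two-view contrastive (InfoNCE-style) loss that CMC builds on: embeddings of two views of the same image are pulled together while all other images in the batch act as negatives. This is a minimal numpy illustration, not the authors' implementation; the function name, temperature value, and embedding shapes are assumptions for the example.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Symmetric contrastive loss between two view embeddings.

    z1, z2: arrays of shape (batch, dim). Row i of z1 and row i of z2
    come from the same image (the positive pair); every other row in
    the batch serves as a negative.
    """
    # L2-normalize so the dot product is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature      # (batch, batch) similarity matrix
    idx = np.arange(len(z1))              # positives lie on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)          # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()                    # diagonal = positives

    # average over both directions: view1 -> view2 and view2 -> view1
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

In CMC the two views are complementary channel groups of the same image (for remote sensing, subsets of spectral bands), so minimizing this loss encourages the encoders to extract information shared across the spectrum.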
For high-resolution remote sensing tasks, experiments on datasets such as NWPU-DOTA show that domain-specific pre-training surpasses supervised pre-training on ImageNet, with the improvement persisting even at reduced dataset sizes. Low-resolution tasks, too, benefit more from models pre-trained on high-resolution remote sensing data than on natural scenes, challenging the previous assumption that the resolution of the pre-training dataset should match that of the downstream task for optimal results.
The paper further explores using multispectral images for self-supervised learning. Results underscore notable performance gains, ranging from two to four percentage points, when leveraging multispectral input as opposed to standard RGB images. The incorporation of Principal Component Analysis (PCA) based views for multispectral data is evaluated, showing potential benefits in representation learning, although the generalization across various tasks requires further investigation.
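The PCA-based view construction can be sketched as follows: treat each pixel's band vector as a sample, project onto the principal components of the band covariance, and partition the components into groups that serve as the contrastive views. This is a minimal illustration under assumptions; the paper does not prescribe this exact partitioning, and the function name and round-robin grouping are choices made for the example.

```python
import numpy as np

def pca_views(image, n_views=2):
    """Split a multispectral image into decorrelated views via PCA.

    image: (H, W, bands) array. Pixels are projected onto the principal
    components of the band covariance; the components are then split
    into `n_views` groups carrying complementary spectral information.
    """
    h, w, b = image.shape
    x = image.reshape(-1, b).astype(float)
    x -= x.mean(axis=0)                        # center each band
    cov = x.T @ x / (len(x) - 1)               # band covariance matrix
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]             # components by descending variance
    comps = (x @ vecs[:, order]).reshape(h, w, b)
    # round-robin split so every view receives some high-variance components
    return [comps[..., i::n_views] for i in range(n_views)]
```

For a 13-band Sentinel-2-style input, `pca_views(img, 2)` would yield two decorrelated 7- and 6-component views to feed the two CMC encoders.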
Finetuning experiments reveal that while self-supervised models slightly underperform their supervised counterparts, the reduced dependence on labeled data and the avoidance of extensive labeling effort are substantial benefits. Notably, self-supervised pre-training achieves strong results even without extensive finetuning, making it a practical alternative in resource-constrained environments.
The investigation raises open questions for future work, including extending the advantages of self-supervised learning to other remote sensing tasks such as segmentation and object detection. The authors also call for more advanced view-creation methods for multispectral images and for scaling self-supervised learning to larger unlabelled datasets to fully capitalize on the technique's potential.
The implications of this research are wide-reaching, potentially revolutionizing image classification approaches in domains constrained by data labeling limitations, such as remote sensing and medical imaging. By reducing dependency on labeled datasets, self-supervised learning paves the way for leveraging vast amounts of otherwise unutilized data.