Geography-Aware Self-Supervised Learning (2011.09980v7)

Published 19 Nov 2020 in cs.CV

Abstract: Contrastive learning methods have significantly narrowed the gap between supervised and unsupervised learning on computer vision tasks. In this paper, we explore their application to geo-located datasets, e.g. remote sensing, where unlabeled data is often abundant but labeled data is scarce. We first show that due to their different characteristics, a non-trivial gap persists between contrastive and supervised learning on standard benchmarks. To close the gap, we propose novel training methods that exploit the spatio-temporal structure of remote sensing data. We leverage spatially aligned images over time to construct temporal positive pairs in contrastive learning and geo-location to design pre-text tasks. Our experiments show that our proposed method closes the gap between contrastive and supervised learning on image classification, object detection and semantic segmentation for remote sensing. Moreover, we demonstrate that the proposed method can also be applied to geo-tagged ImageNet images, improving downstream performance on various tasks. The project webpage is available at geography-aware-ssl.github.io.

Authors (7)

Kumar Ayush
Burak Uzkent
Chenlin Meng
Kumar Tanmay
Marshall Burke
David Lobell
Stefano Ermon

Citations (191)

Summary

Geography-Aware Self-Supervised Learning: Overview and Insights

The paper "Geography-Aware Self-Supervised Learning" adapts self-supervised contrastive learning to geo-located datasets such as remote sensing imagery. Its central idea is to exploit the spatial and temporal structure of such data to narrow the performance gap between contrastive and supervised learning in remote sensing applications.

Key Contributions and Methodology

The authors build on two properties of remote sensing data that follow from how satellite imagery is collected: the same location is imaged repeatedly, so images are spatially aligned and temporally ordered, and the images are geo-located:

Temporal Positive Pairs: Instead of forming a positive pair only from two augmented views of the same image, the method forms positive pairs from spatially aligned images of the same location captured at different times. Because these images differ through natural changes such as seasonality, the learned representations become more robust to this kind of temporal variability.
Geo-Location as a Pretext Task: The paper also adds geo-location prediction as a self-supervised training objective. By learning to predict where an image was taken, the network incorporates geographic context into its features, and that context is often directly relevant to downstream remote sensing tasks.
Unified Framework: The two ideas are combined in a single geography-aware contrastive learning framework trained with a joint loss: a contrastive term over temporal positive pairs plus a geo-location prediction term. The features learned this way transfer better to a range of downstream tasks.
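The three ingredients above can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a MoCo-style contrastive setup (query embeddings, positive key embeddings, and a queue of negatives), replaces the encoders with random embeddings, and treats geo-location prediction as classification over k-means clusters of raw (lat, lon) values. All function names, the temperature `tau=0.2`, and `geo_weight` are hypothetical choices made for this sketch.

```python
import numpy as np

# Minimal numpy sketch of a geography-aware contrastive objective.
# Assumptions (not from the paper): function names, tau=0.2, geo_weight,
# and Euclidean k-means on raw (lat, lon) values.


def sample_temporal_pair(images_by_location, loc_id, rng):
    """Return two spatially aligned images of one location taken at different times."""
    series = images_by_location[loc_id]  # assumed: >= 2 co-registered images
    i, j = rng.choice(len(series), size=2, replace=False)
    return series[i], series[j]  # augmentations could still be applied to each view


def _sq_dists(a, b):
    """Pairwise squared Euclidean distances between rows of a and rows of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)


def kmeans_geo_labels(coords, k, n_iter=20):
    """Cluster (lat, lon) pairs into k geo pseudo-labels with plain k-means."""
    coords = np.asarray(coords, dtype=float)
    # Deterministic farthest-point initialisation: start at the first point,
    # then repeatedly add the point farthest from all chosen centers.
    centers = [coords[0]]
    for _ in range(1, k):
        nearest = _sq_dists(coords, np.array(centers)).min(axis=1)
        centers.append(coords[nearest.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = _sq_dists(coords, centers).argmin(axis=1)
        for c in range(k):
            members = coords[labels == c]
            if len(members):  # leave empty clusters where they are
                centers[c] = members.mean(axis=0)
    labels = _sq_dists(coords, centers).argmin(axis=1)  # labels for final centers
    return labels, centers


def _l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def _log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def info_nce(q, k_pos, queue, tau=0.2):
    """MoCo-style InfoNCE: q[i] must pick out k_pos[i] among queued negatives.

    Here k_pos would embed the *other time step* of the same location.
    """
    q, k_pos, queue = _l2_normalize(q), _l2_normalize(k_pos), _l2_normalize(queue)
    pos = (q * k_pos).sum(axis=1, keepdims=True)  # (N, 1) positive similarities
    neg = q @ queue.T                             # (N, K) negative similarities
    logits = np.concatenate([pos, neg], axis=1) / tau
    return -_log_softmax(logits)[:, 0].mean()     # positive sits at column 0


def cross_entropy(logits, labels):
    """Mean cross-entropy for integer class labels."""
    return -_log_softmax(logits)[np.arange(len(labels)), labels].mean()


def geo_aware_loss(q, k_pos, queue, geo_logits, geo_labels, geo_weight=1.0):
    """Temporal-positive contrastive loss plus weighted geo-cluster prediction loss."""
    return info_nce(q, k_pos, queue) + geo_weight * cross_entropy(geo_logits, geo_labels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.uniform([-60, -180], [60, 180], size=(32, 2))  # hypothetical lat/lon
    geo_labels, _ = kmeans_geo_labels(coords, k=4)
    q, k_pos = rng.normal(size=(32, 16)), rng.normal(size=(32, 16))  # stand-in embeddings
    queue = rng.normal(size=(128, 16))       # stand-in negative queue
    geo_logits = rng.normal(size=(32, 4))    # stand-in output of a geo head on q
    print(geo_aware_loss(q, k_pos, queue, geo_logits, geo_labels))
```

Clustering raw latitude/longitude with Euclidean distance is a simplification that ignores longitude wrap-around; a real pipeline might cluster on the sphere instead.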

Experiments and Results

The experimental validation spans a variety of tasks and datasets, demonstrating substantial improvements:

Functional Map of the World (fMoW) Dataset: The framework improved image classification accuracy by approximately 8% on average over baseline contrastive methods, effectively closing the gap with supervised learning. On temporal data classification, it surpassed supervised learning by about 2%.
Generalization to Geo-Tagged Image Datasets: Applied to geo-tagged ImageNet images, the method yielded a classification gain of around 2%, showing that it is not limited to remote sensing data.
Object Detection and Semantic Segmentation: The method improved object detection on xView by 7% AP and semantic segmentation on SpaceNet by 3.58% mIoU.

Implications and Future Directions

The paper shows that self-supervised learning can be adapted effectively to geo-located, temporally rich datasets. Incorporating geographic priors into representation learning improves downstream performance on practical tasks such as object detection and segmentation in satellite imagery.

Looking forward, this line of work could be extended to multi-modal datasets, where geographic, temporal, and other sensor data (e.g., hyperspectral) may provide complementary information. Future work could also explore richer pretext tasks, or multi-task frameworks that combine several sources of auxiliary information, to improve the adaptability and robustness of the learned models across broader applications.

These results are relevant to geospatial machine learning more broadly, for both academic research and industry applications that rely on remote sensing.
