- The paper introduces an evaluation benchmark on which Foundation Models achieve up to an 86% improvement in building density estimation alongside significant gains in label efficiency.
- It compares supervised and self-supervised learning pathways, finding that Foundation Models require only 10-20% of the labels needed by problem-specific models.
- The study evaluates Transformer-based and U-Net-based architectures pre-trained with geo-location classification, demonstrating their scalability across diverse Earth observation tasks.
Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI
Introduction
Foundation Models offer a significant advantage for applications that require solving multiple problems jointly, particularly in Earth Observation (EO) and geospatial AI. This paper investigates the efficacy of Foundation Models for EO tasks such as land cover classification, crop type mapping, flood segmentation, and building density estimation. It posits that Foundation Models outperform problem-specific models when labeled data is scarce. Label efficiency is especially important in EO because the Earth's surface changes continuously and labeling satellite data is costly.
Joint Problem Solving with Prescribed High Accuracy
The paper argues for deploying Foundation Models when a prescribed high accuracy (e.g., 95%) must be met across a set of tasks, typically on the order of ten. Two methodological pathways are outlined: supervised learning on extensive labeled datasets for each task (alternative A), and self-supervised pre-training of a shared model followed by supervised fine-tuning (alternative B). The latter, which leverages Foundation Models, is shown to be more label-efficient, requiring as little as 10-20% of the labels needed by problem-specific models. This efficiency is attributed to the capacity of Foundation Models to learn shared representations across tasks, optimizing both cost and scalability.
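To make the two pathways concrete, the minimal PyTorch sketch below contrasts them: pathway B reuses one pre-trained shared encoder and trains only a small per-task head on a reduced labeled subset, whereas pathway A would train a full model per task from scratch. The `SharedEncoder`, `TaskHead`, and `finetune_pathway_b` names, the architecture, and the hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Backbone shared across EO tasks; in pathway B it is pre-trained once
    on unlabeled imagery and then reused for every downstream task."""
    def __init__(self, in_channels: int = 12, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, x):
        return self.net(x)  # (batch, dim) feature vector

class TaskHead(nn.Module):
    """Lightweight per-task head, e.g. land cover classes or flood / no-flood."""
    def __init__(self, dim: int = 256, num_outputs: int = 10):
        super().__init__()
        self.fc = nn.Linear(dim, num_outputs)

    def forward(self, z):
        return self.fc(z)

def finetune_pathway_b(encoder, head, loader, epochs=5, freeze_encoder=True):
    """Pathway B: keep the pre-trained encoder (optionally frozen) and train
    the small head on a reduced labeled subset (~10-20% of pathway A's labels)."""
    if freeze_encoder:
        for p in encoder.parameters():
            p.requires_grad = False
    params = head.parameters() if freeze_encoder else (
        list(encoder.parameters()) + list(head.parameters()))
    optimizer = torch.optim.Adam(params, lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:  # small labeled DataLoader per task
            optimizer.zero_grad()
            loss = loss_fn(head(encoder(images)), labels)
            loss.backward()
            optimizer.step()
    return encoder, head
```

Pathway A, by contrast, would instantiate and fully train a separate encoder-plus-head per task on its complete labeled dataset, which is what drives its higher labeling cost.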
Evaluation Benchmark for Foundation Models
The paper introduces a benchmark to evaluate Foundation Models in EO, addressing the challenge of standardizing comparisons across diverse models. On this benchmark, Foundation Models achieve substantial improvements over problem-specific models when labeled data is limited: with only 100 samples per region, semantic segmentation of land cover improves by up to 18.52% and building density estimation by up to 86%. The framework covers both Transformer-based and U-Net-based architectures and uses geo-location classification as a pre-training strategy on large volumes of unlabeled satellite data.
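Geo-location classification exploits a label that satellite imagery carries for free: where each image was acquired. The sketch below illustrates one plausible form of this pretext task, binning acquisition coordinates into coarse grid cells and training the encoder to predict the cell; the 10-degree grid, the `latlon_to_cell` helper, and the training loop are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

def latlon_to_cell(lat: float, lon: float, cell_deg: float = 10.0) -> int:
    """Bin acquisition coordinates into a coarse grid cell used as a
    pseudo-label (10-degree cells -> 18 x 36 = 648 classes; illustrative)."""
    rows, cols = int(180 / cell_deg), int(360 / cell_deg)
    row = min(int((lat + 90.0) // cell_deg), rows - 1)
    col = min(int((lon + 180.0) // cell_deg), cols - 1)
    return row * cols + col

def pretrain_geolocation(encoder, loader, dim=256, num_cells=648, epochs=5):
    """Pre-train the shared encoder by predicting which grid cell each
    unlabeled image patch was acquired in; no human annotation is needed."""
    classifier = nn.Linear(dim, num_cells)
    optimizer = torch.optim.Adam(
        list(encoder.parameters()) + list(classifier.parameters()), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, lats, lons in loader:  # loader yields imagery plus metadata
            cells = torch.tensor(
                [latlon_to_cell(a.item(), o.item()) for a, o in zip(lats, lons)])
            optimizer.zero_grad()
            loss = loss_fn(classifier(encoder(images)), cells)
            loss.backward()
            optimizer.step()
    return encoder  # reused as the shared backbone for downstream fine-tuning
```

After this stage, the pre-trained encoder is handed to the fine-tuning step sketched earlier, where each downstream task adds its own small head.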
Conclusion
The findings underscore the potential of Foundation Models to address complex, multi-task EO applications with greater label efficiency and lower cost than traditional problem-specific models. The proposed evaluation benchmark offers a standardized way to assess the generalization capability of such models, reinforcing their value in scenarios where data labeling is constrained. This work contributes to the ongoing development of geospatial AI and paves the way for Earth monitoring technologies that require robust, scalable modeling solutions. Future research may refine these benchmarks and explore additional applications of Foundation Models across varied geospatial tasks.