
PhilEO Bench: Evaluating Geo-Spatial Foundation Models (2401.04464v2)

Published 9 Jan 2024 in cs.CV and cs.LG

Abstract: Massive amounts of unlabelled data are captured by Earth Observation (EO) satellites, with the Sentinel-2 constellation generating 1.6 TB of data daily. This makes Remote Sensing a data-rich domain well suited to Machine Learning (ML) solutions. However, a bottleneck in applying ML models to EO is the lack of annotated data as annotation is a labour-intensive and costly process. As a result, research in this domain has focused on Self-Supervised Learning and Foundation Model approaches. This paper addresses the need to evaluate different Foundation Models on a fair and uniform benchmark by introducing the PhilEO Bench, a novel evaluation framework for EO Foundation Models. The framework comprises a testbed and a novel 400 GB Sentinel-2 dataset containing labels for three downstream tasks: building density estimation, road segmentation, and land cover classification. We present experiments using our framework evaluating different Foundation Models, including Prithvi and SatMAE, at multiple n-shots and convergence rates.

Authors (5)
  1. Casper Fibaek
  2. Luke Camilleri
  3. Andreas Luyts
  4. Nikolaos Dionelis
  5. Bertrand Le Saux
Citations (8)

Summary

Evaluating Geo-Spatial Foundation Models via PhilEO Bench

The paper presents PhilEO Bench, a comprehensive framework for evaluating geo-spatial Foundation Models (FMs) in Earth Observation (EO), primarily using data from the Sentinel-2 satellite constellation. The framework addresses a central challenge in applying Machine Learning (ML) to remote sensing: vast amounts of unlabeled EO data are available, yet acquiring labeled datasets for training remains labor-intensive and costly.

Dataset and Tasks

PhilEO Bench is centered on a novel dataset derived from Sentinel-2 imagery, encompassing approximately 400 GB of data. The dataset supports evaluating FMs across three downstream tasks: building density estimation, road segmentation, and land cover classification. Its particular strength lies in its geographical diversity: data drawn from regions across the globe ensures the evaluation covers a wide array of landscapes and conditions.

Each downstream task in PhilEO Bench can be approached as either a segmentation or a classification task, underscoring the dataset's versatility for thorough and varied model evaluations. Its variety and global scope let researchers critically examine model generalizability and adaptability.
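As a toy illustration of that dual use (not code from the paper), a dense land-cover label patch can be collapsed into a single patch-level classification label by majority vote:

```python
import numpy as np

def patch_to_class(label_patch: np.ndarray) -> int:
    """Reduce a dense (H, W) land-cover label patch to one class
    label by majority vote -- a common way to reuse segmentation
    annotations for patch-level classification."""
    classes, counts = np.unique(label_patch, return_counts=True)
    return int(classes[np.argmax(counts)])

# Toy 4x4 patch: class 2 (say, 'cropland') dominates.
patch = np.array([[2, 2, 2, 1],
                  [2, 2, 0, 1],
                  [2, 2, 2, 2],
                  [0, 2, 2, 2]])
print(patch_to_class(patch))  # 2
```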

Framework and Evaluation Methodology

PhilEO Bench enables a fair comparison across different Foundation Models by standardizing the evaluation setup around two main components: a common set of evaluation data and a consistent downstream task head, minimizing the influence of differing architectures or dataset characteristics.

The paper details the methodology for evaluating the models, employing both linear probing and fine-tuning to assess performance and adaptability. By evaluating on a fixed test set, the framework ensures that performance discrepancies genuinely reflect model differences rather than extraneous factors. The authors also incorporate both image-to-image and image classification downstream tasks, giving a multifaceted view of model capability.
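The two evaluation regimes differ only in which parameters are trainable: linear probing freezes the pretrained encoder and fits just a lightweight head, while fine-tuning also updates the encoder. A minimal NumPy sketch of the linear-probing side, with a hypothetical frozen encoder standing in for a real Foundation Model (not the PhilEO testbed code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained encoder: a fixed projection
# whose weights are never updated during downstream training.
W_enc = rng.normal(size=(16, 8))

def encode(x):
    """Frozen feature extractor."""
    return np.tanh(x @ W_enc)

# Synthetic downstream data (e.g. an n-shot regression target).
X = rng.normal(size=(64, 16))
y = rng.normal(size=(64, 1))

# Linear probe: fit only the head on the frozen features,
# here in closed form via least squares.
feats = encode(X)
W_head, *_ = np.linalg.lstsq(feats, y, rcond=None)

pred = encode(X) @ W_head
print(pred.shape)  # (64, 1)
```

Fine-tuning would instead backpropagate through `encode` and update `W_enc` as well, which is why it typically needs more labeled data than probing.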

Results and Insights

The empirical results within the PhilEO Bench framework reveal crucial insights into the current capabilities and limitations of existing geo-spatial Foundation Models. Notably, the analysis shows that simpler architectures, such as U-Nets, can outperform more sophisticated models like SatMAE and Prithvi in image-to-image downstream tasks. This is attributed to U-Nets' ability to retain low-level feature information through their encoder-decoder structure, which significantly aids in reconstructing fine-grained details.
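Schematically, the skip connection is what lets fine detail bypass the low-resolution bottleneck. The sketch below (illustrative only, not the benchmark's U-Net) shows that data flow with a single down/up stage:

```python
import numpy as np

def downsample(x):
    # 2x2 average pooling over an (H, W, C) feature map.
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    # Nearest-neighbour 2x upsampling back to input resolution.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_like_forward(x):
    skip = x                    # high-res features saved for later
    bottleneck = downsample(x)  # low-res, more abstract representation
    up = upsample(bottleneck)   # back to input resolution
    # Skip connection: concatenate the fine-grained features with the
    # upsampled ones, so small details survive the bottleneck.
    return np.concatenate([skip, up], axis=-1)

x = np.ones((8, 8, 3))
out = unet_like_forward(x)
print(out.shape)  # (8, 8, 6)
```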

Conversely, Foundation Models predicated on Vision Transformer (ViT) architectures, tailored towards classification tasks, display limitations when applied to segmentation problems due to their inherent bottleneck, which inhibits the reconstruction of high-resolution output. This makes a compelling case for revisiting model architectures in the EO context to ensure they are fit for purpose across a wider range of task types.
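That bottleneck is easy to quantify: a ViT represents the image as a grid of patch tokens, so its native output resolution is the patch grid, not the pixel grid. With commonly assumed ViT defaults (224-pixel inputs, 16-pixel patches; these are not PhilEO-specific numbers):

```python
# Token-grid arithmetic for a typical ViT configuration.
image_size = 224
patch_size = 16

tokens_per_side = image_size // patch_size  # 14
num_tokens = tokens_per_side ** 2           # 196

# A dense prediction decoded directly from these tokens is only
# 14x14 before upsampling -- 16x coarser than the input image,
# which is why fine structures like roads are hard to recover.
print(tokens_per_side, num_tokens)  # 14 196
```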

Conclusion and Future Directions

The introduction of PhilEO Bench marks a significant advancement in the evaluation of geo-spatial Foundation Models, offering a structured, scalable, and fair framework for model assessment in Earth Observation. The framework underscores the need for models to be adaptable and robust to varying data characteristics and task requirements. The paper advocates reintegrating simple yet effective architectures into EO applications, while also highlighting areas for future research, such as cross-resolution evaluation.

By providing this systematic approach to model evaluation, PhilEO Bench stands to improve EO Foundation Models, promoting ongoing research and development grounded in consistent and objective benchmarking.

Given these findings, the paper opens a discourse on refining and developing new Foundation Models with a dual emphasis on efficiency and task-specific performance, potentially shaping the future landscape of remote sensing and Earth observation.