
PhilEO Bench: Evaluating Geo-Spatial Foundation Models (2401.04464v2)

Published 9 Jan 2024 in cs.CV and cs.LG

Abstract: Massive amounts of unlabelled data are captured by Earth Observation (EO) satellites, with the Sentinel-2 constellation generating 1.6 TB of data daily. This makes Remote Sensing a data-rich domain well suited to Machine Learning (ML) solutions. However, a bottleneck in applying ML models to EO is the lack of annotated data as annotation is a labour-intensive and costly process. As a result, research in this domain has focused on Self-Supervised Learning and Foundation Model approaches. This paper addresses the need to evaluate different Foundation Models on a fair and uniform benchmark by introducing the PhilEO Bench, a novel evaluation framework for EO Foundation Models. The framework comprises a testbed and a novel 400 GB Sentinel-2 dataset containing labels for three downstream tasks: building density estimation, road segmentation, and land cover classification. We present experiments using our framework evaluating different Foundation Models, including Prithvi and SatMAE, at multiple n-shots and convergence rates.

Authors (5)
  1. Casper Fibaek
  2. Luke Camilleri
  3. Andreas Luyts
  4. Nikolaos Dionelis
  5. Bertrand Le Saux
Citations (8)

Summary

Evaluating Geo-Spatial Foundation Models via PhilEO Bench

The paper presents PhilEO Bench, a comprehensive framework for evaluating geo-spatial Foundation Models (FMs) in Earth Observation (EO), primarily using data from the Sentinel-2 satellite constellation. The framework addresses a central challenge in applying Machine Learning (ML) to remote sensing: vast amounts of unlabeled EO data are available, yet acquiring labeled datasets for training remains labor-intensive and costly.

Dataset and Tasks

PhilEO Bench is centered on a novel dataset derived from Sentinel-2 imagery, encompassing approximately 400 GB of data. The dataset supports evaluating FMs across three downstream tasks: building density estimation, road segmentation, and land cover classification. Its particular strength lies in its geographical diversity: data drawn from regions across the globe ensures the evaluation covers a wide array of landscapes and conditions.

Each downstream task in PhilEO Bench can be approached as either a segmentation or a classification task, underscoring the dataset's versatility for thorough and varied model evaluations. Its variety and global scope let researchers critically examine model generalizability and adaptability.
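As a toy illustration of that dual use (not code from the paper), a dense land-cover label patch can be collapsed into a single patch-level classification label by majority vote:

```python
import numpy as np

def patch_to_class(label_patch: np.ndarray) -> int:
    """Reduce a dense (H, W) land-cover label patch to one class
    label by majority vote -- a common way to reuse segmentation
    annotations for patch-level classification."""
    classes, counts = np.unique(label_patch, return_counts=True)
    return int(classes[np.argmax(counts)])

# Toy 4x4 patch: class 2 (say, 'cropland') dominates.
patch = np.array([[2, 2, 2, 1],
                  [2, 2, 0, 1],
                  [2, 2, 2, 2],
                  [0, 2, 2, 2]])
print(patch_to_class(patch))  # 2
```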

Framework and Evaluation Methodology

PhilEO Bench enables a fair comparison across different Foundation Models by standardizing the evaluation setup around two main components: a common set of evaluation data and a consistent downstream task head, minimizing the influence of differing architectures or dataset characteristics.

The paper details the methodology for evaluating the models, employing both linear probing and fine-tuning to assess performance and adaptability. By evaluating on a fixed test set, the framework ensures that performance discrepancies genuinely reflect model differences rather than extraneous factors. The authors also incorporate both image-to-image and image classification downstream tasks, giving a multifaceted view of model capability.
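The two evaluation regimes differ only in which parameters are trainable: linear probing freezes the pretrained encoder and fits just a lightweight head, while fine-tuning also updates the encoder. A minimal NumPy sketch of the linear-probing side, with a hypothetical frozen encoder standing in for a real Foundation Model (not the PhilEO testbed code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained encoder: a fixed projection
# whose weights are never updated during downstream training.
W_enc = rng.normal(size=(16, 8))

def encode(x):
    """Frozen feature extractor."""
    return np.tanh(x @ W_enc)

# Synthetic downstream data (e.g. an n-shot regression target).
X = rng.normal(size=(64, 16))
y = rng.normal(size=(64, 1))

# Linear probe: fit only the head on the frozen features,
# here in closed form via least squares.
feats = encode(X)
W_head, *_ = np.linalg.lstsq(feats, y, rcond=None)

pred = encode(X) @ W_head
print(pred.shape)  # (64, 1)
```

Fine-tuning would instead backpropagate through `encode` and update `W_enc` as well, which is why it typically needs more labeled data than probing.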

Results and Insights

The empirical results within the PhilEO Bench framework reveal crucial insights into the current capabilities and limitations of existing geo-spatial Foundation Models. Notably, the analysis shows that simpler architectures, such as U-Nets, can outperform more sophisticated models like SatMAE and Prithvi in image-to-image downstream tasks. This is attributed to U-Nets' ability to retain low-level feature information through their encoder-decoder structure, which significantly aids in reconstructing fine-grained details.
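Schematically, the skip connection is what lets fine detail bypass the low-resolution bottleneck. The sketch below (illustrative only, not the benchmark's U-Net) shows that data flow with a single down/up stage:

```python
import numpy as np

def downsample(x):
    # 2x2 average pooling over an (H, W, C) feature map.
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    # Nearest-neighbour 2x upsampling back to input resolution.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_like_forward(x):
    skip = x                    # high-res features saved for later
    bottleneck = downsample(x)  # low-res, more abstract representation
    up = upsample(bottleneck)   # back to input resolution
    # Skip connection: concatenate the fine-grained features with the
    # upsampled ones, so small details survive the bottleneck.
    return np.concatenate([skip, up], axis=-1)

x = np.ones((8, 8, 3))
out = unet_like_forward(x)
print(out.shape)  # (8, 8, 6)
```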

Conversely, Foundation Models predicated on Vision Transformer (ViT) architectures, tailored towards classification tasks, display limitations when applied to segmentation problems due to their inherent bottleneck, which inhibits the reconstruction of high-resolution output. This makes a compelling case for revisiting model architectures in the EO context to ensure they are fit for purpose across a wider range of task types.
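That bottleneck is easy to quantify: a ViT represents the image as a grid of patch tokens, so its native output resolution is the patch grid, not the pixel grid. With commonly assumed ViT defaults (224-pixel inputs, 16-pixel patches; these are not PhilEO-specific numbers):

```python
# Token-grid arithmetic for a typical ViT configuration.
image_size = 224
patch_size = 16

tokens_per_side = image_size // patch_size  # 14
num_tokens = tokens_per_side ** 2           # 196

# A dense prediction decoded directly from these tokens is only
# 14x14 before upsampling -- 16x coarser than the input image,
# which is why fine structures like roads are hard to recover.
print(tokens_per_side, num_tokens)  # 14 196
```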

Conclusion and Future Directions

The introduction of PhilEO Bench marks a significant advancement in the evaluation of geo-spatial Foundation Models, offering a structured, scalable, and fair framework for model assessment in Earth Observation. The framework underscores the need for models to be adaptable and robust to varying data characteristics and task requirements. The paper advocates reintegrating simple yet effective architectures into EO applications, while also highlighting areas for future research, such as cross-resolution evaluation.

By providing this systematic approach to model evaluation, PhilEO Bench stands to improve EO Foundation Models, promoting ongoing research and development grounded in consistent and objective benchmarking.

Given these findings, the paper opens a discourse on refining and developing new Foundation Models with a dual emphasis on efficiency and task-specific performance, potentially shaping the future landscape of remote sensing and Earth observation.