
Foundation Models for Generalist Geospatial Artificial Intelligence (2310.18660v2)

Published 28 Oct 2023 in cs.CV and cs.LG

Abstract: Significant progress in the development of highly adaptable and reusable AI models is expected to have a significant impact on Earth science and remote sensing. Foundation models are pre-trained on large unlabeled datasets through self-supervision, and then fine-tuned for various downstream tasks with small labeled datasets. This paper introduces a first-of-a-kind framework for the efficient pre-training and fine-tuning of foundational models on extensive geospatial data. We have utilized this framework to create Prithvi, a transformer-based geospatial foundational model pre-trained on more than 1TB of multispectral satellite imagery from the Harmonized Landsat-Sentinel 2 (HLS) dataset. Our study demonstrates the efficacy of our framework in successfully fine-tuning Prithvi to a range of Earth observation tasks that have not been tackled by previous work on foundation models involving multi-temporal cloud gap imputation, flood mapping, wildfire scar segmentation, and multi-temporal crop segmentation. Our experiments show that the pre-trained model accelerates the fine-tuning process compared to leveraging randomly initialized weights. In addition, pre-trained Prithvi compares well against the state-of-the-art, e.g., outperforming a conditional GAN model in multi-temporal cloud imputation by up to 5pp (or 5.7%) in the structural similarity index. Finally, due to the limited availability of labeled data in the field of Earth observation, we gradually reduce the quantity of available labeled data for refining the model to evaluate data efficiency and demonstrate that data can be decreased significantly without affecting the model's accuracy. The pre-trained 100 million parameter model and corresponding fine-tuning workflows have been released publicly as open source contributions to the global Earth sciences community through Hugging Face.

Authors (33)
  1. Johannes Jakubik (24 papers)
  2. Sujit Roy (10 papers)
  3. C. E. Phillips (1 paper)
  4. Paolo Fraccaro (17 papers)
  5. Denys Godwin (3 papers)
  6. Bianca Zadrozny (17 papers)
  7. Daniela Szwarcman (14 papers)
  8. Carlos Gomes (7 papers)
  9. Gabby Nyirjesy (1 paper)
  10. Blair Edwards (4 papers)
  11. Daiki Kimura (20 papers)
  12. Naomi Simumba (4 papers)
  13. Linsong Chu (3 papers)
  14. S. Karthik Mukkavilli (10 papers)
  15. Devyani Lambhate (3 papers)
  16. Kamal Das (24 papers)
  17. Ranjini Bangalore (2 papers)
  18. Dario Oliveira (5 papers)
  19. Michal Muszynski (6 papers)
  20. Kumar Ankur (3 papers)
Citations (55)

Summary

Foundation models for generalist geospatial artificial intelligence (GeoAI) represent a significant advancement in the field of Earth science and remote sensing. These models, pre-trained on vast datasets through self-supervision, can be fine-tuned for various downstream geospatial tasks using smaller labeled datasets, making them highly adaptable and reusable.
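To make the pretrain-then-fine-tune paradigm concrete, here is a minimal PyTorch sketch: a pre-trained encoder is frozen and only a small task head is trained on a modest labeled set. The encoder, sizes, and data below are illustrative stand-ins, not the paper's code; in practice one would load released weights such as Prithvi's.

```python
import torch
import torch.nn as nn

# Stand-in backbone; in practice, load a released pre-trained encoder here.
class TinyEncoder(nn.Module):
    def __init__(self, bands=6, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bands, dim, kernel_size=16, stride=16), nn.GELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
    def forward(self, x):
        return self.net(x)

encoder = TinyEncoder()                 # imagine pre-trained weights loaded here
for p in encoder.parameters():
    p.requires_grad_(False)             # freeze the backbone
head = nn.Linear(128, 2)                # small task head (e.g., flood / no flood)

opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
for step in range(50):                  # stand-in for a small labeled dataset
    x = torch.randn(16, 6, 224, 224)    # 6-band multispectral chips
    y = torch.randint(0, 2, (16,))      # labels
    loss = nn.functional.cross_entropy(head(encoder(x)), y)
    opt.zero_grad(); loss.backward(); opt.step()
```

Because only the head's parameters are updated, fine-tuning converges quickly even with little labeled data, which is the data-efficiency property the paper evaluates.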

The concept of foundation models in GeoAI is illustrated by the development of "Prithvi," a transformer-based geospatial model pre-trained on over 1TB of multispectral satellite imagery from the Harmonized Landsat-Sentinel 2 (HLS) dataset. The model is effective across multiple Earth observation tasks, including multi-temporal cloud gap imputation, flood mapping, wildfire scar segmentation, and multi-temporal crop segmentation. The pre-trained Prithvi model significantly accelerates fine-tuning compared to random initialization and compares well against the state of the art, for example outperforming a conditional GAN on multi-temporal cloud imputation by up to 5 percentage points (5.7%) in the structural similarity index (Jakubik et al., 2023).
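The self-supervised objective behind Prithvi is masked reconstruction in the style of a masked autoencoder (MAE): most patches are hidden, the encoder processes only the visible ones, and a decoder predicts the missing pixels. The sketch below illustrates this for a single timestep with assumed band counts and dimensions (the paper uses multi-temporal inputs with 3D patch embeddings, and positional embeddings are omitted here for brevity); the released code on Hugging Face is the authoritative reference.

```python
import torch
import torch.nn as nn

BANDS, PATCH, DIM = 6, 16, 256                  # assumed HLS-like band count
N = (224 // PATCH) ** 2                         # tokens per 224x224 image

patchify = nn.Conv2d(BANDS, DIM, PATCH, PATCH)  # patch embedding
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True), num_layers=4)
decoder = nn.Linear(DIM, BANDS * PATCH * PATCH) # per-token pixel prediction
mask_token = nn.Parameter(torch.zeros(1, 1, DIM))

def mae_loss(imgs, mask_ratio=0.75):
    B = imgs.size(0)
    tokens = patchify(imgs).flatten(2).transpose(1, 2)        # (B, N, DIM)
    n_keep = int(N * (1 - mask_ratio))
    perm = torch.rand(B, N).argsort(1)                        # random patch order
    keep, drop = perm[:, :n_keep], perm[:, n_keep:]
    visible = torch.gather(tokens, 1, keep.unsqueeze(-1).expand(-1, -1, DIM))
    enc = encoder(visible)                                    # encode visible patches only
    full = mask_token.expand(B, N, DIM).clone()               # masked slots get the mask token
    full.scatter_(1, keep.unsqueeze(-1).expand(-1, -1, DIM), enc)
    pred = decoder(full)                                      # (B, N, BANDS*PATCH*PATCH)
    target = nn.functional.unfold(imgs, PATCH, stride=PATCH).transpose(1, 2)
    m = torch.zeros(B, N).scatter_(1, drop, 1.0)              # 1 = masked
    return (((pred - target) ** 2).mean(-1) * m).sum() / m.sum()

mae_loss(torch.randn(2, BANDS, 224, 224)).backward()
```

The loss is computed only on masked patches, so the model must infer missing content from spatial (and, in the full multi-temporal model, temporal) context, which is exactly what tasks like cloud gap imputation require at fine-tuning time.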

However, the development of foundation models for GeoAI faces several challenges, notably the multimodality of geospatial data. While some existing LLMs perform well on text-based geospatial tasks, they underperform on tasks that require processing multiple data modalities, such as urban noise classification from street-view images or remote sensing scene classification (Mai et al., 2023). To address these challenges, researchers propose developing multimodal foundation models that integrate diverse geospatial data types through geospatial alignments, enabling more comprehensive and accurate GeoAI systems.
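A common pattern for such alignments, shown below purely as an illustration (the encoders and pairing are assumptions, not a specific paper's method), is a CLIP-style contrastive objective that pulls together embeddings of co-located samples from two modalities, e.g., a satellite chip and a street-view photo of the same place.

```python
import torch
import torch.nn.functional as F

def contrastive_align(z_a, z_b, temperature=0.07):
    # z_a, z_b: (B, D) embeddings of paired samples from two modalities,
    # where row i of z_a and row i of z_b describe the same location.
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature     # (B, B) cross-modal similarities
    labels = torch.arange(z_a.size(0))       # true pairs lie on the diagonal
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2

# Toy usage with random embeddings standing in for two modality encoders.
loss = contrastive_align(torch.randn(8, 256), torch.randn(8, 256))
```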

Continual pretraining is another approach to enhancing the performance of geospatial foundation models. By starting from models pre-trained on large-scale datasets such as ImageNet-22k and augmenting them with domain-specific features through continual pretraining, researchers have achieved notable improvements across numerous geospatial tasks at minimal resource cost (Mendieta et al., 2023). This method represents a promising direction for building more efficient and powerful GeoAI systems.
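A minimal sketch of the continual-pretraining idea follows, with assumptions flagged: a ViT pre-trained on ImageNet-22k (loaded via timm; the exact model tag varies across timm versions) is copied into a frozen teacher, and the student learns to predict the teacher's features for masked in-domain imagery. This is one cheap way to inject domain structure while retaining general-purpose features; the actual objective of Mendieta et al. differs in its details.

```python
import copy
import timm
import torch
import torch.nn.functional as F

# Assumed timm tag for a ViT-B/16 pre-trained on ImageNet-22k.
student = timm.create_model(
    "vit_base_patch16_224.augreg_in21k", pretrained=True, num_classes=0)
teacher = copy.deepcopy(student).eval()
for p in teacher.parameters():
    p.requires_grad_(False)                  # teacher stays frozen

def mask_patches(x, patch=16, ratio=0.6):
    # Zero out a random subset of patches as a cheap masking scheme.
    B, _, H, W = x.shape
    m = (torch.rand(B, 1, H // patch, W // patch) > ratio).float()
    return x * F.interpolate(m, scale_factor=patch, mode="nearest")

opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
for step in range(100):                      # stand-in for an unlabeled in-domain loader
    imgs = torch.randn(8, 3, 224, 224)       # e.g., RGB remote-sensing chips
    with torch.no_grad():
        target = teacher(imgs)               # features of the full image
    pred = student(mask_patches(imgs))       # features of the masked image
    loss = F.smooth_l1_loss(pred, target)
    opt.zero_grad(); loss.backward(); opt.step()
```

Because the student starts from, and is supervised by, strong general-purpose features, only the domain-specific gap needs to be learned, which is why this route is far cheaper than pre-training from scratch.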

In summary, foundation models for generalist GeoAI hold significant potential for advancing the capabilities of Earth observation and remote sensing tools. Ongoing research focuses on overcoming multimodal data challenges and enhancing model efficiency and accuracy through innovative pretraining techniques. The public release of these models and their fine-tuning workflows promises to foster further advancements and applications in the global Earth sciences community.
