Prithvi Model: Geospatial Foundation Model
- Prithvi is a geospatial foundation model that uses a transformer-based architecture with self-supervised pre-training on multispectral, multi-temporal datasets.
- It employs 3D patch embedding and 3D positional encoding to capture spatial and temporal context, achieving strong performance on tasks like flood mapping and cloud gap imputation.
- Pre-trained on vast remote sensing archives, Prithvi enables efficient transfer learning and domain adaptation across diverse applications including disaster response, agriculture, and climate modeling.
 
The Prithvi Model is a geospatial foundation model architecture designed by IBM and NASA for Earth Observation (EO) and climate science. It is built on transformer principles, leveraging self-supervised pre-training on massive multispectral and multi-temporal datasets to support a diverse array of downstream geospatial tasks. Prithvi and its successors exemplify the recent convergence between foundation model engineering and geospatial AI, offering adaptable, reusable representations that substantially improve data efficiency, transferability, and generalization for EO tasks in mapping, retrieval, prediction, and environmental modeling.
1. Architectural Principles and Innovations
Prithvi is based on a temporal Vision Transformer (ViT) backbone, with several advances tailored for EO imagery:
- 3D Patch Embedding: Inputs are multi-temporal, multi-spectral image cubes (C, T, H, W), divided into non-overlapping 3D tubelets (typically 1×16×16 or 1×2×2 for ocean color), encoding spatial and temporal context within each token (Jakubik et al., 2023, Szwarcman et al., 3 Dec 2024, Dawson et al., 25 Sep 2025).
- 3D Positional Encoding: Sine/cosine positional embeddings are generated for height, width, and time, combined into a 3D positional bias; temporal and location metadata (latitude, longitude, date) are separately projected and incorporated into the token embeddings via learned weighting factors (Szwarcman et al., 3 Dec 2024).
- Masked Autoencoder (MAE) Training: The encoder receives only the visible patches, and the decoder reconstructs the masked patches, optimizing a mean squared error (MSE) loss over the masked regions:

$$\mathcal{L}_{\mathrm{MAE}} = \frac{1}{|\mathcal{M}|} \sum_{i \in \mathcal{M}} \lVert \hat{x}_i - x_i \rVert_2^2$$

where $\mathcal{M}$ is the set of masked tokens, $x_i$ the original pixel values of token $i$, and $\hat{x}_i$ its reconstruction (Jakubik et al., 2023, Szwarcman et al., 3 Dec 2024, Li et al., 2023).
- Flexible Input Bandwidth: Initial models require six bands (RGB, NIR, SWIR1, SWIR2), but adaptations allow handling of three-band or nonstandard inputs via patch embedding redesign or channel duplication (Hsu et al., 31 Aug 2024, Dawson et al., 25 Sep 2025).
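The tubelet tokenization and 3D positional encoding described above can be sketched in a few lines of NumPy. This is an illustrative toy (function names and sizes are ours; the learned linear projection and learned metadata weighting of the real model are omitted):

```python
import numpy as np

def tubelet_embed(cube, t_p=1, h_p=16, w_p=16):
    """Split a (C, T, H, W) image cube into non-overlapping 3D tubelets
    and flatten each into a token vector. (The learned linear projection
    that follows in the real model is omitted here.)"""
    C, T, H, W = cube.shape
    assert T % t_p == 0 and H % h_p == 0 and W % w_p == 0
    nt, nh, nw = T // t_p, H // h_p, W // w_p
    tokens = (cube
              .reshape(C, nt, t_p, nh, h_p, nw, w_p)
              .transpose(1, 3, 5, 0, 2, 4, 6)   # -> (nt, nh, nw, C, t_p, h_p, w_p)
              .reshape(nt * nh * nw, C * t_p * h_p * w_p))
    return tokens

def sincos_1d(n, dim):
    """Standard 1D sine/cosine embedding of shape (n, dim); dim must be even."""
    pos = np.arange(n)[:, None]
    freq = np.arange(dim // 2)[None, :]
    angles = pos / (10000.0 ** (2 * freq / dim))
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def sincos_3d(nt, nh, nw, dim):
    """3D positional embedding: one 1D embedding per axis (time, height,
    width), each occupying a dim // 3 slice of the token dimension."""
    d = dim // 3
    grid = np.zeros((nt, nh, nw, dim))
    grid[..., :d] = sincos_1d(nt, d)[:, None, None, :]
    grid[..., d:2 * d] = sincos_1d(nh, d)[None, :, None, :]
    grid[..., 2 * d:3 * d] = sincos_1d(nw, d)[None, None, :, :]
    return grid.reshape(nt * nh * nw, dim)

# A toy cube: 6 HLS bands, 4 time steps, 64x64 pixels
cube = np.random.rand(6, 4, 64, 64)
tokens = tubelet_embed(cube)       # 4 * 4 * 4 = 64 tokens of length 6*1*16*16
pos = sincos_3d(4, 4, 4, 768)      # one positional vector per token
```

Each token thus carries one spatial patch at one time step, and its positional vector encodes where and when that patch sits in the cube.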
 
The latest Prithvi-EO-2.0 models scale to 300M/600M parameters and explicitly model spatiotemporal metadata for global EO transferability (Szwarcman et al., 3 Dec 2024), while Prithvi WxC extends this paradigm to weather and climate modeling with a 2.3B parameter encoder-decoder that alternates local and global attention across token windows (Schmude et al., 20 Sep 2024).
2. Pre-training Data and Methodology
Prithvi models are pre-trained on large remote sensing archives to capture the broadest possible spatial and temporal ground truth:
- Data Sources: Harmonized US and global Landsat-Sentinel-2 (HLS), Sentinel-3 OLCI (for ocean color), and MERRA-2 reanalysis (for WxC), with up to 160 variables in climate settings (Szwarcman et al., 3 Dec 2024, Schmude et al., 20 Sep 2024, Dawson et al., 25 Sep 2025).
- Dataset Composition: Millions of temporally stratified chips, random time intervals, with strict cloud/missing-data filtering and stratified sampling for representative land cover and climatic diversity (Szwarcman et al., 3 Dec 2024).
- Training Protocol: Large fractions of input patches (75%) are masked during each training step; global and temporal diversity are ensured via random cropping, augmentation, and dropout of metadata fields (Jakubik et al., 2023, Szwarcman et al., 3 Dec 2024).
- Optimization: AdamW with batch sizes up to 3,840, cosine scheduler with linear warmup and weight decay, and distributed GPU training (up to 240 GPUs for 600M models) (Szwarcman et al., 3 Dec 2024).
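The masking step and learning-rate schedule above can be sketched as follows, assuming the standard MAE formulation (shapes, function names, and hyperparameters are illustrative, not Prithvi's actual training code):

```python
import math
import numpy as np

def random_masking(tokens, mask_ratio=0.75, rng=None):
    """MAE-style random masking: keep a random (1 - mask_ratio) subset of
    tokens for the encoder; the rest become reconstruction targets."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = tokens.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx, mask_idx = np.sort(perm[:n_keep]), np.sort(perm[n_keep:])
    return tokens[keep_idx], keep_idx, mask_idx

def masked_mse(pred, target, mask_idx):
    """Pre-training loss: MSE evaluated only on the masked positions."""
    diff = pred[mask_idx] - target[mask_idx]
    return float(np.mean(diff ** 2))

def lr_at(step, total_steps, warmup_steps, base_lr):
    """Cosine decay with linear warmup, as in the schedule named above."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

tokens = np.random.default_rng(1).normal(size=(64, 1536))
visible, keep_idx, mask_idx = random_masking(tokens)   # 16 visible, 48 masked
```

With a 75% mask ratio the encoder processes only a quarter of the tokens, which is what makes pre-training on millions of chips computationally tractable.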
 
This process yields high-capacity representation of global land, ocean, and atmospheric states, supporting both broad generalization and rapid fine-tuning.
3. Downstream Tasks and Performance
Prithvi and its derivatives have been fine-tuned on an extensive suite of downstream EO and climate tasks:
| Task | Key Performance Metric(s) | Notable Results | Reference | 
|---|---|---|---|
| Flood Inundation Mapping | mIoU, mAcc | Prithvi achieves 89.59% mIoU and 94.35% mAcc on test, 86.02%/90.38% mIoU/acc on unseen Bolivia | (Li et al., 2023) | 
| Multi-Temporal Cloud Gap Imputation | SSIM, MAE | SSIM > 0.9, up to 5% improvement over CGAN baseline, MAE down to 0.020 | (Jakubik et al., 2023, Godwin et al., 30 Apr 2024, Sosa et al., 27 Sep 2024) | 
| Wildfire Scar Segmentation | mIoU, F1 score | Pre-trained Prithvi improves IoU and F1 over randomly initialized encoders | (Jakubik et al., 2023) | 
| Building Density Estimation | MSE | Prithvi yields lowest MSE in n-shot transfer; U-Net retains finer details | (Fibaek et al., 9 Jan 2024) | 
| Crop Segmentation | mIoU, F1 score | Prithvi competitive, but U-Net and RFaug sometimes outperform; depends on texture importance | (Xie et al., 17 Apr 2024, Sosa et al., 27 Sep 2024) | 
| Remote Sensing Retrieval | mAP@20 | Prithvi achieves 97.62% mAP (BigEarthNet-43), outperforming RGB models | (Blumenstiel et al., 4 Mar 2024) | 
| Locust Breeding Ground Prediction | Accuracy, F1, ROC-AUC | Accuracy 83.03%, F1 81.53%, ROC-AUC 87.69%—multi-spectral EO alone sufficient | (Yusuf et al., 11 Mar 2024) | 
| Marine Chlorophyll & Primary Production | RMSE, SSIM | SSIM improvement for large-scale inference; 11.8% reduction in RMSE for primary production | (Dawson et al., 25 Sep 2025) | 
| Gravity Wave Parameterization | Hellinger distance | Prithvi WxC fine-tuned model: 0.06 vs baseline 0.11 | (Gupta et al., 4 Sep 2025) | 
| Autoregressive Forecasting (WxC) | RMSE, Track Error | Superior short-term (6–12 hr) forecast skill; e.g. hurricane track error 63.9 km | (Schmude et al., 20 Sep 2024) | 
For segmentation and classification tasks, performance depends on spectral/temporal vs. texture feature importance. In pixel-level crop and flood mapping, traditional ML methods (RF, XGB) and U-Net architectures occasionally outperform Prithvi, especially when labels can be predicted from pixel spectra (Xie et al., 17 Apr 2024, Sosa et al., 27 Sep 2024). Prithvi excels when pre-training and fine-tuning objectives align (e.g., imputation tasks) or when spatial context is critical.
4. Generalization, Transferability, and Data Efficiency
Prithvi’s generalization and transferability are demonstrated across several axes:
- Unseen Geography: Prithvi achieves top mIoU and mAcc in test regions unrepresented in training (e.g., Bolivia) due to large-scale pre-training diversity (Li et al., 2023).
- Data Efficiency: Data ablation experiments show little drop in accuracy when labeled data is reduced by 80–90%, supporting few-shot/zero-shot learning (Jakubik et al., 2023, Szwarcman et al., 3 Dec 2024).
- Cross-Modal Reasoning: In the ZeroFlood framework, the “Thinking-in-Modality” (TiM) mechanism augments unimodal input with learned auxiliary tokens, bridging missing modalities and improving flood susceptibility mapping even when only Sentinel-2 is available (Kim et al., 27 Oct 2025).
 
5. Domain Adaptation and Pipeline Enhancements
Multiple studies detail technical improvements for Prithvi’s domain adaptability:
- Band Adaptation: Retrained patch embedding allows Prithvi to process inputs with fewer bands by reinitializing convolutional kernels and projecting RGB images without losing spatial detail (Hsu et al., 31 Aug 2024).
- Multi-Scale Feature Generation: Supplementary networks inspired by FPN are appended to the ViT backbone, facilitating detection/segmentation across different object scales and improving mAP50 (Hsu et al., 31 Aug 2024).
- End-to-End Fine-Tuning: Integrating the pre-trained encoder with Mask R-CNN-style heads (including RPN, RoIAlign) and multi-scale modules, followed by full end-to-end optimization (Hsu et al., 31 Aug 2024).
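The two band-adaptation strategies mentioned above can be illustrated with a small NumPy sketch. The kernel-averaging heuristic in `shrink_patch_embed` is our assumption for illustration, not the exact retrained-embedding procedure of Hsu et al.:

```python
import numpy as np

def duplicate_channels(img, target_bands=6):
    """Channel duplication: tile the available bands until the input
    matches the 6-band layout the pre-trained patch embedding expects."""
    C = img.shape[0]
    reps = -(-target_bands // C)              # ceiling division
    return np.concatenate([img] * reps, axis=0)[:target_bands]

def shrink_patch_embed(weight, new_in_ch):
    """Adapt the patch-embedding kernel itself to fewer input bands by
    averaging groups of pre-trained input-channel kernels (a simple
    heuristic stand-in for retraining the embedding)."""
    groups = np.array_split(np.arange(weight.shape[1]), new_in_ch)
    return np.stack([weight[:, g].mean(axis=1) for g in groups], axis=1)

rgb = np.random.rand(3, 224, 224)
six_band = duplicate_channels(rgb)            # (6, 224, 224): RGB repeated twice
w = np.random.rand(768, 6, 1, 16, 16)         # (out, in, t_p, h_p, w_p) kernel
w3 = shrink_patch_embed(w, 3)                 # (768, 3, 1, 16, 16)
```

Channel duplication leaves the pre-trained weights untouched at the cost of redundant input; shrinking the kernel keeps the input compact but alters the first layer's statistics, which is why the cited work retrains it.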
 
Limitations remain: inference is slower than ResNet50 or MViTv2 backbones, and not every pipeline component benefits from pre-training. Avoiding geographic data leakage and benchmarking against standardized protocols are ongoing concerns.
6. Model Composition, Distillation, and Open Science
Recent research demonstrates that feature-level ensembling—combining Prithvi with other models (e.g., Hiera)—can match or exceed the performance of larger monolithic models with less resource expenditure. Knowledge distillation from ensembled representations into smaller deployable models is identified as an efficient avenue for EO applications (Chuc, 25 Jun 2025).
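A toy NumPy sketch of feature-level ensembling and feature distillation follows; encoder outputs are simulated with random arrays, and the linear projection and MSE loss are generic illustrations rather than the specific recipe of Chuc (25 Jun 2025):

```python
import numpy as np

def ensemble_features(feat_a, feat_b):
    """Feature-level ensembling: concatenate per-token features from two
    frozen encoders (e.g. Prithvi and Hiera) into one representation."""
    return np.concatenate([feat_a, feat_b], axis=-1)

def feature_distill_loss(student_feat, teacher_feat, proj):
    """Feature distillation: project the student's features into the
    ensemble space and penalise the squared error against the teacher."""
    return float(np.mean((student_feat @ proj - teacher_feat) ** 2))

rng = np.random.default_rng(0)
prithvi_feat = rng.normal(size=(64, 768))    # per-token features, toy sizes
hiera_feat = rng.normal(size=(64, 512))
teacher = ensemble_features(prithvi_feat, hiera_feat)   # (64, 1280)

student = rng.normal(size=(64, 256))         # smaller deployable model
proj = rng.normal(size=(256, 1280))          # learned in practice
loss = feature_distill_loss(student, teacher, proj)
```

Minimizing this loss pushes the compact student toward the ensembled representation, so deployment only needs the small model.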
Prithvi, along with its workflows and fine-tuning recipes, is released as open source (Hugging Face, IBM terratorch, GitHub), and is cited as an exemplar of Trusted Open Science. Early involvement of subject matter experts (SMEs) is shown to improve model customization and benchmarking (Szwarcman et al., 3 Dec 2024).
7. Applications Across Domains
Prithvi models have been applied and validated in a breadth of operational and scientific settings:
- Disaster Response: Flood mapping, burn scar segmentation, and landslide detection.
- Agriculture and Ecosystem Monitoring: Crop type mapping, above-ground biomass estimation, GPP estimation, and locust breeding ground prediction.
- Ocean Science: Chlorophyll-a quantification and primary production mapping using Sentinel-3 OLCI data (Dawson et al., 25 Sep 2025).
- Climate and Weather: Foundation model–based emulation for forecasting, downscaling, gravity wave parameterization, and extreme weather event prediction (WxC) (Schmude et al., 20 Sep 2024, Gupta et al., 4 Sep 2025).
- Astrobiology: Facilitates biosignature detection, mission instrument design, and literature mining for new mission development (Felton et al., 8 Oct 2025).
 
References to Major Model Versions and Evaluations
- Prithvi-EO-1.0: Regional US pretraining, 100M parameters (Jakubik et al., 2023).
- Prithvi-EO-2.0: 300M/600M global model with temporal-location embeddings and GEO-Bench validations (Szwarcman et al., 3 Dec 2024).
- Prithvi WxC: 2.3B parameter model for weather and climate, masking and forecasting hybrid loss, pretrained on 160 MERRA-2 variables (Schmude et al., 20 Sep 2024).
- PhilEO Bench, ZeroFlood, and ocean color adaptations evaluate Prithvi’s domain transfer and compositional potential (Fibaek et al., 9 Jan 2024, Kim et al., 27 Oct 2025, Dawson et al., 25 Sep 2025).
 
Conclusion
Prithvi represents a scalable, adaptable, and trusted family of geospatial foundation models built for EO and climate research. Its transformer-based MAE architecture, comprehensive pretraining, and extensible pipeline design support robust representation learning. Optimizing domain adaptation, fine-tuning schemes, and enabling open access and SME-guided customization are current priorities. Ensemble approaches and knowledge distillation provide promising directions for resource-efficient deployment. Prithvi’s versatility and data efficiency underpin its adoption in disaster response, agricultural monitoring, ocean science, climate physics, and astrobiology, establishing a benchmark for generalizable geospatial AI.