Prithvi WxC: Foundation Model for Weather and Climate (2409.13598v1)

Published 20 Sep 2024 in cs.LG and physics.ao-ph

Abstract: Triggered by the realization that AI emulators can rival the performance of traditional numerical weather prediction models running on HPC systems, there is now an increasing number of large AI models that address use cases such as forecasting, downscaling, or nowcasting. While the parallel developments in the AI literature focus on foundation models -- models that can be effectively tuned to address multiple, different use cases -- the developments on the weather and climate side largely focus on single-use cases with particular emphasis on mid-range forecasting. We close this gap by introducing Prithvi WxC, a 2.3 billion parameter foundation model developed using 160 variables from the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). Prithvi WxC employs an encoder-decoder-based architecture, incorporating concepts from various recent transformer models to effectively capture both regional and global dependencies in the input data. The model has been designed to accommodate large token counts to model weather phenomena in different topologies at fine resolutions. Furthermore, it is trained with a mixed objective that combines the paradigms of masked reconstruction with forecasting. We test the model on a set of challenging downstream tasks namely: Autoregressive rollout forecasting, Downscaling, Gravity wave flux parameterization, and Extreme events estimation. The pretrained model with 2.3 billion parameters, along with the associated fine-tuning workflows, has been publicly released as an open-source contribution via Hugging Face.

Authors (29)

Johannes Schmude
Sujit Roy
Will Trojak
Johannes Jakubik
Daniel Salles Civitarese
Shraddha Singh
Julian Kuehnert
Kumar Ankur
Aman Gupta
Christopher E Phillips
Romeo Kienzler
Daniela Szwarcman
Vishal Gaur
Rajat Shinde
Rohit Lal
Arlindo Da Silva
Jorge Luis Guevara Diaz
Anne Jones
Simon Pfreundschuh
Amy Lin

Summary

The paper introduces a foundation model that leverages a 2.3 billion parameter transformer and MERRA-2 data for masked reconstruction and forecasting.
It demonstrates robust zero-shot performance in short-term predictions (6-12 hours) and outperforms baselines in regional downscaling tasks.
The model effectively captures sub-grid atmospheric processes, offering novel insights for improving climate model parameterizations.

Overview of "Prithvi WxC: Foundation Model for Weather and Climate"

"Prithvi WxC: Foundation Model for Weather and Climate" presents an AI model designed to bridge the gap between task-specific weather prediction models and the foundation-model paradigm in AI. Prithvi WxC has 2.3 billion parameters, uses an encoder-decoder architecture, and is trained on 160 variables from the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2) dataset.

Model Architecture and Training

Prithvi WxC employs a transformer-based architecture that draws on recent transformer models such as Hiera and MaxViT. The design targets both regional and global dependencies and is meant to accommodate large token counts and different topologies at fine resolutions. It combines local and global attention: attention alternates between tokens within a window and tokens across windows, which keeps the model scalable and flexible across spatial contexts.
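
The alternation between within-window and cross-window attention can be pictured as two ways of grouping the same tokens. The sketch below is only an illustration of that idea in the style of MaxViT-like grid attention; the function names, the 4x4 token grid, and the window size are hypothetical and are not taken from the Prithvi WxC code.

```python
# Illustrative sketch only: groups token coordinates for local vs. global attention.
# Function names, grid size, and window size are hypothetical.

def local_groups(height, width, win):
    """Tokens that share a window attend to each other (local attention)."""
    groups = {}
    for r in range(height):
        for c in range(width):
            groups.setdefault((r // win, c // win), []).append((r, c))
    return groups


def global_groups(height, width, win):
    """Tokens at the same offset inside their windows attend across windows."""
    groups = {}
    for r in range(height):
        for c in range(width):
            groups.setdefault((r % win, c % win), []).append((r, c))
    return groups


if __name__ == "__main__":
    local = local_groups(4, 4, 2)
    glob = global_groups(4, 4, 2)
    print(local[(0, 0)])  # the four tokens of the top-left window
    print(glob[(0, 0)])   # the top-left token of every window
```

Alternating the two groupings lets information mix first inside a region and then between regions, without ever attending over all tokens at once.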

The model is pre-trained on MERRA-2 data with a mixed objective that combines masked reconstruction and forecasting: it predicts atmospheric states at different future time steps from inputs in which part of the data is masked. Training on masked inputs is intended to let the model work with sparse observational data. Because the variables span very different value ranges, normalization is applied during pre-training to keep training numerically stable.
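
A minimal sketch of how masking and forecasting can share one loss is shown below. It assumes a flat list of token values, a mean-squared-error loss, and a hypothetical stand-in model that fills masked tokens with zero; none of these choices are confirmed by this summary. Under these assumptions, passing the current state as the target gives pure masked reconstruction, and passing a later state gives forecasting from masked input.

```python
import random

# Illustrative sketch only: one loss covering masked reconstruction and forecasting.
# All names here are hypothetical.

def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random fraction of tokens with None; return tokens and masked indices."""
    n_mask = int(round(mask_ratio * len(tokens)))
    masked_idx = set(rng.sample(range(len(tokens)), n_mask))
    visible = [None if i in masked_idx else t for i, t in enumerate(tokens)]
    return visible, masked_idx


def mse(pred, target):
    """Mean squared error over all tokens."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def mixed_objective_loss(model, state_now, state_target, mask_ratio, rng):
    """Mask the input state, run the model, and score against the target state."""
    visible, _ = mask_tokens(state_now, mask_ratio, rng)
    return mse(model(visible), state_target)


def fill_zero_model(visible):
    """Stand-in for a trained network: copies visible tokens, fills masked ones with 0."""
    return [0.0 if v is None else v for v in visible]


if __name__ == "__main__":
    rng = random.Random(0)
    now, later = [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]
    print(mixed_objective_loss(fill_zero_model, now, now, 0.5, rng))    # reconstruction
    print(mixed_objective_loss(fill_zero_model, now, later, 0.5, rng))  # forecasting
```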

Evaluation and Performance

Zero-Shot Validation

Zero-shot validation shows that Prithvi WxC performs well in both masked reconstruction and forecasting. The model can reconstruct atmospheric states from as little as 5% of the original data. Its zero-shot forecasts are competitive with existing AI forecast emulators and are strongest at short lead times (6 and 12 hours). After 66 hours, however, performance falls behind models such as Pangu, which points to further fine-tuning and optimization as a next step.
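
To make the lead-time comparison concrete, the sketch below shows how forecast error is typically tracked across lead times in an autoregressive rollout. It assumes a model that advances the state by 6 hours per call, so a 66-hour forecast takes 11 steps; this summary does not say how the zero-shot forecasts were produced, and the halving "model" is a hypothetical toy.

```python
# Illustrative sketch only: autoregressive rollout with a fixed 6-hour step.
# step_model and the toy state are hypothetical.

def rollout(step_model, state, n_steps, step_hours=6):
    """Feed the model its own output n_steps times; return (lead_hours, state) pairs."""
    forecasts = []
    for k in range(1, n_steps + 1):
        state = step_model(state)
        forecasts.append((k * step_hours, state))
    return forecasts


def rmse(pred, truth):
    """Root-mean-square error between two equally long lists."""
    return (sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)) ** 0.5


def halve(state):
    """Toy step model: damps every value by half."""
    return [v / 2 for v in state]


if __name__ == "__main__":
    for lead, state in rollout(halve, [8.0, 4.0], 3):
        print(lead, state)
```

Because each step consumes the previous prediction, errors compound with lead time, which is why rollout tuning matters for longer horizons.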

Specific Downstream Tasks

Downscaling
- MERRA-2: Prithvi WxC is fine-tuned to downscale coarsened MERRA-2 data for 2m surface temperature, improving on interpolation baselines by more than a factor of four.
- CORDEX: The model is also adapted to CORDEX data, with architectural adjustments for the regional context. Here it achieves competitive spatial and temporal RMSE values when downscaling high-resolution climate data.
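
The interpolation baseline that a downscaling model is compared against can be sketched as follows: coarsen a fine field, interpolate it back to the fine grid, and measure RMSE against the original. This is a minimal illustration that uses block averaging and nearest-neighbour upsampling as stand-ins; the actual coarsening and interpolation methods are not given in this summary, and all names are hypothetical.

```python
# Illustrative sketch only: RMSE of a simple interpolation baseline for downscaling.
# Assumes the grid dimensions are divisible by the coarsening factor.

def coarsen(field, factor):
    """Block-average a 2D field (list of rows) by an integer factor."""
    rows, cols = len(field), len(field[0])
    return [
        [
            sum(field[r + i][c + j] for i in range(factor) for j in range(factor))
            / factor ** 2
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


def upsample_nearest(coarse, factor):
    """Nearest-neighbour upsampling back to the fine grid."""
    return [
        [coarse[r // factor][c // factor] for c in range(len(coarse[0]) * factor)]
        for r in range(len(coarse) * factor)
    ]


def rmse_2d(pred, truth):
    """Root-mean-square error over all grid points."""
    n = len(truth) * len(truth[0])
    total = sum((p - t) ** 2 for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    return (total / n) ** 0.5


if __name__ == "__main__":
    fine = [[0.0, 2.0, 4.0, 6.0] for _ in range(4)]
    baseline = upsample_nearest(coarsen(fine, 2), 2)
    print(rmse_2d(baseline, fine))
```

A fine-tuned model replaces `upsample_nearest` in this comparison, and the ratio of the two RMSE values is the kind of improvement factor quoted above.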

Climate Model Parameterization: Gravity Wave Flux
- Drawing on the representations learned during MERRA-2 pre-training, Prithvi WxC is fine-tuned to predict gravity wave momentum fluxes using ERA5 data. The high correlation between the model's predictions and the reference data indicates that the pretrained model captures sub-grid atmospheric processes. This result suggests the model could help improve physical parameterizations in climate models, including the representation of seasonal transitions and predictability on interannual timescales.
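
The correlation mentioned above can be computed as a Pearson coefficient between predicted and reference fluxes. The sketch below is a plain-Python version with made-up numbers; whether the paper uses exactly this statistic is an assumption.

```python
# Illustrative sketch only: Pearson correlation between predictions and reference values.
# The flux values below are invented for the example.

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    predicted = [1.0, 2.0, 3.0]
    reference = [2.0, 4.0, 6.0]
    print(pearson(predicted, reference))
```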

Practical and Theoretical Implications

Prithvi WxC has both practical and theoretical implications. Practically, it opens avenues for AI-driven weather and climate models that are accurate and computationally efficient, which could support real-time prediction and long-term climate studies. Theoretically, it shows how masked reconstruction and mixed-objective training can be applied to large-scale atmospheric data.

Future Directions

Future research could explore additional convolutional or neural operator layers to improve information flow, more rigorous rollout tuning to extend forecast accuracy to longer lead times, and additional datasets for broader generalizability. The model's success in fine-tuning also raises the question of embedding such foundation models within operational weather and climate prediction systems, which could change how the atmospheric sciences use AI.

Overall, Prithvi WxC illustrates the growing interplay between AI and the atmospheric sciences and offers a framework for advancing predictive capabilities and improving environmental resilience.
