- The paper introduces ArchesWeather, a transformer model that combines 2D attention with a novel Cross-Level Attention mechanism to achieve computational efficiency in weather forecasting.
- Benchmarking shows ArchesWeather performs comparably to state-of-the-art models at lead times up to three days, delivering competitive accuracy at 1.5° resolution with significantly lower computational cost.
- This research makes high-performance weather modeling more accessible by reducing resource requirements and contributes to the theoretical understanding of balancing physical priors with AI model efficiency.
Overview of "sc: An Efficient AI Weather Forecasting Model at 1.5º Resolution"
The paper "sc: An Efficient AI Weather Forecasting Model at 1.5º Resolution" presents a new AI-based model for weather forecasting, leveraging recent advancements in machine learning. The work primarily addresses the optimization of neural network architectures used in predicting atmospheric phenomena. The authors focus their efforts on proposing a transformer model, named "sc," that combines 2D attention with a novel column-wise attention mechanism for enhanced feature interaction. This design choice stems from the observation that existing models utilizing 3D local processing, such as Pangu-Weather, may not always afford computational efficiency.
Methodology
The ArchesWeather model departs from the conventional locality-based inductive biases that the authors argue are computationally sub-optimal. Instead, it uses a hybrid attention mechanism that reduces parameter count and training cost without sacrificing predictive performance. The Cross-Level Attention (CLA) component replaces local 3D interactions with global attention along the vertical column: each horizontal location attends across all of its pressure levels at once, avoiding the redundant computation of purely window-local interactions.
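The PyTorch sketch below illustrates the column-wise attention idea: folding the horizontal grid into the batch dimension so that each vertical column becomes one attention sequence. The module name, tensor layout, and dimensions are assumptions for illustration; this is not the authors' code.

```python
import torch
import torch.nn as nn

class ColumnAttention(nn.Module):
    """Multi-head self-attention along the vertical (pressure-level) axis:
    every horizontal grid point attends across all of its levels at once,
    making vertical interactions global rather than window-local."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, levels, height, width, channels)
        b, l, h, w, c = x.shape
        # Fold the horizontal grid into the batch: each column of `l`
        # levels becomes an independent attention sequence.
        cols = x.permute(0, 2, 3, 1, 4).reshape(b * h * w, l, c)
        q = self.norm(cols)
        attended, _ = self.attn(q, q, q)
        cols = cols + attended  # residual connection
        return cols.reshape(b, h, w, l, c).permute(0, 3, 1, 2, 4)

# Example: 13 pressure levels on a 121 x 240 grid with 64 channels.
x = torch.randn(1, 13, 121, 240, 64)
print(ColumnAttention(dim=64)(x).shape)  # torch.Size([1, 13, 121, 240, 64])
```

Because the attention sequences have length 13 (the number of levels), the cost of this global vertical interaction stays small relative to the horizontal attention.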
Training was conducted on the ERA5 reanalysis dataset at 1.5° resolution, with three model sizes, S, M, and L, that vary the number of transformer layers and range from 49M to 164M parameters. The training process also revealed a subtle distribution shift in the ERA5 data around the year 2000, which the authors address by fine-tuning the models on more recent years.
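A minimal sketch of such a two-phase schedule follows: pretrain on the full record, then fine-tune at a lower learning rate on years after the shift. The year ranges, learning rates, and the toy model and data loader are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

def make_batches(years, n_batches=4):
    """Stand-in for an ERA5 data loader restricted to the given years."""
    torch.manual_seed(min(years))
    return [(torch.randn(2, 8), torch.randn(2, 8)) for _ in range(n_batches)]

model = nn.Linear(8, 8)  # toy stand-in for the forecasting network
loss_fn = nn.MSELoss()

phases = [
    ("pretrain",  range(1979, 2019), 3e-4),  # full ERA5 record (assumed span)
    ("fine-tune", range(2007, 2019), 3e-5),  # recent years only, lower LR
]
for name, years, lr in phases:
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for inputs, targets in make_batches(years):
        loss = loss_fn(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"{name}: done ({len(list(years))} years, lr={lr})")
```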
Experimental Results
The model was benchmarked against the operational IFS HRES system and other AI models, including Pangu-Weather and the NeuralGCM ensemble. The findings indicate that ArchesWeather and its ensemble variants are competitive, particularly on upper-air variables, often matching or surpassing these models in RMSE for lead times up to three days while incurring significantly lower training and inference costs. The results demonstrate that ArchesWeather can deliver skillful forecasts with a much leaner computational footprint.
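RMSE in this setting is conventionally latitude-weighted so that the shrinking grid cells near the poles are not over-counted. Below is a minimal NumPy sketch of this standard (WeatherBench-style) metric, not code from the paper.

```python
import numpy as np

def lat_weighted_rmse(forecast: np.ndarray, truth: np.ndarray,
                      lats_deg: np.ndarray) -> float:
    """RMSE over a (lat, lon) grid, weighting each row by cos(latitude)
    so every grid cell counts in proportion to its surface area."""
    weights = np.cos(np.deg2rad(lats_deg))
    weights = weights / weights.mean()            # normalize to mean 1
    sq_err = (forecast - truth) ** 2              # shape (lat, lon)
    return float(np.sqrt((weights[:, None] * sq_err).mean()))

# Example on the 1.5-degree grid (121 x 240) with random fields.
lats = np.linspace(90.0, -90.0, 121)
f, o = np.random.randn(121, 240), np.random.randn(121, 240)
print(lat_weighted_rmse(f, o, lats))
```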
Implications and Future Directions
This research has both practical and theoretical implications. Practically, ArchesWeather offers an accessible path to training high-performance weather models, significantly reducing compute and data storage requirements and making such work feasible in academic and smaller-scale research settings. Theoretically, it sharpens the discourse on the necessity and trade-offs of embedding physical priors in AI models, paving the way for architectures that optimize the accuracy-to-cost ratio more effectively.
Future research could investigate scaling such models to finer resolutions, or develop methods to adapt their forecasts to applications with higher-resolution requirements, such as regional forecasting and cyclone tracking. Another fruitful direction is downscaling the outputs to finer grids while keeping them consistent with physical models, potentially using diffusion models.
In sum, this paper contributes to the ongoing development of efficient AI methodologies in meteorology, advancing the hypothesis that non-local attention mechanisms can deliver competitive weather predictions at a fraction of the usual computational cost.