
Scaling transformer neural networks for skillful and reliable medium-range weather forecasting (2312.03876v2)

Published 6 Dec 2023 in physics.ao-ph, cs.AI, and cs.LG

Abstract: Weather forecasting is a fundamental problem for anticipating and mitigating the impacts of climate change. Recently, data-driven approaches for weather forecasting based on deep learning have shown great promise, achieving accuracies that are competitive with operational systems. However, those methods often employ complex, customized architectures without sufficient ablation analysis, making it difficult to understand what truly contributes to their success. Here we introduce Stormer, a simple transformer model that achieves state-of-the-art performance on weather forecasting with minimal changes to the standard transformer backbone. We identify the key components of Stormer through careful empirical analyses, including weather-specific embedding, randomized dynamics forecast, and pressure-weighted loss. At the core of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals. During inference, this allows us to produce multiple forecasts for a target lead time and combine them to obtain better forecast accuracy. On WeatherBench 2, Stormer performs competitively at short to medium-range forecasts and outperforms current methods beyond 7 days, while requiring orders-of-magnitude less training data and compute. Additionally, we demonstrate Stormer's favorable scaling properties, showing consistent improvements in forecast accuracy with increases in model size and training tokens. Code and checkpoints are available at https://github.com/tung-nd/stormer.

Authors (9)
  1. Tung Nguyen (58 papers)
  2. Rohan Shah (8 papers)
  3. Hritik Bansal (38 papers)
  4. Troy Arcomano (7 papers)
  5. Sandeep Madireddy (33 papers)
  6. Romit Maulik (76 papers)
  7. Veerabhadra Kotamarthi (1 paper)
  8. Ian Foster (138 papers)
  9. Aditya Grover (82 papers)
Citations (42)

Summary

Scaling Transformer Neural Networks for Skillful and Reliable Medium-Range Weather Forecasting

This paper presents Stormer, a streamlined transformer model for medium-range weather forecasting that aims for state-of-the-art performance with minimal modifications to the standard transformer backbone. The paper addresses critical challenges of traditional Numerical Weather Prediction (NWP) models, including parameterization errors in small-scale physical processes and forecast accuracy that does not improve with more data, owing to computational limitations and reliance on expert-driven refinements.

Key Contributions

The authors identify and implement three primary components within Stormer:

  1. Weather-Specific Embedding: transforms the input weather state into a sequence of tokens while modeling interactions among atmospheric variables, which is crucial for capturing their interdependencies (a sketch follows this list).
  2. Randomized Dynamics Forecasting Objective: trains Stormer to forecast the weather dynamics over randomly sampled time intervals. At inference, this lets the model produce multiple forecasts for a target lead time and combine them for better accuracy (see the second sketch below).
  3. Pressure-Weighted Loss: weights variables by pressure level as a proxy for atmospheric density, prioritizing the near-surface variables that matter most for weather prediction (see the third sketch below).
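
The summary does not pin the embedding to one implementation, so the following is a minimal sketch of one plausible design: each variable is patchified separately, and a cross-attention step pools the per-variable tokens at each spatial patch, in the spirit of the ClimaX variable aggregation from the same group. All module names, shapes, and the aggregation mechanism here are illustrative assumptions, not the authors' exact code.

```python
import torch
import torch.nn as nn

class WeatherEmbedding(nn.Module):
    """Hypothetical weather-specific embedding: per-variable patchify,
    then cross-attention aggregation across variables (assumed design)."""

    def __init__(self, n_vars: int, patch_size: int = 2, dim: int = 256):
        super().__init__()
        # One patch projection per variable; each variable is a 2D field.
        self.patchify = nn.ModuleList(
            [nn.Conv2d(1, dim, kernel_size=patch_size, stride=patch_size)
             for _ in range(n_vars)]
        )
        # Learned query that pools the variable tokens at each patch.
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8,
                                                batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_vars, lat, lon)
        tokens = torch.stack(
            [p(x[:, i : i + 1]).flatten(2).transpose(1, 2)
             for i, p in enumerate(self.patchify)],
            dim=2,
        )  # (batch, n_patches, n_vars, dim)
        b, n, v, d = tokens.shape
        tokens = tokens.reshape(b * n, v, d)
        q = self.query.expand(b * n, -1, -1)
        # Cross-attention fuses the variable tokens at each location,
        # modeling interactions among atmospheric variables.
        pooled, _ = self.cross_attn(q, tokens, tokens)
        return pooled.reshape(b, n, d)  # one token per spatial patch

emb = WeatherEmbedding(n_vars=5)
print(emb(torch.randn(2, 5, 32, 64)).shape)  # torch.Size([2, 512, 256])
```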
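The randomized objective is easiest to see in code. In the sketch below, the interval set, the `model(state, dt)` interface (state plus interval in, predicted change out), and the averaging scheme are assumptions for illustration; the paper's exact training and combination procedure may differ.

```python
import random
import torch

INTERVALS_H = [6, 12, 24]  # assumed set of training intervals, in hours

def training_step(model, x0, get_state_at, loss_fn):
    """One training step: forecast the dynamics over a random interval."""
    dt = random.choice(INTERVALS_H)
    target = get_state_at(dt)      # ground-truth state dt hours ahead
    pred_delta = model(x0, dt)     # model predicts the *change* in state
    return loss_fn(x0 + pred_delta, target)

@torch.no_grad()
def forecast(model, x0, lead_h):
    """Combine forecasts from different roll-out schedules that reach the
    same lead time, e.g. 72 h = 12 x 6 h = 6 x 12 h = 3 x 24 h."""
    forecasts = []
    for dt in INTERVALS_H:
        if lead_h % dt:
            continue
        x = x0
        for _ in range(lead_h // dt):
            x = x + model(x, dt)   # iterative roll-out
        forecasts.append(x)
    return torch.stack(forecasts).mean(dim=0)
```

Averaging the members acts like a small ensemble, which is one way to read the claim that combining multiple forecasts for the same lead time improves accuracy.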
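Finally, a hedged sketch of the pressure-weighted loss. The proportional-to-pressure weighting and the normalization below are assumptions consistent with the description above (pressure as a proxy for density), not necessarily the paper's exact coefficients; the sketch covers only pressure-level variables, and how surface variables are weighted is not shown.

```python
import torch

def pressure_weighted_mse(pred, target, levels_hpa):
    """pred, target: (batch, n_levels, lat, lon); levels_hpa: (n_levels,).
    Levels at higher pressure (nearer the surface) get larger weights."""
    w = levels_hpa / levels_hpa.sum()   # normalize weights to sum to 1
    per_level = ((pred - target) ** 2).mean(dim=(0, 2, 3))
    return (w * per_level).sum()

levels = torch.tensor([50.0, 250.0, 500.0, 850.0, 1000.0])
loss = pressure_weighted_mse(
    torch.randn(2, 5, 32, 64), torch.randn(2, 5, 32, 64), levels
)
```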

Research Findings

Stormer delivers competitive short- to medium-range forecasts and surpasses existing models beyond seven days, while requiring significantly less training data and compute than methods such as Pangu-Weather and GraphCast. Notably, Stormer was trained on lower-resolution data and required orders of magnitude fewer GPU hours. The model also scales favorably, with forecast accuracy improving consistently as model size and the number of training tokens increase.

Practical and Theoretical Implications

Practically, Stormer offers an efficient alternative to operational weather forecasting systems. Its ability to achieve high accuracy with reduced computational demand could catalyze broader adoption of deep learning models in meteorology, especially in regions with limited computational infrastructure. Theoretically, Stormer's simplified architecture facilitates better understanding and further innovation in transformer-based weather prediction models, serving as a foundation for the development of comprehensive climate models.

Speculation on Future Developments

This paper points toward a future in which scalable transformer models redefine weather forecasting, offering accurate predictions with less data and computational overhead. Future research may focus on further optimizing the weather-specific embedding and exploring the full capabilities of randomized dynamics forecasting. There is also potential to extend the methodology to broader climate modeling applications, deepening the growing intersection of AI and climate science.

In summary, the advancement represented by Stormer is a significant, albeit not revolutionary, step forward in the field of weather prediction. This work demonstrates the potential of modern deep learning architectures to meet the demands of weather forecasting, paving the way for future innovations that further integrate AI capabilities into environmental and atmospheric sciences.
