
Data-Driven Weather Forecast Models

Updated 10 November 2025
  • Data-driven weather forecast models are machine learning systems that map past atmospheric states to future conditions using global reanalysis data.
  • They employ architectures like vision transformers, graph neural networks, and spectral operators alongside neural data assimilation to enhance accuracy and uncertainty quantification.
  • These models enable rapid, scalable forecasting from global to local scales, with innovations addressing extreme event prediction and operational integration.

Data-driven weather forecast models are statistical–machine learning systems that learn the mapping from previous atmospheric states (or observations) to future forecasts, typically from large global reanalysis datasets rather than explicit numerical integration of physical dynamical equations. In the past five years, these models—often based on vision transformers, graph neural networks, U-Nets, or spectral operators—have achieved skill that is comparable to, and on selected metrics surpasses, traditional numerical weather prediction (NWP) at global, regional, and local scales. Their performance depends on the quality of initial conditions, model architecture, and loss function, especially with respect to extremes and uncertainty quantification. This article provides a comprehensive summary including mathematical formulations, operational design, integration with data assimilation, methods for capturing extremes, benchmarking, computational considerations, and current limitations.

1. Mathematical Basis and Model Architecture

Data-driven weather models learn the transition operator

\hat{X}^{t+\Delta t} = f_{\theta}(X^t)

where $X^t \in \mathbb{R}^{C \times L \times H \times W}$ is the multivariate atmospheric state on a grid (channels $C$, vertical levels $L$, latitude $H$, longitude $W$), and $f_{\theta}$ is a deep neural network parameterized by $\theta$. Common architectures include 3D Earth-Specific Transformers (e.g., Pangu-Weather (Bi et al., 2022, Cheng et al., 2023)), Swin-Transformer U-Nets (Hirabayashi et al., 25 Mar 2025), fully spectral operators (AFNO) (Cheon et al., 13 Feb 2024), and graph neural networks with global–regional mesh refinement (Nipen et al., 4 Sep 2024).
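
As a deliberately minimal sketch, the snippet below implements the autoregressive rollout of a learned transition operator in PyTorch. The backbone here is a toy residual CNN standing in for the transformer, GNN, or spectral architectures cited above, and the tensor shapes are illustrative only.

```python
import torch
import torch.nn as nn

class TransitionOperator(nn.Module):
    """Stand-in f_theta mapping X^t -> X^{t+dt}. Production systems use
    3D transformers, GNNs, or spectral operators rather than this toy CNN."""
    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(64, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)  # residual update of the atmospheric state

def rollout(f_theta: nn.Module, x0: torch.Tensor, steps: int) -> torch.Tensor:
    """Autoregressive forecast: each prediction is fed back as the next input."""
    states, x = [x0], x0
    for _ in range(steps):
        x = f_theta(x)
        states.append(x)
    return torch.stack(states)  # (steps + 1, B, channels, H, W)

# Toy usage: the C x L variable/level dimensions are flattened into channels.
x0 = torch.randn(1, 69, 121, 240)   # e.g. 13 levels x 5 vars + 4 surface fields
forecast = rollout(TransitionOperator(69), x0, steps=20)
```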

Key architectural choices:

  • Patch embedding along vertical and horizontal axes, maintaining cubic or multi-scale spatial representations.
  • Hierarchical down/up-sampling and positional encodings (e.g., Earth-Specific positional bias) to capture spatial inhomogeneity and anisotropy.
  • Self-attention or local–global mixing windows for long-range dependencies.
  • Autoregressive multi-step mapping for medium-range and sub-seasonal forecasting.

For kilometer-scale forecasting, novel cross-resolution transfer learning and regional adaptation modules can extend the trained system from coarse (0.25°) to fine (0.09°) grids without retraining the entire backbone (Han et al., 28 Jan 2024).
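
The adaptation modules in (Han et al., 28 Jan 2024) are more involved; the sketch below only illustrates the generic freeze-and-adapt pattern such cross-resolution transfer relies on, with a frozen coarse-grid backbone and small trainable regridding adapters. The roughly 3x resolution ratio and the layer choices are assumptions, not the paper's design.

```python
import torch.nn as nn

def build_regional_adapter(backbone: nn.Module,
                           coarse_ch: int, fine_ch: int) -> nn.Module:
    """Freeze the pretrained coarse-grid (0.25-degree) backbone and train
    only lightweight input/output adapters for a finer regional grid
    (a sketch, not the exact method of Han et al., 28 Jan 2024)."""
    for p in backbone.parameters():
        p.requires_grad = False               # backbone weights stay fixed

    return nn.Sequential(
        # downsample ~3x: project fine-grid fields onto the coarse grid
        nn.Conv2d(fine_ch, coarse_ch, kernel_size=3, stride=3),
        backbone,
        # back to the fine grid, with a trainable refinement head
        nn.Upsample(scale_factor=3, mode="bilinear"),
        nn.Conv2d(coarse_ch, fine_ch, kernel_size=3, padding=1),
    )
```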

2. Data Assimilation and Initialization

Forecast quality is bounded by the fidelity of initial states. While most systems are trained on reanalyses such as ERA5, operational deployment can use initial conditions from current analyses (ECMWF IFS, NOAA GFS, CMA GRAPES, or regional DA systems) (Cheng et al., 2023, Choudhury et al., 17 Mar 2025). Compatibility experiments confirm that pretrained models generalize across analysis sources when the input fields are properly mapped to the model's native grid and resolution.
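
As a minimal illustration of that mapping step, the snippet below regrids a synthetic (hypothetical) 0.5-degree analysis field onto a 0.25-degree ERA5-style grid with xarray; a real pipeline would also reconcile variable names, units, vertical levels, and longitude wrap-around.

```python
import numpy as np
import xarray as xr

# Hypothetical 0.5-degree analysis field (e.g. z500 from another center).
analysis = xr.DataArray(
    np.random.rand(361, 720),
    coords={"lat": np.linspace(90, -90, 361), "lon": np.arange(0, 360, 0.5)},
    dims=("lat", "lon"),
    name="z500",
)

# The model's native 0.25-degree ERA5-style grid.
native_grid = xr.Dataset(
    coords={"lat": np.linspace(90, -90, 721), "lon": np.arange(0, 360, 0.25)}
)

# Bilinear interpolation onto the native grid before the first model step.
initial_condition = analysis.interp_like(native_grid, method="linear")
```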

End-to-end data-driven systems now integrate neural data assimilation, directly ingesting raw observations (satellite radiances, GNSS-RO, surface/radiosonde data), and producing analysis fields via deep networks with attention–fusion bottlenecks (Sun et al., 10 Aug 2024, Ni et al., 25 Aug 2025), or full observation-space embedding (McNally et al., 22 Jul 2024). Models such as Huracan (Ni et al., 25 Aug 2025) and Aardvark Weather (Vaughan et al., 30 Mar 2024) jointly optimize ensemble data assimilation and forecast modules, in some cases matching or exceeding the probabilistic skill of operational NWP ensembles.

The mathematical formulation for 4DVar assimilation with a neural forecast operator replaces classical adjoint gradients with auto-differentiation, streamlining both the analysis update and the forecast cycle (Xiao et al., 2023):

J(x_0) = \frac{1}{2}(x_0 - x^b)^T B^{-1}(x_0 - x^b) + \frac{1}{2}\sum_{\tau=0}^{T-1} [H(x_\tau) - y_\tau]^T (R_\tau + Q_\tau)^{-1} [H(x_\tau) - y_\tau]

where $H$ is the observation operator, $x_\tau$ is the trajectory generated by the neural forecast model, $x^b$ is the background state, $B$ is the background-error covariance, and $R_\tau$ and $Q_\tau$ are the observation-error and model-error covariances.
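
A compact sketch of this idea in PyTorch: the cost above is written as a differentiable function and minimized with a standard optimizer, so autograd supplies the gradient that a hand-coded adjoint model would otherwise provide. Covariances are passed as dense inverse matrices purely for brevity.

```python
import torch

def fourdvar_cost(x0, xb, B_inv, obs, f_theta, H, RQ_inv):
    """J(x0) from the equation above; autograd replaces the adjoint model.
    obs[t] and RQ_inv[t] hold y_tau and (R_tau + Q_tau)^{-1} per window step."""
    dx = (x0 - xb).flatten()
    J = 0.5 * dx @ B_inv @ dx                 # background term
    x = x0
    for y_tau, W_tau in zip(obs, RQ_inv):
        d = (H(x) - y_tau).flatten()
        J = J + 0.5 * d @ W_tau @ d           # observation mismatch at tau
        x = f_theta(x)                        # advance with the neural model
    return J

def assimilate(x_guess, xb, B_inv, obs, f_theta, H, RQ_inv, iters=50):
    """Analysis update by direct minimization (operational 4DVar would use
    incremental, preconditioned solvers instead)."""
    x0 = x_guess.clone().requires_grad_(True)
    opt = torch.optim.LBFGS([x0], max_iter=iters)
    def closure():
        opt.zero_grad()
        J = fourdvar_cost(x0, xb, B_inv, obs, f_theta, H, RQ_inv)
        J.backward()
        return J
    opt.step(closure)
    return x0.detach()
```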

3. Loss Functions, Extremes, and Uncertainty Quantification

Deterministic models have traditionally minimized the mean squared error (MSE) across all locations and variables, with latitude weighting for grid-cell area (e.g., $w(\phi) = \cos(\phi)$). However, symmetric losses such as MSE systematically underestimate rare and extreme events, because the distribution of maxima (Generalized Extreme Value or Gumbel) is asymmetric.
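
A latitude-weighted MSE of this kind takes only a few lines; the sketch below assumes the vertical levels have been folded into the channel dimension.

```python
import torch

def lat_weighted_mse(pred: torch.Tensor, target: torch.Tensor,
                     lat_deg: torch.Tensor) -> torch.Tensor:
    """MSE with area weights w(phi) = cos(phi), normalized to mean one.
    pred/target: (B, C, H, W); lat_deg: (H,) latitudes in degrees."""
    w = torch.cos(torch.deg2rad(lat_deg))
    w = w / w.mean()
    return ((pred - target) ** 2 * w.view(1, 1, -1, 1)).mean()
```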

To address this, ExtremeCast (Xu et al., 2 Feb 2024) introduces Exloss:

\text{Exloss}(x, y) = \left| S(x, y) \cdot (x - y) \right|^2

with $S(x, y) = s < 1$ when $y$ lies in a tail of the distribution (above $y_{90}$ or below $y_{10}$) and the prediction $x$ errs toward the extreme side, and $S = 1$ otherwise. Down-weighting errors toward the tails makes the opposite errors, under-prediction of maxima and over-prediction of minima, comparatively more costly, counteracting the pull toward the conditional mean induced by symmetric losses.
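
A hedged sketch of an Exloss-style asymmetric MSE follows; the precise scaling rule and constants in (Xu et al., 2 Feb 2024) differ, and the threshold fields and the value of s here are placeholders.

```python
import torch

def exloss(pred, target, q_hi, q_lo, s=0.5):
    """Asymmetric MSE in the spirit of Exloss (a sketch, not the paper's
    exact formulation). Errors that push predictions toward the tails are
    down-weighted by s < 1, reducing the usual bias toward the mean.
    q_hi / q_lo: 90th / 10th percentile fields of the target distribution."""
    scale = torch.ones_like(target)
    upper = (target > q_hi) & (pred > target)   # erring further into the upper tail
    lower = (target < q_lo) & (pred < target)   # erring further into the lower tail
    scale = torch.where(upper | lower, torch.full_like(scale, s), scale)
    return ((scale * (pred - target)) ** 2).mean()
```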

Post-hoc ensemble schemes (ExEnsemble, ExBooster) introduce controlled noise into the deterministic forecast, rank-sort the perturbed outputs, and select the rank-preserved member. This broadens the dispersion of predicted extremes without retraining.

Probabilistic models use the continuous ranked probability score (CRPS) for training and evaluation:

\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \left[ F(x) - \mathbf{1}(x \ge y) \right]^2 \, dx

where $F(x)$ is the forecast CDF (for an ensemble, the empirical CDF of its members) and $\mathbf{1}(\cdot)$ is the indicator function. Ensembles of networks or diffusion models (Hirabayashi et al., 25 Mar 2025, Ni et al., 25 Aug 2025) provide physically plausible uncertainty quantification and better calibration at long leads.
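
For an m-member ensemble the integral above has the standard closed-form estimator, which makes a NumPy implementation straightforward:

```python
import numpy as np

def ensemble_crps(members, y):
    """Empirical CRPS at a single point for an m-member ensemble:
    CRPS = mean|x_i - y| - (1 / (2 m^2)) * sum_ij |x_i - x_j|,
    equivalent to the integral form with F the empirical CDF."""
    members = np.asarray(members, dtype=float)
    term1 = np.abs(members - y).mean()
    term2 = 0.5 * np.abs(members[:, None] - members[None, :]).mean()
    return term1 - term2

# Example: a 10-member 2 m temperature forecast against one observation.
print(ensemble_crps(np.random.normal(288.0, 1.5, size=10), 289.2))
```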

4. Training Protocols, Computational Resources, and Efficiency

The largest global models are pretrained on 40+ years of hourly reanalysis (ERA5, IMDAA), with additional fine-tuning on high-resolution or regional analyses (Bi et al., 2022, Choudhury et al., 17 Mar 2025). Training costs span hundreds of GPU-days on multi-node A100/TPU clusters; regional models or meta-models can be trained on CPUs with reduced complexity via dimension reduction, local attention, or split-domain methods (Skinner et al., 2020, Ueyama et al., 28 Jun 2024).

Inference is orders of magnitude faster than NWP. For example, Pangu-Weather produces a 240 h global forecast in ~4 s on a 24 GB GPU (Cheng et al., 2023), while Aardvark Weather completes 10-day global rollouts in minutes on a handful of GPUs (Vaughan et al., 30 Mar 2024). Stretched-grid GNN models efficiently target high-resolution domains (2.5 km) within global context (Nipen et al., 4 Sep 2024). Distributed variable aggregation and crop-based training sharply reduce parameter and memory costs, with only minor loss in skill (Ueyama et al., 28 Jun 2024).

Hierarchical autoregressive mapping allows the forecast interval to be selected dynamically for each lead time, reducing cumulative error and runtime, especially for regional systems (Choudhury et al., 17 Mar 2025).
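
The scheduling itself reduces to a small combinatorial step: given models trained at several intervals, decompose each lead time into the fewest autoregressive steps. The sketch below uses a greedy decomposition with illustrative intervals (Pangu-Weather, for instance, combines 1 h, 3 h, 6 h, and 24 h models in this spirit).

```python
def plan_rollout(lead_h: int, intervals=(24, 6, 1)):
    """Greedily decompose a lead time (hours) into the fewest model steps,
    given models trained at the listed forecast intervals. Fewer steps
    means less accumulated autoregressive error and lower runtime."""
    plan = []
    for dt in sorted(intervals, reverse=True):
        n, lead_h = divmod(lead_h, dt)
        plan += [dt] * n
    assert lead_h == 0, "lead time must be divisible by the smallest interval"
    return plan

print(plan_rollout(31))  # [24, 6, 1] -> 3 steps instead of 31 one-hour steps
```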

5. Benchmarking, Verification, and Skill Metrics

Verification is framed in terms of latitude-weighted RMSE, anomaly correlation coefficient (ACC), bias, CRPS (for ensembles), spread–skill ratio, and event-based indices (SEEPS, SEDI, ETS). ERA5 acts as “ground truth” for most global models, with validation against operational analyses and station observations for practical relevance (Cheng et al., 2023, Bi et al., 2022, Sun et al., 10 Aug 2024, Hirabayashi et al., 25 Mar 2025).
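
For reference, latitude-weighted RMSE and ACC, the two headline deterministic scores, can be computed as below for a single (H, W) field; the climatology input for ACC is assumed to be precomputed.

```python
import numpy as np

def lat_weights(lat_deg):
    """Area weights w(phi) = cos(phi), normalized to mean one."""
    w = np.cos(np.deg2rad(lat_deg))
    return w / w.mean()

def wrmse(pred, truth, lat_deg):
    """Latitude-weighted RMSE over an (H, W) field."""
    w = lat_weights(lat_deg)[:, None]
    return np.sqrt((w * (pred - truth) ** 2).mean())

def acc(pred, truth, clim, lat_deg):
    """Anomaly correlation coefficient against a climatology field."""
    w = lat_weights(lat_deg)[:, None]
    pa, ta = pred - clim, truth - clim
    num = (w * pa * ta).sum()
    den = np.sqrt((w * pa ** 2).sum() * (w * ta ** 2).sum())
    return num / den
```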

WeatherBench 2 (Rasp et al., 2023) and the rise of data-driven models (Ben-Bouallegue et al., 2023) document head-to-head performance against IFS HRES and ensemble baselines:

Model            RMSE (Z500, 5-day lead)   Relative to IFS HRES
IFS HRES         3.93×10³ m²/s²            0%
GraphCast        3.96×10³ m²/s²            +0.8%
Pangu-Weather    4.02×10³ m²/s²            +2.3%
ERA5 hindcast    4.01×10³ m²/s²            +2.1%

Skill on extremes and at high quantiles remains a principal differentiator. With Exloss and the ExEnsemble/ExBooster schemes, ExtremeCast (Xu et al., 2 Feb 2024) reports state-of-the-art results on tail events: a relative quantile error (RQE) near zero globally, and a SEDI for 99.5th-percentile t2m events of 0.71, versus 0.56 for GraphCast and 0.61 for ECMWF-IFS.
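
SEDI scores a binary exceedance event from its hit rate and false-alarm rate; a minimal implementation, with the threshold defining the event (e.g. the 99.5th climatological percentile), is:

```python
import numpy as np

def sedi(pred, obs, threshold):
    """Symmetric Extremal Dependence Index for a binary exceedance event.
    H: hit rate, F: false-alarm rate; SEDI -> 1 for perfect rare-event
    skill. Undefined when H or F equals exactly 0 or 1."""
    event_p, event_o = pred >= threshold, obs >= threshold
    hits = np.sum(event_p & event_o)
    misses = np.sum(~event_p & event_o)
    false_alarms = np.sum(event_p & ~event_o)
    correct_negs = np.sum(~event_p & ~event_o)
    H = hits / (hits + misses)
    F = false_alarms / (false_alarms + correct_negs)
    num = np.log(F) - np.log(H) - np.log(1 - F) + np.log(1 - H)
    den = np.log(F) + np.log(H) + np.log(1 - F) + np.log(1 - H)
    return num / den
```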

Regional models such as the stretched-grid GNN (Nipen et al., 4 Sep 2024) achieve lower RMSE and higher equitable threat scores than regional NWP at short leads, though deterministic loss functions still under-represent extremes.

6. Limitations, Challenges, and Prospective Advances

Current limitations include:

  • Dependence on analysis quality: Users in data-sparse regions still require careful mapping or local enhancements for initial states (Cheng et al., 2023, Sun et al., 10 Aug 2024).
  • Extreme event underestimation: Remedies include Exloss, quantile regression, ensemble augmentation, and tailored post-processing (Xu et al., 2 Feb 2024, Nipen et al., 4 Sep 2024).
  • Physical constraints: Most models do not explicitly enforce conservation laws (mass, energy), risking dynamic inconsistency.
  • Uncertainty quantification: Probabilistic ensemble approaches are still in early adoption outside the very largest systems (Ni et al., 25 Aug 2025, Weyn et al., 22 Mar 2024).
  • Station-level and diagnostic prediction scaling: Modular two-stage approaches allow scalable addition of new diagnostic variables without backbone retraining (Mitra et al., 2023).
  • Scalability and compute: Regional split training and distributed variable representation cut resource demand, but optimal choices of split and aggregation remain an open research topic (Ueyama et al., 28 Jun 2024).

Future directions emphasize hybrid physics–ML architectures, integrated end-to-end DA, uncertainty-aware probabilistic forecasting, downscaling, and physically-constrained learning. Direct observation-space modeling circumvents traditional DA, enabling more flexible, rapid, and inclusive Earth system prediction (McNally et al., 22 Jul 2024).

7. Impact and Context in Operational Forecasting

Data-driven weather forecast models now rival NWP for global and medium-range skill, have fundamentally shifted the operational paradigm toward inference-based rapid updates, and enable:

  • Near-real-time, regionally customized forecasts at reduced computational cost.
  • Flexible deployment in resource-limited environments, with full support for bespoke diagnostic prediction and local adaptation (Vaughan et al., 30 Mar 2024).
  • Robust, scalable integration with satellite and in situ observations, with direct forecast initialization bypassing traditional DA bottlenecks (Sun et al., 10 Aug 2024, Ni et al., 25 Aug 2025).

Hybrid systems employing large-scale spectral nudging merge physics-based and ML-generated weather fields, leveraging the strengths of both classes (Husain et al., 8 Jul 2024). ExtremeCast (Xu et al., 2 Feb 2024) and similar systems demonstrate quantitative correction of the tail biases inherent to symmetric losses, setting benchmarks for risk-sensitive applications.

A plausible implication is that large-scale, modular, observation-driven models will increasingly underpin operational forecasting, with machine learning augmenting or even supplanting classic NWP infrastructure for certain predictands and regions. Operational centers are advised to incorporate loss design, ensemble augmentation, and rigorous benchmarking against multi-source analyses in future implementations to ensure reliability at extremes and in rapidly evolving meteorological conditions.
