GraphCast: Global Weather Forecasting
- GraphCast is a machine learning–based graph neural network model that predicts global weather states using multi-scale geodesic meshes and historical ERA5 data.
- It employs an encoder–processor–decoder architecture with autoregressive forecasting to deliver 6-hour interval predictions for up to 15 days, achieving competitive RMSE and anomaly correlation scores.
- The model offers computational efficiency and scalability for global weather forecasting while exhibiting limitations in capturing extreme events and mesoscale variability.
GraphCast is a machine learning–based graph neural network (GNN) model for global medium-range weather forecasting. Developed by DeepMind, GraphCast uses an encoder–processor–decoder GNN architecture operating over a multi-scale geodesic mesh to predict the evolution of a state vector comprising key atmospheric, surface, and hydrological variables. The model is trained directly on historical reanalysis data (e.g., ERA5) and produces deterministic forecasts at six-hour intervals out to 10–15 days, with global coverage at 0.25° (≈25–28 km) resolution and up to 37 vertical levels. GraphCast matches or exceeds the skill of state-of-the-art deterministic numerical weather prediction (NWP) systems on a wide suite of verification targets while being orders of magnitude faster at inference (Lam et al., 2022, Sudharsan et al., 20 Jun 2025, Gupta et al., 2 Sep 2025). However, it has well-characterized limitations, including underestimation of extremes and mesoscale variance, under-dispersion of ensemble covariances, and reduced capacity to extrapolate to record-breaking or out-of-distribution events.
1. Model Architecture and Training Methodology
GraphCast formalizes global atmospheric evolution as a sequence of autoregressive GNN operations on a multi-resolution icosahedral mesh over the sphere. Input and output fields live on grid nodes at the latitude–longitude points; these are mapped onto mesh nodes at the vertices of the icosahedral multi-mesh, whose edges connect neighbors both locally and across refinement levels to capture synoptic- and mesoscale dynamical couplings (Lam et al., 2022, Sudharsan et al., 20 Jun 2025):
- Encode: Input fields (the states at t and t − 6 h) are mapped from the latitude–longitude grid onto the mesh as node and edge embeddings (dimension ≈512).
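- Process: 16 unshared GNN layers propagate information by message passing, using Interaction Networks whose MLPs update both edges and nodes. For an edge from sender $s$ to receiver $r$: $e'_{sr} = \mathrm{MLP}_E(e_{sr}, v_s, v_r)$ and $v'_r = \mathrm{MLP}_V\big(v_r, \sum_s e'_{sr}\big)$, followed by the residual updates $e_{sr} \leftarrow e_{sr} + e'_{sr}$ and $v_r \leftarrow v_r + v'_r$.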
- Decode: The processed node features are mapped back to the latitude–longitude grid to produce increments for all prognostic fields.
- Autoregression: Forecasts are rolled out at 6 h intervals for 10–15 days by sequentially feeding predictions as new inputs.
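The residual Interaction Network update of the Process step and the autoregressive rollout can be sketched in plain Python. This is a toy under stated assumptions, not GraphCast's implementation: `mlp` is a hypothetical, parameter-free stand-in for the learned MLPs, edge and node latents are assumed to share one width, and `step_fn` stands in for the full encoder–processor–decoder that returns a state increment.

```python
import math
from typing import Callable, Dict, List, Tuple

Vec = List[float]


def mlp(x: Vec, dim: int) -> Vec:
    # Hypothetical stand-in for a learned MLP: a fixed, parameter-free nonlinearity.
    s = sum(x) / len(x)
    return [math.tanh(s + 0.1 * k) for k in range(dim)]


def add(a: Vec, b: Vec) -> Vec:
    return [x + y for x, y in zip(a, b)]


def interaction_layer(
    nodes: Dict[int, Vec], edges: Dict[Tuple[int, int], Vec]
) -> Tuple[Dict[int, Vec], Dict[Tuple[int, int], Vec]]:
    """One processor layer; edge (s, r) carries a message from sender s to receiver r."""
    dim = len(next(iter(nodes.values())))
    # Edge MLP sees the edge latent concatenated with sender and receiver latents.
    msgs = {(s, r): mlp(e + nodes[s] + nodes[r], dim) for (s, r), e in edges.items()}
    # Sum the incoming messages at each receiver node.
    agg = {i: [0.0] * dim for i in nodes}
    for (_, r), m in msgs.items():
        agg[r] = add(agg[r], m)
    # Node MLP sees the node latent concatenated with its aggregated messages.
    node_updates = {i: mlp(v + agg[i], dim) for i, v in nodes.items()}
    # Residual connections on both edges and nodes.
    new_edges = {key: add(e, msgs[key]) for key, e in edges.items()}
    new_nodes = {i: add(v, node_updates[i]) for i, v in nodes.items()}
    return new_nodes, new_edges


def rollout(prev: Vec, curr: Vec, n_steps: int, step_fn: Callable[[Vec, Vec], Vec]) -> List[Vec]:
    """Autoregressive rollout: each prediction is fed back as the next input."""
    states = []
    for _ in range(n_steps):
        nxt = add(curr, step_fn(prev, curr))  # the model predicts an increment
        states.append(nxt)
        prev, curr = curr, nxt
    return states


HOURS_PER_STEP = 6
STEPS_10_DAYS = 10 * 24 // HOURS_PER_STEP  # 40 steps of 6 h

# Usage with a toy step function that halves the most recent tendency.
forecast = rollout([0.0], [1.0], STEPS_10_DAYS, lambda p, c: [0.5 * (c[0] - p[0])])
```

Because each step consumes the two most recent states, errors made early in the rollout are fed back into later steps, which is why training over multi-step rollouts matters for long lead times.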
GraphCast is trained to minimize a latitude-weighted (area-weighted) mean squared error over all output fields, pressure levels, and lead times, with variable- and level-dependent weights (Lam et al., 2022, Sudharsan et al., 20 Jun 2025). Training uses 30–45 years of ERA5 reanalysis data, often supplemented by recent operational analyses to mitigate distribution shift. Optimization uses AdamW, typically in mixed precision (bfloat16/float32) on large TPU or GPU clusters. The standard model has about 36–37 million parameters (the 37-pressure-level configuration “GC-37L”, with latent width 512 and 16 processor layers), though smaller and larger variants are supported (Yu et al., 26 Feb 2026).
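An area-weighted, variable-weighted MSE of this kind can be sketched as follows. The sketch assumes cos-latitude weights normalized to unit mean and a hypothetical per-variable weight dictionary `var_weights`; per-level weights and averaging over lead times would be added in the same way. It is not GraphCast's exact weighting scheme.

```python
import math
from typing import Dict, List

Grid = List[List[float]]  # [latitude row][longitude column]


def latitude_weights(lats_deg: List[float]) -> List[float]:
    """cos(latitude) area weights, normalized so their mean over rows is 1."""
    raw = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_mse(
    pred: Dict[str, Grid],
    target: Dict[str, Grid],
    lats_deg: List[float],
    var_weights: Dict[str, float],
) -> float:
    """Area-weighted MSE per variable, combined with per-variable weights."""
    w_lat = latitude_weights(lats_deg)
    total = 0.0
    for var, w_var in var_weights.items():
        sq, count = 0.0, 0
        for row_p, row_t, w in zip(pred[var], target[var], w_lat):
            for p, t in zip(row_p, row_t):
                sq += w * (p - t) ** 2
                count += 1
        total += w_var * sq / count
    return total


# Usage: three latitude rows, two longitudes, a uniform error of 1 K.
lats = [-60.0, 0.0, 60.0]
truth = {"t2m": [[280.0, 281.0], [300.0, 301.0], [270.0, 271.0]]}
guess = {"t2m": [[v + 1.0 for v in row] for row in truth["t2m"]]}
loss = weighted_mse(guess, truth, lats, {"t2m": 1.0})
```

Normalizing the weights to unit mean keeps a uniform error of 1 K at a loss of 1, so the latitude weighting redistributes emphasis toward the tropics without rescaling the loss overall.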
2. Performance Evaluation: Global and Regional Forecast Skill
GraphCast consistently matches or outperforms leading operational deterministic NWP systems (e.g., ECMWF HRES) in RMSE, anomaly correlation, and skill scores at 1–10 day lead times, on more than 90% of verification targets spanning the troposphere and surface (Lam et al., 2022, Sudharsan et al., 20 Jun 2025, Gupta et al., 2 Sep 2025, Vonich et al., 2024). Its computational efficiency enables global 10-day forecasts in <60 s on a TPU v4 or <5 min on a single NVIDIA H100 GPU (Sudharsan et al., 20 Jun 2025).
Key quantitative findings:
- 2 m temperature RMSE (South Asian monsoon, vs. IMD): 1.15 K (1 d), 1.60 K (5 d), 2.20 K (10 d); 10 m wind RMSE at 10 d: 1.3–1.4 m/s (Gupta et al., 2 Sep 2025).
- Surface precipitation: systematic wet bias (+0.4 mm/d), undercounts >50 mm/d extremes by ~30% (Gupta et al., 2 Sep 2025).
- Cyclone track errors: 210–250 km at 1–3 d, up to 900 km at 7 d leads over the Indian Ocean (Gupta et al., 2 Sep 2025).
- At 5-day lead, 500 hPa geopotential RMSE ≲50 m globally; anomaly correlation coefficient >0.90 at 5 d, ≈0.75 at 10 d (Sudharsan et al., 20 Jun 2025).
- In U.S. heat waves, GraphCast outperforms UFS GEFS and Pangu-Weather at leads up to 10 days, with RMSE increasing from ~1.5 °C (1 d) to ~3.5 °C (20 d), but with persistent cold bias during pre-onset and peak periods (Ennis et al., 29 Apr 2025).
- Over Brazil, skill is regime-dependent: GraphCast holds a near-universal advantage in tropical and wet regimes but underperforms for 500 hPa geopotential in the South during austral winter, owing to phase and amplitude deficits in baroclinic systems (e.g., skill scores drop to 47.9% at day 5) (Rowell et al., 4 Jun 2026).
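The RMSE and anomaly correlation scores quoted above follow standard definitions, sketched here with weights `w` assumed to be cos-latitude area weights. This computes the uncentered ACC (anomalies taken relative to a climatology `clim`); verification centres differ in details such as centering and the choice of climatology, so treat it as illustrative. The usage values are made-up geopotential-height-like numbers, not data from the cited studies.

```python
import math
from typing import List


def weighted_rmse(f: List[float], o: List[float], w: List[float]) -> float:
    """Weighted root-mean-square error of forecast f against verifying analysis o."""
    num = sum(wi * (fi - oi) ** 2 for fi, oi, wi in zip(f, o, w))
    return math.sqrt(num / sum(w))


def anomaly_correlation(f: List[float], o: List[float], clim: List[float], w: List[float]) -> float:
    """Weighted (uncentered) ACC between forecast and observed anomalies from climatology."""
    fa = [fi - ci for fi, ci in zip(f, clim)]
    oa = [oi - ci for oi, ci in zip(o, clim)]
    num = sum(wi * a * b for a, b, wi in zip(fa, oa, w))
    den = math.sqrt(sum(wi * a * a for a, wi in zip(fa, w)) * sum(wi * b * b for b, wi in zip(oa, w)))
    return num / den


# Usage: four grid points with hypothetical cos-latitude weights.
w = [0.5, 1.0, 1.0, 0.5]
clim = [5000.0, 5500.0, 5600.0, 5200.0]
obs = [5010.0, 5480.0, 5630.0, 5190.0]
fc = [5020.0, 5460.0, 5660.0, 5180.0]  # forecast anomalies are exactly twice the observed ones
rmse = weighted_rmse(fc, obs, w)
acc = anomaly_correlation(fc, obs, clim, w)
```

The example shows why the two scores are complementary: an amplitude error doubles the anomalies and produces a nonzero RMSE, yet the ACC stays at 1 because the spatial pattern is correct.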
3. Physical Consistency, Dynamical and Spectral Behavior
GraphCast’s GNN representation yields physically realistic large-scale dynamics and coherent spatial structures:
- Kinetic energy spectra agree closely with the reference at low wavenumbers (planetary and synoptic scales) but are systematically underestimated by 15–25% at k≥50 (wavelength <300–400 km), which smooths fine-scale features and reduces small-scale eddy kinetic energy (EKE) (Gupta et al., 2 Sep 2025, Husain et al., 2024, Subich, 2024).
- In spectral space, effective resolution degrades over lead time. At 120 h, the transient-eddy amplitude ratio γ(ℓ) indicates smoothing below ~1000 km, but GraphCast outperforms operational NWP in large-scale coherence (e.g., 500-hPa geopotential ACC improved by ΔACC≈+0.05 at day 7 in Northern Hemisphere winter) (Husain et al., 2024).
- Mechanistic interpretability, via sparse autoencoders, reveals monosemantic features corresponding to tropical cyclones, atmospheric rivers, diurnal/seasonal cycles, and regional heating patterns. Intervention on specific “feature” activations can systematically modulate event intensity while maintaining hydrostatic, mass, and gradient-wind balance (MacMillan et al., 30 Dec 2025).
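The spectral diagnostics above compare forecast and reference variance wavenumber by wavenumber. A one-dimensional periodic analogue is sketched below, using a naive DFT in place of the spherical-harmonic transform used in practice. The 1-2-1 filter is a hypothetical stand-in for model smoothing, and `spectral_ratio` is only an analogue of the ratio diagnostics cited above, not the γ(ℓ) definition of Husain et al.

```python
import cmath
import math
from typing import List


def power_spectrum(x: List[float]) -> List[float]:
    """Power |X_k|^2 / n for wavenumbers k = 0..n//2 of a periodic signal (naive DFT)."""
    n = len(x)
    spec = []
    for k in range(n // 2 + 1):
        c = sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        spec.append(abs(c) ** 2 / n)
    return spec


def smooth_121(x: List[float], passes: int = 1) -> List[float]:
    """Periodic 1-2-1 filter: a hypothetical stand-in for model smoothing."""
    n = len(x)
    for _ in range(passes):
        x = [0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[(i + 1) % n] for i in range(n)]
    return x


def spectral_ratio(forecast: List[float], reference: List[float]) -> List[float]:
    """Forecast/reference power per wavenumber; values below 1 mean lost variance."""
    pf, pr = power_spectrum(forecast), power_spectrum(reference)
    return [a / b if b > 1e-9 else float("nan") for a, b in zip(pf, pr)]


# Usage: a reference with one large-scale (k=2) and one small-scale (k=16) wave.
n = 64
reference = [math.cos(2 * math.pi * 2 * j / n) + math.cos(2 * math.pi * 16 * j / n) for j in range(n)]
ratio = spectral_ratio(smooth_121(reference), reference)
```

The 1-2-1 filter has amplitude response cos²(πk/n), so the power ratio is cos⁴(πk/n): about 0.98 at k=2 but 0.25 at k=16. Smoothing leaves large scales nearly intact while removing most small-scale variance, the same qualitative signature described for GraphCast.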
4. Extremes, Predictability, and Limitations
GraphCast has strong skill for “in-distribution” severe events (tropical cyclones, atmospheric rivers, heatwaves) at medium-range (1–10 d), but exhibits systematic deficiencies for record-breaking and out-of-distribution extremes:
- Underestimates the frequency and amplitude of unprecedented heat, cold, and wind extremes compared to numerical models (e.g., HRES); RMSE and bias for heat records are consistently 10–30% larger than HRES at all lead times (Zhang et al., 21 Aug 2025).
- For the April 2024 Dubai rainfall, GraphCast captured the approximate amplitude (55 mm vs. 60 mm observed) as well as the timing and location of the extreme, despite the absence of comparable events in the regional training data. This success is attributed to “translocation” (the transfer of dynamics learned in other basins, enabled by a near-global effective receptive field) rather than true extrapolation. GraphCast does not extrapolate beyond the amplitudes seen in training, and it tends to underpredict even in-distribution rare events because of data imbalance and spectral bias (Sun et al., 15 May 2025).
- Improving rare-event skill requires hybrid loss functions that emphasize tail behavior, augmentation with rare-event data, and architectures that mitigate spectral bias (e.g., integrated diffusion models or variable-specific heads) (Sun et al., 15 May 2025, Zhang et al., 21 Aug 2025).
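One simple way to make a loss emphasize tail behavior is to up-weight samples whose target exceeds a high quantile. The sketch below is a hypothetical illustration of that idea, not the loss used in the cited studies. The nearest-rank quantile, the threshold choice, and `tail_weight` are all tuning assumptions.

```python
import math
from typing import List


def empirical_quantile(values: List[float], q: float) -> float:
    """Nearest-rank empirical quantile for 0 < q <= 1."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def tail_weighted_mse(
    pred: List[float], target: List[float], threshold: float, tail_weight: float = 5.0
) -> float:
    """MSE in which samples whose target exceeds `threshold` count `tail_weight` times."""
    num = den = 0.0
    for p, t in zip(pred, target):
        w = tail_weight if t > threshold else 1.0
        num += w * (p - t) ** 2
        den += w
    return num / den


# Usage: a model that predicts no rain misses a single 10 mm/d event.
target = [0.0, 0.0, 0.0, 10.0]
pred = [0.0, 0.0, 0.0, 0.0]
threshold = empirical_quantile(target, 0.75)
tail_loss = tail_weighted_mse(pred, target, threshold)
plain_loss = tail_weighted_mse(pred, target, threshold, tail_weight=1.0)
```

Here the missed event raises the loss from 25 (plain MSE) to 62.5, so the gradient signal from rare extremes is no longer swamped by the many near-zero samples. Too large a weight can instead introduce a wet bias, so the weight would need validation.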
GraphCast’s deterministic formulation is rapid but under-dispersive in ensemble settings; background ensemble covariances show suppressed spread, especially for secondary circulation (radial wind) during hurricane intensification, and flatter empirical orthogonal function spectra with reduced large-scale variance. Hybrid approaches (inflation/localization, physics constraints) are recommended for ensemble data assimilation contexts (Chen et al., 17 Mar 2026).
5. Hybridization, Regionalization, and Scaling
GraphCast serves effectively as a backbone for hybrid and transfer learning systems:
- Spectral nudging: Synoptic/planetary-scale temperature and wind fields predicted by GraphCast can be spectrally filtered and used to nudge dynamical NWP systems (e.g., GDPS-SN), improving large-scale accuracy and predictability without introducing excess smoothing at fine scales, and improving tropical cyclone track accuracy (Husain et al., 2024).
- Efficient fine-tuning: GraphCast-37L can be adapted to a new analysis system (e.g., ECCC's GDPS) with ~40 GPU-days of compute and 2.5 years of training data. Re-normalization and re-weighted loss stages improve skill, raising anomaly correlation and reducing RMSE relative to both the unmodified GraphCast and the operational GDPS forecasts (Subich, 2024).
- Regionalization analyses (Brazil, South Asia) expose dynamical regime boundaries dictating when GraphCast’s smoothing is advantageous (tropics, extended range) vs. limiting (mid-latitude winter). “Tropicalization” via increased temporal resolution, specialized loss penalties, and subgrid physics-enhancement is an active area for further development (Rowell et al., 4 Jun 2026, Gupta et al., 2 Sep 2025).
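The spectral nudging idea relaxes only the large-scale part of a dynamical model's state toward a guiding forecast and leaves small scales free. The one-dimensional periodic toy below illustrates it. The cutoff `k_cut` and relaxation strength `alpha` are hypothetical, and an operational scheme such as GDPS-SN works on the sphere with vertical and temporal tapering not shown here.

```python
import cmath
import math
from typing import List


def lowpass(x: List[float], k_cut: int) -> List[float]:
    """Keep periodic Fourier modes with |k| <= k_cut (naive DFT and inverse DFT)."""
    n = len(x)
    coeffs = [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) for k in range(n)]
    for k in range(n):
        if min(k, n - k) > k_cut:  # magnitude of the signed wavenumber
            coeffs[k] = 0
    return [
        (sum(coeffs[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n).real
        for j in range(n)
    ]


def spectral_nudge(model_state: List[float], guide_state: List[float], k_cut: int, alpha: float) -> List[float]:
    """Relax only the large scales of `model_state` toward `guide_state`."""
    diff = [g - m for g, m in zip(guide_state, model_state)]
    large_scale_diff = lowpass(diff, k_cut)
    return [m + alpha * d for m, d in zip(model_state, large_scale_diff)]


# Usage: the guide differs from the model at k=1 (large scale) and k=10 (small scale).
n = 32
model = [0.0] * n
guide = [math.cos(2 * math.pi * j / n) + math.cos(2 * math.pi * 10 * j / n) for j in range(n)]
nudged = spectral_nudge(model, guide, k_cut=4, alpha=0.5)
```

Only the k=1 discrepancy is corrected (by half, with `alpha=0.5`); the k=10 difference is filtered out. This is how a smooth data-driven forecast can steer synoptic-scale structure without imposing its smoothing on the dynamical model's fine scales.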
6. Scaling Laws, Computational Efficiency, and Practical Considerations
GraphCast is among the most parameter-efficient models in its class:
- Empirical scaling laws show validation loss decreasing as a power law in model size and in training compute. GraphCast achieves competitive loss with 10× fewer parameters than competing transformer-based architectures (Yu et al., 26 Feb 2026).
- However, hardware utilization on dense GPUs is limited (0.02% of H100 peak), owing to bandwidth- and memory-bound graph-messaging operations. Under standard compute budgets, this restricts the degree to which width or depth can be increased to reach lower validation loss, and suggests further efficiency will require advances in sparse kernel implementations or architectural optimizations (Yu et al., 26 Feb 2026).
- Compute-optimal analysis indicates that wider (not deeper) GNNs and longer-duration training provide the most favorable trade-off under current hardware and data constraints. Further gains may result from variable-specific prediction heads and physically guided fusion layers (Yu et al., 26 Feb 2026).
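Scaling-law exponents of the form loss ≈ a·N^(−α) are commonly estimated by fitting a straight line in log-log space. The sketch below uses synthetic numbers, not results from Yu et al. Many scaling studies also include an irreducible-loss offset, which this pure power-law fit omits and which would require nonlinear fitting.

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], losses: List[float]) -> Tuple[float, float]:
    """Least-squares fit of loss = a * size**(-alpha) as a straight line in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - slope * mx)
    return a, -slope


# Usage with synthetic data generated from a = 2.0, alpha = 0.1.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [2.0 * s ** -0.1 for s in sizes]
a_fit, alpha_fit = fit_power_law(sizes, losses)
```

Comparing fitted curves for different architectures at equal loss is one way to express a claim such as "competitive loss with 10× fewer parameters".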
7. Operational Readiness and Perspectives
GraphCast is operationally viable for rapid, high-skill global forecasts at synoptic scales, with strengths in large-scale dynamics, computational efficiency, and forecast turnaround (Gupta et al., 2 Sep 2025, Lam et al., 2022). Its main weaknesses (underestimation of mesoscale variance and extreme tails, under-dispersion in ensembles, and limited stratospheric resolution) justify hybrid strategies for high-impact and hyperlocal forecasting:
- Integrating GraphCast as a predictor in hybrid AI–physics systems (spectral nudging, bias correction, ensemble initialization) enhances overall performance while addressing model-specific limitations (Husain et al., 2024).
- Operational deployment for disaster warning or agricultural advisories in data-sparse, convectively active, or high-variability regimes should employ GraphCast as part of an ensemble or hybrid, with explicit coverage of uncertainty and corrected tails (Gupta et al., 2 Sep 2025, Zhang et al., 21 Aug 2025).
- Continued research into interpretability, uncertainty quantification, and data augmentation for extremes is essential for trustworthy, next-generation, globally deployed GNN-based weather prediction.
Collectively, GraphCast defines a new benchmark in scalable, skillful, and computationally efficient global weather prediction by leveraging graph neural architectures and data-driven optimization. Its strengths and limitations, as comprehensively documented across recent literature, inform both current operational practice and the future trajectory of scientific and technical development in data-driven meteorological modeling.