
FastNet v1.0: GNN Weather Prediction

Updated 24 September 2025
  • FastNet v1.0 is a data-driven global weather prediction system that employs a graph neural network architecture with a multi-level icosahedral mesh for resolution-independent, deterministic forecasting.
  • Its encode–process–decode framework integrates innovative loss functions, including modified spherical harmonic and gradient terms, to enhance physical realism and forecast accuracy.
  • Benchmark evaluations show FastNet surpassing the operational Met Office Global Model and achieving skill competitive with state-of-the-art MLWP systems, with operational readiness at reduced computational cost.

FastNet Version 1.0 is a data-driven global weather prediction system developed jointly by the Alan Turing Institute and the Met Office and built on a graph neural network (GNN) architecture. It uses an encode–process–decode framework and a multi-level icosahedral mesh to deliver deterministic global forecasts out to ten days. FastNet is designed to be resolution-independent, capable of operating at both 1° and 0.25° horizontal resolutions, and is trained and evaluated on ERA5 reanalysis data. The architecture and training regimen incorporate innovations in mesh design, loss function engineering, and fine-tuning strategies, contributing to improved physical realism and operational performance compared with traditional numerical weather prediction (NWP) models and other state-of-the-art machine learning weather prediction (MLWP) systems.

1. Architectural Framework and Mesh Construction

FastNet follows a modular encode–process–decode paradigm structured as follows:

  • Encoder: Maps gridded atmospheric states to a latent space on the icosahedral mesh. The mesh is constructed by recursive subdivision of an icosahedron, yielding a uniform node distribution on the sphere and mitigating the polar clustering seen in latitude–longitude grids. Grid-to-mesh connections are established via k-nearest neighbors (KNN) or geodesic radius-based methods. For the O96 (∼1°) configuration, KNN is standard, whereas the N320 (0.25°) resolution can benefit from radius-based connectivity for a marginal RMSE gain.
  • Processor (Multi-level Mesh GNN): The processor operates on a multi-level mesh. This mesh aggregates edges from all refinement layers, supporting both short-range connections (for local meteorological phenomena) and long-range edges (for teleconnections and planetary-scale dynamics). The processor comprises 16 message-passing layers, each with 768 latent dimensions per mesh node, where learnable parameters are unique to each message-passing step, enabling hierarchical feature learning across scales.
  • Decoder: Transfers the processed mesh latent states back to the output grid. The decoder uses a KNN graph (each grid point connects to the three closest mesh nodes), followed by an interaction network and a multi-layer perceptron (MLP). The output is residual: the model predicts the increment to the atmospheric state over each 6-hour step, which makes the temporal evolution easier to learn.

This architecture enables the capture of both fine and coarse spatial patterns essential for medium-range NWP.
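
As a concrete illustration of the mesh construction and grid-to-mesh connectivity described above, the sketch below builds a small multi-level icosahedral mesh by recursive subdivision and connects a latitude–longitude grid to it with k-nearest neighbours. It is a minimal, self-contained example and not the FastNet implementation; function names such as `multilevel_edges` and `grid_to_mesh_knn`, the refinement depth, and k = 3 are illustrative assumptions.

```python
# Minimal sketch of multi-level icosahedral mesh construction and
# grid-to-mesh KNN connectivity (illustrative; not the FastNet code).
import numpy as np
from scipy.spatial import cKDTree

def base_icosahedron():
    """Return the 12 unit-sphere vertices and 20 faces of an icosahedron."""
    phi = (1.0 + np.sqrt(5.0)) / 2.0
    verts = np.array([
        [-1,  phi, 0], [1,  phi, 0], [-1, -phi, 0], [1, -phi, 0],
        [0, -1,  phi], [0, 1,  phi], [0, -1, -phi], [0, 1, -phi],
        [phi, 0, -1], [phi, 0, 1], [-phi, 0, -1], [-phi, 0, 1],
    ], dtype=float)
    verts /= np.linalg.norm(verts, axis=1, keepdims=True)
    faces = np.array([
        [0, 11, 5], [0, 5, 1], [0, 1, 7], [0, 7, 10], [0, 10, 11],
        [1, 5, 9], [5, 11, 4], [11, 10, 2], [10, 7, 6], [7, 1, 8],
        [3, 9, 4], [3, 4, 2], [3, 2, 6], [3, 6, 8], [3, 8, 9],
        [4, 9, 5], [2, 4, 11], [6, 2, 10], [8, 6, 7], [9, 8, 1],
    ])
    return verts, faces

def subdivide(verts, faces):
    """Split every triangle into four; project new vertices onto the sphere."""
    verts = list(map(tuple, verts))
    midpoint_cache, new_faces = {}, []
    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_cache:
            m = (np.array(verts[i]) + np.array(verts[j])) / 2.0
            m /= np.linalg.norm(m)
            midpoint_cache[key] = len(verts)
            verts.append(tuple(m))
        return midpoint_cache[key]
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [[a, ab, ca], [b, bc, ab], [c, ca, bc], [ab, bc, ca]]
    return np.array(verts), np.array(new_faces)

def multilevel_edges(levels=3):
    """Union of edges from every refinement level, as in a multi-level mesh."""
    verts, faces = base_icosahedron()
    edges = set()
    for level in range(levels + 1):
        for a, b, c in faces:
            edges |= {(a, b), (b, a), (b, c), (c, b), (c, a), (a, c)}
        if level < levels:
            verts, faces = subdivide(verts, faces)
    return verts, np.array(sorted(edges))

def grid_to_mesh_knn(grid_latlon_deg, mesh_xyz, k=3):
    """Connect each lat-lon grid point to its k nearest mesh nodes."""
    lat = np.radians(grid_latlon_deg[:, 0])
    lon = np.radians(grid_latlon_deg[:, 1])
    grid_xyz = np.stack([np.cos(lat) * np.cos(lon),
                         np.cos(lat) * np.sin(lon),
                         np.sin(lat)], axis=1)
    _, idx = cKDTree(mesh_xyz).query(grid_xyz, k=k)
    return idx  # (n_grid, k) indices of the receiving mesh nodes

mesh_xyz, edges = multilevel_edges(levels=3)
grid = np.array([[lat, lon] for lat in range(-90, 91, 10)
                 for lon in range(0, 360, 10)], dtype=float)
print(mesh_xyz.shape, edges.shape, grid_to_mesh_knn(grid, mesh_xyz).shape)
```

Because coarse-level vertices keep their indices under subdivision, the edge union naturally mixes short fine-level connections with long coarse-level ones, which is the property the processor exploits for local and planetary-scale dynamics alike.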

2. Data Sources and Training Workflow

FastNet version 1.0 is trained on ECMWF ERA5 reanalysis data, comprising up to 85 variables per grid point (spanning forcing, atmospheric levels, and surface fields). Two grid configurations are used in experiments: O96 (∼1° resolution) and N320 (∼0.25° resolution), each corresponding to different mesh granularities.

Training Regimen:

  • Pre-Training: Initial training targets a single 6-hour time step using a weighted mean squared error (MSE) loss, given by

$$\mathcal{L}_{\text{MSE}} = \frac{1}{|D_{\mathrm{batch}}|} \sum_{d_0\in D_{\mathrm{batch}}} \sum_{\tau=1}^{T_{\mathrm{train}}} \sum_{j\in J} s_j w_j \left( \hat{x}_j^{d_0+\tau} - x_j^{d_0+\tau} \right)^2$$

where $s_j$ is a per-variable-level inverse-variance weight, $w_j$ is a pressure-level or fixed weight, and $\hat{x}_j^{d_0+\tau}$ and $x_j^{d_0+\tau}$ are the ML forecast and ground truth, respectively.

  • Autoregressive Fine-Tuning: After pre-training, multi-step rollouts are introduced to improve temporal stability. The model is trained to predict across an increasing number of autoregressive steps ($T_{\mathrm{train}}$), learning to mitigate the error accumulation typical of recursive forecasts. In practice, for O96 this involves gradually increasing $T_{\mathrm{train}}$ from 2 to 12 steps, with an analogous procedure for N320 (sometimes with modified learning rates and parallelisation).

This two-phase strategy enables FastNet to produce stable and accurate forecasts up to several days in advance.
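
To make the two-phase regimen concrete, the sketch below implements a weighted MSE of the form given above and a rollout loop that lengthens the training horizon in stages. It is a hedged, minimal example: `model` is a stand-in for the GNN, the tensors `s` and `w` play the roles of the inverse-variance and level weights, and all shapes, schedules, and hyperparameters are illustrative rather than FastNet's actual settings.

```python
# Hedged sketch of FastNet-style pre-training and autoregressive fine-tuning.
# `model`, `s`, `w`, and the data tensors are placeholders, not FastNet's API.
import torch

def weighted_mse(pred, target, s, w):
    """Weighted MSE: s = per-variable inverse-variance, w = level/fixed weight."""
    return (s * w * (pred - target) ** 2).mean()

def rollout_loss(model, x0, targets, s, w):
    """Autoregressive rollout: the model predicts 6-hour increments (residuals)."""
    state, loss = x0, 0.0
    for target in targets:               # targets: future states, 6 h apart
        state = state + model(state)     # residual prediction of the next state
        loss = loss + weighted_mse(state, target, s, w)
    return loss / len(targets)

# Toy setup: 85 variables on a small grid (illustrative shapes only).
n_vars, n_lat, n_lon = 85, 18, 36
model = torch.nn.Sequential(             # stand-in for the mesh GNN
    torch.nn.Conv2d(n_vars, 128, 1), torch.nn.GELU(),
    torch.nn.Conv2d(128, n_vars, 1))
s = torch.ones(1, n_vars, 1, 1)          # inverse-variance weights
w = torch.ones(1, n_vars, 1, 1)          # pressure-level / fixed weights
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Phase 1: single-step pre-training; Phase 2: gradually lengthen the rollout.
for t_train in [1, 2, 4, 8, 12]:
    x0 = torch.randn(1, n_vars, n_lat, n_lon)
    targets = [torch.randn(1, n_vars, n_lat, n_lon) for _ in range(t_train)]
    opt.zero_grad()
    loss = rollout_loss(model, x0, targets, s, w)
    loss.backward()
    opt.step()
    print(f"T_train={t_train:2d}  loss={loss.item():.3f}")
```

Backpropagating through the whole rollout is what teaches the model to keep its own errors from compounding step to step.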

3. Loss Function Innovations and Physical Consistency

FastNet investigates alternative loss designs to improve physical realism beyond statistical accuracy:

  • Modified Spherical Harmonic (MSH) Loss: Penalizes spectral amplitude errors in Fourier space, minimizing loss of small-scale power ("blurring") at extended lead times.
  • Horizontal Gradient Terms: Adds zonal and meridional derivatives to the MSE, enforcing meteorological smoothness and spatial structure, and suppressing artefacts such as “honeycomb” patterns arising from the mesh topology.
  • Alternative Wind Representation: Rather than predicting the $(u, v)$ wind components directly, the model is guided to output wind speed (i.e., $\sqrt{u^2 + v^2}$) along with directional unit vectors, reducing directional bias and enhancing extreme-event representation.

Benchmarking shows that MSH and gradient losses alone may marginally increase RMSE but, when combined, maintain nearly identical RMSE to vanilla MSE training and substantially improve spectral and physical fidelity.
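
The sketch below gives a rough, illustrative rendering of these ideas. An FFT amplitude penalty along longitude stands in for the modified spherical harmonic loss (whose exact formulation is not reproduced here), finite differences stand in for the zonal and meridional gradient terms, and the wind decomposition mirrors the speed-plus-direction output. None of this is the FastNet implementation, and the weighting factors are arbitrary.

```python
# Illustrative loss terms in the spirit of FastNet's MSH and gradient losses
# (stand-ins for the paper's formulations; weights and shapes are assumptions).
import torch

def spectral_amplitude_loss(pred, target):
    """Penalise differences in spectral amplitude along longitude (anti-blurring)."""
    pred_amp = torch.fft.rfft(pred, dim=-1).abs()
    true_amp = torch.fft.rfft(target, dim=-1).abs()
    return (pred_amp - true_amp).pow(2).mean()

def gradient_loss(pred, target):
    """Penalise errors in zonal (lon) and meridional (lat) finite differences."""
    d_lon = lambda f: f[..., :, 1:] - f[..., :, :-1]
    d_lat = lambda f: f[..., 1:, :] - f[..., :-1, :]
    return ((d_lon(pred) - d_lon(target)).pow(2).mean()
            + (d_lat(pred) - d_lat(target)).pow(2).mean())

def wind_to_speed_direction(u, v, eps=1e-8):
    """Re-express (u, v) as wind speed plus a directional unit vector."""
    speed = torch.sqrt(u ** 2 + v ** 2 + eps)
    return speed, u / speed, v / speed

# Toy fields on an 18 x 36 lat-lon grid.
pred, target = torch.randn(1, 18, 36), torch.randn(1, 18, 36)
total = (torch.nn.functional.mse_loss(pred, target)
         + 0.1 * spectral_amplitude_loss(pred, target)
         + 0.1 * gradient_loss(pred, target))
u, v = torch.randn(1, 18, 36), torch.randn(1, 18, 36)
speed, dir_u, dir_v = wind_to_speed_direction(u, v)
print(total.item(), speed.shape)
```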

4. Evaluation, Benchmarking, and Predictive Skill

Performance is assessed against both ERA5 reanalysis and the operational Met Office Global Model (GM), using metrics such as Root Mean Squared Error (RMSE) and Anomaly Correlation Coefficient (ACC):

  • RMSE at atmospheric level $l$:

$$\text{RMSE}_l = \sqrt{\frac{1}{TIJ}\sum_{t}^{T} \sum_{i}^{I} \sum_{j}^{J} w_i \left(x_{tlij} - \hat{x}_{tlij}\right)^2}$$

  • ACC:

$$\text{ACC}_l = \frac{1}{T}\sum_{t}^{T}\frac{\sum_{i,j} w_i (x_{tlij} - c_{tlij})(\hat{x}_{tlij} - c_{tlij})}{\sqrt{\sum_{i,j} w_i (x_{tlij} - c_{tlij})^2}\sqrt{\sum_{i,j} w_i (\hat{x}_{tlij} - c_{tlij})^2}}$$

where $c_{tlij}$ denotes the climatology.
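
A minimal NumPy implementation of these metrics is sketched below, assuming $w_i$ is the usual area (cosine-of-latitude) weight, which the summary does not state explicitly; the grid and array shapes are illustrative.

```python
# Latitude-weighted RMSE and ACC as defined above (w_i assumed to be a
# normalised cos(latitude) area weight; shapes are illustrative).
import numpy as np

def lat_weights(lats_deg):
    w = np.cos(np.radians(lats_deg))
    return w / w.mean()

def weighted_rmse(forecast, truth, lats_deg):
    """forecast, truth: arrays of shape (time, lat, lon)."""
    w = lat_weights(lats_deg)[None, :, None]
    return np.sqrt(np.mean(w * (truth - forecast) ** 2))

def weighted_acc(forecast, truth, climatology, lats_deg):
    """Anomaly correlation coefficient, averaged over forecast times."""
    w = lat_weights(lats_deg)[None, :, None]
    fa, ta = forecast - climatology, truth - climatology
    num = np.sum(w * fa * ta, axis=(1, 2))
    den = np.sqrt(np.sum(w * fa ** 2, axis=(1, 2))
                  * np.sum(w * ta ** 2, axis=(1, 2)))
    return np.mean(num / den)

lats = np.linspace(-88.75, 88.75, 72)          # e.g. a 2.5-degree grid
f, t, c = (np.random.randn(4, 72, 144) for _ in range(3))
print(weighted_rmse(f, t, lats), weighted_acc(f, t, c, lats))
```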

Model comparisons are performed on a 1.5° grid over year-2022 holdout data. The FastNet O96 (1°) and N320 (0.25°) models consistently achieve lower RMSE and higher ACC than the Met Office GM across multiple variables (geopotential, temperature, wind components, mean sea level pressure). At very short lead times, geopotential forecasts occasionally favor the GM, but FastNet exhibits pronounced skill benefits, especially around 48-hour lead times. The high-resolution N320 model achieves marginal accuracy gains at increased computational cost.

Moreover, FastNet's results are competitive with models such as GraphCast, FourCastNet, Pangu-Weather, and ArchesWeather, especially in skill metrics on WeatherBench2 and related benchmarks.

5. Significance, Operational Implications, and Applications

FastNet’s GNN-based, multi-scale mesh architecture advances the capabilities of MLWP systems in several ways:

  • Error Control and Multi-scale Dynamics: The architecture’s multi-level mesh and encode–process–decode pipeline enable robust modeling of both synoptic and local phenomena, managing error propagation through residual learning and autoregressive fine-tuning.
  • Physical Fidelity: Loss functions embedded with domain knowledge (MSH, gradients, alternative wind representation) systematically improve forecast realism, including representation of high-impact events such as extra-tropical cyclones and hurricane-like structures.
  • Operational Readiness: FastNet surpasses the skill of established physics-based global NWP models using reduced computational resources, suggesting feasibility for frequent updates and real-time deployment in operational centers.

These gains collectively demonstrate the practical utility of combining advanced GNN architectures with physically-informed machine learning objectives.

6. Future Research and Development Directions

The methodologies validated in FastNet v1.0—multi-level graph representations, domain-specific loss engineering, and autoregressive stabilization—lay groundwork for further developments:

  • Investigation is ongoing into combined loss strategies (e.g., integrating MSH, gradient terms, and wind representation in FastNet v1.1).
  • Expansion toward hybrid systems that fuse MLWP outputs with traditional ensemble NWP for uncertainty quantification.
  • Scaling to finer resolution meshes, deploying adaptive refinement, and leveraging advanced message-passing algorithms.
  • Further exploration of mesh topology effects on artefact generation and mitigation.

A plausible implication is that these lines of inquiry will continue to bridge the gap between purely data-driven approaches and physics-based modeling paradigms, promoting the adoption of MLWP systems for operational and research meteorology.

7. Context within the MLWP Field and Comparative Summary

The introduction of FastNet has been contextualized among other MLWP architectures, notably those relying on neural operators and transformer-based sequence models. FastNet’s principal distinction is its use of a multi-scale graph and explicit message-passing on a mesh tailored to the spherical geometry of Earth, complemented by physically-informed loss design.

Comparison Table: FastNet v1.0 vs. Other MLWP Architectures (Summarized from available results)

Model | Architecture | Key Innovations | Operational Skill Relative to GM
FastNet v1.0 | GNN on a multi-level icosahedral mesh | MSH/gradient losses, alternative wind representation | Surpasses GM; competitive with SOTA MLWP
GraphCast | GNN on a multi-level icosahedral mesh | Multi-mesh message passing, long autoregressive rollouts | Slightly higher skill on some metrics
FourCastNet | Fourier neural operator | Spectral (AFNO) token mixing, low memory footprint | Comparable/competitive
Met Office Global Model (GM) | Physics-based PDE solver | Spectral/Eulerian dynamical core | Baseline operational NWP

FastNet’s approach, particularly its mesh design and loss function engineering, is positioned as an exemplar of MLWP systems that explicitly embody both domain-adapted representation and physics-aware training signals (Dunstan et al., 22 Sep 2025, Daub et al., 22 Sep 2025).
