
FourCastNet 3: Scalable Spherical Forecasting

Updated 20 July 2025
  • FourCastNet 3 is a scalable spherical ML model that leverages tailored convolutional architectures and spherical harmonic transforms to respect Earth’s geometry.
  • It introduces a novel probabilistic ensemble formulation using spatially correlated latent variables and CRPS-based training for calibrated, physically consistent forecasts.
  • The design achieves rapid inference with efficient multi-parallel training, matching or surpassing state-of-the-art diffusion and physics-based forecasting systems.

FourCastNet 3 is a scalable, geometric machine learning model for global atmospheric ensemble forecasting that combines high-resolution accuracy, computational efficiency, and rigorous probabilistic calibration. Building on earlier FourCastNet designs, FourCastNet 3 introduces a purely convolutional neural architecture tailored for the spherical geometry of Earth, a novel ensemble generation mechanism with spatially correlated latent variables, and a multi-parallelism training strategy enabling efficient large-scale deployment. This framework matches or surpasses both state-of-the-art diffusion models and leading physics-based ensemble systems in accuracy and speed, while maintaining realistic spectral characteristics and long-range temporal stability (Bonev et al., 16 Jul 2025).

1. Spherical Geometric Foundations

FourCastNet 3 is fundamentally constructed to operate natively on spherical data, replacing conventional planar convolutions with methods that respect Earth's curvature and rotational symmetry. Global convolutions are parametrized in the spectral domain via the spherical harmonic transform (SHT), with filtering expressed as:

f \ast g = \mathrm{SHT}^{-1}\big(\mathrm{SHT}(f) \cdot \mathrm{SHT}(g)\big)

This operation is the spherical geometry analogue of the standard convolution theorem. The architecture further includes local spherical convolutions using anisotropic, learnable kernels (for instance, on-disk Morlet wavelets), permitting the network to model both mesoscale and global atmospheric phenomena while avoiding projection artifacts or coordinate discontinuities. By design, FourCastNet 3 is equivariant to rotations and properly preserves the physical structure of signals regardless of geographic location. Such built-in geometric priors are critical for the stability and generalizability of global weather forecasts at extended lead times (Bonev et al., 16 Jul 2025).
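As a self-contained illustration of the convolution theorem behind these global filters, the sketch below implements the zonal (m = 0) special case of the spherical harmonic transform as a Legendre transform with NumPy, and applies a filter by pointwise multiplication of coefficients. The function names, the exponential filter, and the restriction to zonal symmetry are illustrative simplifications, not the paper's implementation.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legvander

def zonal_sht(f, nodes, weights, lmax):
    """Forward zonal (m=0) transform via Gauss-Legendre quadrature:
    c_l = sum_i w_i f(x_i) Pn_l(x_i), with orthonormal Legendre Pn_l."""
    P = legvander(nodes, lmax)                            # P_l(x_i), unnormalized
    Pn = P * np.sqrt((2 * np.arange(lmax + 1) + 1) / 2)   # orthonormal on [-1, 1]
    return Pn.T @ (weights * f)

def zonal_isht(coeffs, nodes):
    """Inverse transform: f(x) = sum_l c_l Pn_l(x)."""
    lmax = len(coeffs) - 1
    P = legvander(nodes, lmax)
    Pn = P * np.sqrt((2 * np.arange(lmax + 1) + 1) / 2)
    return Pn @ coeffs

nlat, lmax = 32, 15
nodes, weights = leggauss(nlat)           # x = cos(colatitude), quadrature weights
f = 1.0 + 0.5 * nodes - 0.2 * (1.5 * nodes**2 - 0.5)     # band-limited test field
filt = np.exp(-0.1 * np.arange(lmax + 1))  # stand-in for a learned spectral filter
f_filtered = zonal_isht(filt * zonal_sht(f, nodes, weights, lmax), nodes)
```

Because the test field is band-limited, the forward/inverse pair round-trips it exactly; multiplying coefficients by `filt` before the inverse transform is the spectral-domain convolution of the formula above.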

2. Probabilistic Ensemble Formulation

FourCastNet 3 implements a probabilistic, hidden Markov model architecture to support ensemble-based forecasting. At each coarse forecast step (e.g., 6 hours), the model evolves the system state u_n by

u_{n+1} = F_\theta(u_n, t_n, z_n)

where z_n is a realization of a stochastic latent variable sampled from a spherical, multi-scale diffusion process. Perturbing z_n generates physically consistent, spatially correlated ensemble members, which is crucial for capturing the uncertainties inherent in atmospheric evolution.
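The hidden-Markov-style rollout can be sketched as follows. Here `step` is a toy stand-in for the learned update F_theta, and plain Gaussian noise stands in for the spherical multi-scale diffusion process used in the paper; only the rollout structure is faithful to the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(u, t, z):
    """Toy stand-in for F_theta(u_n, t_n, z_n): a damped shift of the
    state plus the latent stochastic forcing z."""
    return 0.99 * np.roll(u, 1, axis=-1) + 0.1 * z

def rollout_ensemble(u0, n_steps, n_members):
    """Autoregressive ensemble forecast: every member starts from the
    same initial state and evolves under its own latent-noise draws z_n."""
    members = np.tile(u0, (n_members, 1))
    for t in range(n_steps):
        z = rng.standard_normal(members.shape)  # stand-in for spherical diffusion noise
        members = step(members, t, z)
    return members  # shape (n_members, state_dim)

u0 = np.sin(np.linspace(0, 2 * np.pi, 64, endpoint=False))
ens = rollout_ensemble(u0, n_steps=20, n_members=8)
```

The independent noise realizations are what make the members diverge, producing the ensemble spread that the CRPS-based training then calibrates.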

Training is supervised using both:

  • A spatially localized continuous ranked probability score (CRPS) loss, ensuring well-calibrated local ensemble statistics;
  • A spectral loss that preserves the observed distribution of kinetic energy and other moments across spatial scales.

These losses ensure that ensemble spread matches forecast error and that the simulated spectra remain realistic and stable even on subseasonal (up to 60-day) timescales (Bonev et al., 16 Jul 2025).
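The CRPS component can be illustrated with the standard ensemble estimator of CRPS at a single grid point; this is the generic estimator, not the paper's exact spatially localized variant.

```python
import numpy as np

def ensemble_crps(members, obs):
    """CRPS estimator for an M-member ensemble at one grid point:
    E|X - y| - 0.5 * E|X - X'|, expectations taken over members.
    (This is the 1/(2 M^2) estimator; a 'fair' variant divides the
    pairwise term by M(M-1) instead.)"""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

score = ensemble_crps([0.0, 1.0], 0.5)
```

For the two-member ensemble {0, 1} and observation 0.5, the score is 0.5 - 0.25 = 0.25; a degenerate ensemble sitting exactly on the observation scores 0. Averaging this quantity over grid points, weighted by area, gives a spatially localized training signal of the kind described above.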

3. Purely Convolutional Spherical Architecture

The network adopts a classic encoder–processor–decoder topology, with significant innovations reflecting Earth’s geometry and atmospheric variables:

  • Encoder: Downsamples high-resolution input fields (e.g., a 721 × 1440 grid) to an appropriate spherical grid such as Gaussian or HEALPix while preserving channels. This avoids variable mixing and spatial aliasing.
  • Processor: Chains multiple blocks that implement both:
    • Local spherical convolutions (using anisotropic Morlet wavelets to capture features at a range of orientations and scales);
    • Global spectral-domain convolutions (using SHT-based filtering for planetary-scale modes).
  • Decoder: Upsamples back to high spatial resolution, counteracting aliasing and ensuring that ensemble forecasts can be evaluated at the required granularity.

This convolutional setup is entirely free of planar or Euclidean assumptions, and each step is engineered for the unique requirements of spherical meteorological modeling (Bonev et al., 16 Jul 2025).
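The anti-aliasing role of the encoder and decoder can be illustrated on a periodic 1D analogue: downsampling by spectral truncation and upsampling by spectral zero-padding. The model itself works on spherical grids with the SHT, not the FFT; this sketch only demonstrates why spectral resampling avoids the aliasing that naive grid subsampling would introduce.

```python
import numpy as np

def spectral_downsample(f, n_out):
    """Encoder-style downsampling: truncate high frequencies, then
    evaluate on the coarse grid (alias-free for band-limited signals)."""
    F = np.fft.rfft(f)
    F_trunc = F[: n_out // 2 + 1] * (n_out / len(f))  # rescale for irfft's 1/n
    return np.fft.irfft(F_trunc, n=n_out)

def spectral_upsample(f, n_out):
    """Decoder-style upsampling: zero-pad the spectrum to the fine grid."""
    F = np.fft.rfft(f)
    F_pad = np.zeros(n_out // 2 + 1, dtype=complex)
    F_pad[: len(F)] = F * (n_out / len(f))
    return np.fft.irfft(F_pad, n=n_out)

x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
f = np.sin(3 * x) + 0.5 * np.cos(7 * x)      # band-limited field
coarse = spectral_downsample(f, 32)          # "encoder" grid
fine = spectral_upsample(coarse, 128)        # back to full resolution
```

Because the field is band-limited below the coarse grid's Nyquist wavenumber, the downsample/upsample round trip reconstructs it exactly; the spherical analogue replaces the FFT pair with the SHT and its inverse.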

4. Scalable Parallel Training Paradigm

FourCastNet 3 achieves high scalability and efficiency via a hybrid parallelism strategy, inspired by domain decomposition in classical numerical weather prediction:

  • Model Parallelism: Divides the global domain spatially across multiple GPUs so that all model (parameter and state) memory can be efficiently distributed. For extremely large models, the grid is split across 4 to 16 “ranks,” each handling a segment of the sphere.
  • Data Parallelism: Applies ensemble and batch parallelism, permitting thousands of different forecast scenarios to be simulated in parallel.
  • Training Curriculum: Proceeds in phases—initial step-wise training at 6-hour leads, multi-step autoregressive rollout refinement, and final fine-tuning on recent data—with careful parallel communication to maximize utilization at scale.

Implementation uses frameworks such as NVIDIA’s Makani to seamlessly orchestrate training on clusters exceeding 1024 GPUs. This allows the model to be trained and deployed for operational Earth system forecasting with limited turnaround time (Bonev et al., 16 Jul 2025).
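The spatial model parallelism above can be sketched as a latitude-band partition with halo rows, in the spirit of domain decomposition in classical NWP. The helper below is hypothetical for illustration; it is not Makani's API, and the real decomposition also handles longitude splits and spectral-transform communication.

```python
import numpy as np

def latitude_partition(nlat, n_ranks, halo=1):
    """Split nlat latitude rows across n_ranks. For each rank, return
    the slice of rows it owns and a padded slice that includes the halo
    rows its local convolutions need (clamped at the poles)."""
    bounds = np.linspace(0, nlat, n_ranks + 1, dtype=int)
    parts = []
    for r in range(n_ranks):
        lo, hi = bounds[r], bounds[r + 1]
        parts.append({
            "owned": (lo, hi),
            "with_halo": (max(0, lo - halo), min(nlat, hi + halo)),
        })
    return parts

# e.g., a 721-row latitude grid split across 4 ranks with 2 halo rows
parts = latitude_partition(nlat=721, n_ranks=4, halo=2)
```

Each rank computes on its `with_halo` slab and exchanges the halo rows with its neighbors after every local-convolution block; data parallelism then replicates this layout across ensemble members and batches.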

5. Probabilistic Calibration and Spectral Fidelity

A hallmark of FourCastNet 3 is its skill in simultaneous probabilistic calibration and physical fidelity. Model ensembles are shown to exhibit spread–skill ratios near unity and to produce nearly flat rank histograms. Forecast spectra, even out to 60-day lead times, retain realistic energy distributions across scales—addressing a key failing of many prior ML approaches where high-wavenumber energy either decays (overly diffusive) or grows unrealistically (“spectral blowup”).
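The two calibration diagnostics mentioned above are straightforward to compute; the sketch below does so on synthetic data where observations are drawn from the same distribution as the ensemble members, so both diagnostics should come out well-calibrated by construction.

```python
import numpy as np

rng = np.random.default_rng(1)

def spread_skill_ratio(ens, obs):
    """Ratio of mean ensemble spread (std across members) to the RMSE
    of the ensemble mean; a value near 1 indicates good calibration."""
    spread = ens.std(axis=0, ddof=1).mean()
    rmse = np.sqrt(((ens.mean(axis=0) - obs) ** 2).mean())
    return spread / rmse

def rank_histogram(ens, obs, n_members):
    """Histogram of the observation's rank within each ensemble; a flat
    histogram means members and observation are statistically
    indistinguishable."""
    ranks = (ens < obs).sum(axis=0)          # rank in 0 .. n_members
    return np.bincount(ranks, minlength=n_members + 1)

ens = rng.standard_normal((20, 5000))        # 20 members, 5000 cases
obs = rng.standard_normal(5000)              # drawn from the same distribution
ratio = spread_skill_ratio(ens, obs)
hist = rank_histogram(ens, obs, n_members=20)
```

On this perfectly calibrated synthetic case the ratio lands near 1 (slightly below, since the ensemble-mean error includes the observation's own variability) and the 21 rank bins are roughly uniform, which is the behavior the paper reports for FourCastNet 3's trained ensembles.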

Performance measures including root-mean-square error (RMSE), CRPS, and ensemble reliability outperform or match those of conventional state-of-the-art systems (e.g., IFS-ENS) and diffusion-based ML models. Notably, a single 15-day global forecast at 0.25° spatial, 6-hourly temporal resolution is evaluated in 64 seconds on one NVIDIA H100 (versus minutes or hours for alternatives), with a 90-day forecast available in under 20 seconds (Bonev et al., 16 Jul 2025).

6. Applications, Impact, and Future Directions

The capabilities of FourCastNet 3 suggest new possibilities for operational meteorology and climate research:

  • Massive Ensemble Generation: Enables rapid large-ensemble production for early warning and probabilistic risk assessments.
  • Subseasonal to Medium-range Applications: Demonstrated stability and skill at extended lead times facilitate subseasonal forecasting and scenario analysis.
  • Spectral and Physical Consistency: The spectrum-preserving design ensures reliable depiction of both large-scale planetary waves and regional events such as tropical cyclones and storms.
  • Operational Deployment: The inference speed, calibration, and efficient scaling open pathways for real-time forecasting on moderate hardware (even a single GPU), as well as deployment in research institutions with limited computational resources.

Because the architecture and training mechanisms are fully open-sourced, FourCastNet 3 is positioned as a new, accessible platform for the community, creating opportunities for hybridization with traditional NWP, research on data assimilation, and foundation-model–style climate system emulation (Bonev et al., 16 Jul 2025).

7. Summary Table: Key Features

| Feature | FourCastNet 3 | Prior ML/NWP Models |
|---|---|---|
| Geometry | Spherical, spectral & local convolution | Planar or approximate spherical |
| Ensemble strategy | Probabilistic, spatially correlated latents | Deterministic / perturbed ICs |
| Calibration / spectra | Yes; stable spectra & rank histograms | Commonly uncalibrated / spectral drift |
| Inference speed (90-day global) | < 20 seconds (1 GPU) | 8–60× slower (diffusion / NWP) |
| Scalability | Model & data parallel; 1024+ GPUs for training | Generally batch-only, less scalable |

FourCastNet 3 therefore represents a substantial advancement in data-driven weather prediction, integrating geometric learning, probabilistic inference, scalability, and operational practicality in a single, open framework (Bonev et al., 16 Jul 2025).
