Spherical Fourier Neural Operators (SFNOs)
- Spherical Fourier Neural Operators (SFNOs) are models that extend traditional Fourier Neural Operators to spherical domains by employing spherical harmonic transforms for intrinsic geometric representation.
- They utilize a spectral convolution framework with forward and inverse spherical harmonic transforms to achieve efficient, rotation-equivariant learning without grid-induced artifacts.
- SFNOs enable scalable, stable ensemble forecasting in global weather models, offering computational efficiency and improved accuracy compared to conventional methods.
Spherical Fourier Neural Operators (SFNOs) are a class of machine learning models that generalize Fourier Neural Operators (FNOs) to the sphere, enabling efficient, equivariant learning of mappings between function spaces defined on spherical domains. By replacing the standard Euclidean Fourier transform with the spherical harmonic transform, SFNOs capture the intrinsic geometry and rotation symmetries of the sphere, avoiding artifacts introduced by flat or latitude–longitude coordinate systems. SFNOs have demonstrated state-of-the-art stability and accuracy for global atmospheric modeling, allowing for fast, high-resolution, and probabilistically calibrated emulation of physically based weather forecasts (Mahesh et al., 2024, Bonev et al., 2023).
1. Mathematical Foundations and Layer Construction
SFNOs operate on scalar or vector fields defined over the sphere $S^2$. Such a field $f$ can be expanded in terms of complex spherical harmonics $Y_\ell^m$ as
$$f(\theta, \varphi) = \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} \hat{f}_\ell^m \, Y_\ell^m(\theta, \varphi),$$
where the spherical harmonic coefficients are given by
$$\hat{f}_\ell^m = \int_{S^2} f \, \overline{Y_\ell^m} \, \mathrm{d}\Omega.$$
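The orthonormality of the spherical harmonic basis, which underpins this expansion, can be checked numerically. The sketch below is a minimal illustration using SciPy's associated Legendre function `lpmv` and Gauss–Legendre quadrature; `Ylm` and `sphere_inner_product` are our own helper names, not part of any SFNO library:

```python
import numpy as np
from math import factorial
from scipy.special import lpmv  # associated Legendre P_l^m (Condon-Shortley phase)

def Ylm(l, m, theta, phi):
    """Orthonormal complex spherical harmonic Y_l^m (m >= 0).
    theta: polar angle in [0, pi], phi: azimuth in [0, 2*pi)."""
    norm = np.sqrt((2 * l + 1) / (4 * np.pi) * factorial(l - m) / factorial(l + m))
    return norm * lpmv(m, l, np.cos(theta)) * np.exp(1j * m * phi)

def sphere_inner_product(l1, m1, l2, m2, n_theta=32, n_phi=64):
    """Quadrature estimate of the L2 inner product <Y_l1^m1, Y_l2^m2>.
    Gauss-Legendre nodes in cos(theta) absorb the sin(theta) measure;
    a uniform azimuth grid integrates e^{i(m1-m2)phi} exactly."""
    x, w = np.polynomial.legendre.leggauss(n_theta)  # nodes/weights in cos(theta)
    theta = np.arccos(x)
    phi = np.linspace(0.0, 2 * np.pi, n_phi, endpoint=False)
    T, P = np.meshgrid(theta, phi, indexing="ij")
    vals = Ylm(l1, m1, T, P) * np.conj(Ylm(l2, m2, T, P))
    return np.sum(w[:, None] * vals) * (2 * np.pi / n_phi)

print(abs(sphere_inner_product(2, 1, 2, 1)))  # ~1.0 (unit norm)
print(abs(sphere_inner_product(2, 1, 3, 1)))  # ~0.0 (orthogonality)
```

Because the integrands here are polynomials in $\cos\theta$ times azimuthal exponentials, this modest quadrature is exact up to floating-point error.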
SFNO layers implement learnable, rotation-equivariant spectral convolution in this basis. Each layer consists of the following steps:
- Forward Spherical Harmonic Transform (SHT): The input feature map is projected onto the spherical harmonic basis.
- Spectral Filtering: For each degree $\ell$ (and per channel), spectral coefficients are mixed by a learned weight tensor $W_\ell$. The spectral operation per layer is
$$(\hat{u}_{\text{out}})_{\ell}^{m, c'} = \sum_{c} W_\ell^{c' c} \, (\hat{u}_{\text{in}})_{\ell}^{m, c},$$
where $c$ and $c'$ index input/output channels. Sharing $W_\ell$ across all orders $m$ of a degree $\ell$ is what preserves rotation equivariance.
- Inverse SHT: The filtered coefficients are mapped back to physical space.
- Pointwise Nonlinearity and Channel Mixing: A feed-forward MLP or activation layer is then applied in physical space.
Because spherical convolution diagonalizes in the harmonic domain, filter parameterizations are efficient: the weight tensor $W_\ell$ can be much smaller than a dense kernel and exploits the orthonormality and symmetry of the spherical basis. There is no need for padding or explicit pole treatment, and a quadrature-based transform costs $O(L^3)$, where $L$ is the harmonic cutoff set by the spatial resolution (Mahesh et al., 2024, Bonev et al., 2023).
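In code, the spectral filtering step reduces to a single einsum over the coefficient array. The sketch below (NumPy, with arbitrary shapes and random data purely for illustration) shows the block-diagonal structure: one small mixing matrix per degree $\ell$, shared across all orders $m$:

```python
import numpy as np

def spectral_filter(coeffs, W):
    """Per-degree spectral channel mixing, the core SFNO spectral operation.

    coeffs: complex array [C_in, L, M] -- harmonic coefficients per input
            channel, degree l, order m (in a real SHT, entries with m > l
            would be zero; here they are random for shape illustration).
    W:      real array [L, C_out, C_in] -- one mixing matrix per degree l,
            shared across orders m (this sharing gives rotation equivariance).
    returns: complex array [C_out, L, M]
    """
    # out[o, l, m] = sum_i W[l, o, i] * coeffs[i, l, m]
    return np.einsum("loi,ilm->olm", W, coeffs)

rng = np.random.default_rng(0)
C_in, C_out, L = 3, 4, 8
coeffs = rng.standard_normal((C_in, L, L)) + 1j * rng.standard_normal((C_in, L, L))
W = rng.standard_normal((L, C_out, C_in))
out = spectral_filter(coeffs, W)
print(out.shape)  # (4, 8, 8)
```

Because $W_\ell$ does not depend on $m$, a rotation of the input field mixes coefficients only within each degree $\ell$, and the filtered output rotates accordingly.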
2. Network Architectures and Scaling
SFNO architectures consist of a stack of such harmonic layers, each including lift/projection, spectral mixing, nonlinear channel processing, and residual skip connections. Key architectural features include:
- Input lifting: The raw input is projected into a higher-dimensional embedding via a learned linear layer.
- Spectral depth and width: Model size can be scaled by adjusting the embedding dimension and the spectral downsampling (scale) factor. For example, (Mahesh et al., 2024) reports:
  - Small SFNO: 48M parameters
  - Medium SFNO: 218M parameters
  - Large SFNO: 1.1B parameters
- Per-mode filters: No parameter tying across degrees $\ell$, maximizing expressivity but increasing parameter count.
- No explicit autoregressive loss: Training is always at the single-step level to preserve high-wavenumber variability.
- Model parallelism: Large models leverage model/data parallelism across many GPUs.
This architecture provides efficient scaling with resolution, geometric equivariance, and support for the massive parameter counts required for complex atmospheric forecasting tasks.
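The cost of untied per-degree filters is easy to estimate. The sketch below counts spectral-mixing parameters under hypothetical dimensions; the layer count, embedding size, and cutoff here are illustrative, not the configurations reported in the paper:

```python
def sfno_spectral_params(n_layers: int, embed_dim: int, l_max: int) -> int:
    """Parameters in the spectral-mixing weights alone, assuming one
    dense [embed_dim x embed_dim] matrix per degree l per layer
    (no tying across l, as in the per-mode-filter design)."""
    return n_layers * l_max * embed_dim * embed_dim

# Illustrative only: 8 layers, 256-dim embedding, harmonic cutoff 180
print(sfno_spectral_params(8, 256, 180))  # 94371840 spectral parameters
```

This quadratic dependence on the embedding dimension is why widening the model quickly pushes parameter counts into the hundreds of millions.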
3. Training Protocols and Datasets
Standard SFNO training proceeds as single-step operator learning from paired states $(u_t, u_{t+\Delta t})$, minimizing a (latitude- or pressure-weighted) mean squared error loss across all channels,
$$\mathcal{L} = \frac{1}{C} \sum_{c=1}^{C} \sum_{x} w(x) \, \big(\mathcal{N}_\theta(u_t)_c(x) - u_{t+\Delta t, c}(x)\big)^2,$$
where the weights $w(x)$ encode the quadrature or latitude compensation.
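A latitude-weighted squared-error loss of this form can be sketched as follows (a minimal NumPy version; the exact channel and pressure weighting used in practice may differ):

```python
import numpy as np

def lat_weighted_mse(pred, target, lats_deg):
    """Latitude-weighted MSE, a common operator-learning loss on
    lat-lon grids: weights ~ cos(latitude) compensate for the
    shrinking area of grid cells toward the poles.

    pred, target: arrays [C, n_lat, n_lon]
    lats_deg:     array [n_lat] of latitudes in degrees
    """
    w = np.cos(np.deg2rad(lats_deg))
    w = w / w.mean()                    # normalize weights to unit mean
    err2 = (pred - target) ** 2
    return float(np.mean(err2 * w[None, :, None]))
```

Because the weights are normalized to unit mean, a spatially uniform error yields the same loss value regardless of the latitude grid.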
Key aspects of practical training regimes:
- Dataset: ERA5 reanalysis on a 0.25° latitude–longitude grid, spanning multiple decades for training, with held-out years for validation/testing.
- Input variables: 74 channels comprising 73 atmospheric fields (on 13 pressure levels) plus surface/satellite information.
- Temporal resolution: Typically 6 hours.
- Optimization: Adam-style optimizers with data and model parallelism across 256 A100 GPUs.
- No explicit pole treatment: Spherical harmonics inherently address coordinate singularities.
No autoregressive rollout loss is employed during training to preserve high-wavenumber content during inference; this setting is chosen to match the requirements of ensemble weather forecasting (Mahesh et al., 2024).
4. Ensemble Forecasting with SFNOs
SFNOs enable practical generation of "huge" ensembles (hundreds to thousands of members), which is infeasible with traditional physics-based simulators due to computational cost. In the SFNO-BVMC (Bred Vector and Multiple Checkpoint) framework:
- Model uncertainty: Realized via multiple independently trained SFNO checkpoints (29 in (Mahesh et al., 2024)) with different random seeds.
- Initial condition uncertainty: For each checkpoint, bred vectors (the fastest-growing perturbations) are generated via iterative noise injection and model evolution, resulting in two perturbed members per checkpoint (one with the bred vector added, one subtracted), yielding $2 \times 29 = 58$ ensemble members.
Bred-vector algorithm steps:
- Add spatially correlated Gaussian noise to Z500 (500 hPa geopotential height), with a 500 km decorrelation length.
- Run perturbed/control trajectories forward in time.
- Accumulate growing-mode perturbations and rescale to a fixed amplitude (set relative to the SFNO 48 h forecast RMSE).
- Apply to all variables for the final ensemble initialization.
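The breeding cycle above can be illustrated on a toy chaotic system; here a logistic map stands in for one SFNO forecast step (an illustrative reduction of the algorithm, not the actual pipeline):

```python
import numpy as np

def step(x, r=3.9):
    """Toy chaotic dynamics (logistic map) standing in for one
    model forecast step -- purely illustrative."""
    return r * x * (1 - x)

def breed_vector(x0, n_cycles=20, amp=1e-3, seed=0):
    """Breeding cycle: perturb, evolve control and perturbed states,
    then rescale the grown difference back to a fixed amplitude."""
    rng = np.random.default_rng(seed)
    ctrl = np.array(x0, dtype=float)
    pert = ctrl + amp * rng.standard_normal(ctrl.shape)  # initial noise
    for _ in range(n_cycles):
        ctrl = step(ctrl)
        pert = step(pert)
        d = pert - ctrl                          # grown perturbation
        d *= amp / (np.linalg.norm(d) + 1e-30)   # rescale to fixed amplitude
        pert = ctrl + d                          # re-seed the next cycle
    return d                                     # the bred vector

bv = breed_vector(np.full(16, 0.3), n_cycles=30)
print(np.linalg.norm(bv))  # ~1e-3 by construction
```

In the actual BVMC setup, the resulting vector is added to and subtracted from the analysis state to create the paired ensemble members.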
This ensemble design ensures:
- Each individual member maintains a realistic spectrum (approximately constant in time), avoiding spectral collapse.
- The ensemble-mean spectrum smears with lead time, consistent with dynamical ensemble spread behavior in operational models like the ECMWF IFS.
- Large ensemble sizes become feasible due to orders-of-magnitude lower inference cost of SFNOs compared to full-physics NWP models (Mahesh et al., 2024).
5. Evaluation Metrics and Diagnostic Tools
SFNO and SFNO-BVMC ensemble performance is assessed via a spectrum of diagnostics:
- Mean Forecast and Ensemble Metrics:
  - Continuous Ranked Probability Score (CRPS):
$$\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \big(F(z) - \mathbf{1}\{z \geq y\}\big)^2 \, \mathrm{d}z,$$
where $F$ is the forecast CDF and $y$ the verifying observation.
  - Spread–error ratio: ensemble spread divided by the RMSE of the ensemble mean (ideal value 1).
  - Ensemble-mean RMSE, including the long-lead point where it crosses over to climatological skill.
- Spectral Diagnostics:
- Individual member spectra stability over hundreds of time steps.
- Ensemble-mean spectrum decay at synoptic scales, reproducing known behavior of real atmospheric flows.
- Extreme Value Forecasting:
  - Extreme Forecast Index (EFI) by lead time and location:
$$\mathrm{EFI} = \frac{2}{\pi} \int_0^1 \frac{p - F_f(p)}{\sqrt{p(1 - p)}} \, \mathrm{d}p,$$
with $F_f(p)$ the fraction of members below the $p$-th climatological percentile.
  - Reliability diagrams and ROC-AUC for extreme (e.g., 95th percentile) events.
  - Threshold-weighted CRPS (twCRPS) for tail events.
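The ensemble form of the CRPS and the spread–error ratio can be computed directly from members; the sketch below uses the standard sample estimators (function names are ours):

```python
import numpy as np

def ensemble_crps(members, obs):
    """Sample-based CRPS estimator for a scalar observation:
    CRPS = E|X - y| - 0.5 * E|X - X'|, estimated over ensemble members."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return float(term1 - term2)

def spread_error_ratio(members, obs):
    """Ratio of ensemble spread (std of members) to the absolute error
    of the ensemble mean; in practice this is averaged over many
    forecasts, and a value near 1 indicates good dispersion."""
    x = np.asarray(members, dtype=float)
    return float(x.std() / (abs(x.mean() - obs) + 1e-30))

print(ensemble_crps([0.0, 1.0], 1.0))  # 0.25
```

A sharper, better-calibrated ensemble lowers the first CRPS term without inflating the second.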
Empirically, SFNO-BVMC matches the spectral and ensemble behavior of the ECMWF IFS benchmark, with strong spatial correlation in EFI and similar or improved twCRPS for extremes, but with an approximately 18-hour lag in CRPS relative to the full IFS ensemble (Mahesh et al., 2024).
6. Advantages, Limitations, and Extensions
Advantages
- Geometric fidelity: SFNOs are equivariant to sphere rotations (SO(3)), and free from grid-aligned or polar artifacts.
- Resolution scalability: transform and filtering costs grow gracefully with resolution, without the padding and distortion overheads of flat 2-D FFT approaches.
- Stability: Empirical stability for thousands of autoregressive steps; physically plausible and spectrum-preserving long-range forecasts (Bonev et al., 2023).
- Efficiency: Large ensembles (≫1000 members) and long-term rollouts are computationally feasible, with inference costs many orders of magnitude below operational NWP.
- Ensemble construction: Straightforward integration of model and initial condition uncertainty.
Limitations
- Single-step training: Yields some blurring at small scales; spectral sharpness is not fully preserved without multi-step or stochastic losses.
- Underdispersion: For short lead times, ensemble spread can be too narrow relative to forecast error.
- Benchmark lag: SFNO-BVMC lags IFS by ≈18 h in CRPS.
- Precipitation: Explicit rainfall prediction and other strongly non-Gaussian variables remain open problems.
- Memory/computation: The largest models (e.g., 1.1B parameters) require model-parallel, multi-GPU infrastructure; no sparsity-engineering is used (Mahesh et al., 2024).
Extensions and Related Formulations
Recent work on Green’s-function Spherical Neural Operators (GSNOs) generalizes the SFNO architecture by introducing a learnable correction term in the spectral domain, balancing strict equivariance with the ability to encode absolute-position effects (e.g., boundaries, orography). This yields improved accuracy (5–10%) with negligible additional computational cost, and a principled PDE operator-theoretic interpretation (Tang et al., 2025). GSNOs recover SFNOs as a strictly SO(3)-equivariant special case when the correction vanishes.
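One plausible minimal reading of such a corrected spectral layer (our illustration of the idea, not the paper's exact formulation) adds a learned coefficient-space bias to the equivariant filter:

```python
import numpy as np

def corrected_spectral_filter(coeffs, W, B):
    """Equivariant per-degree mixing plus a position-breaking correction.

    coeffs: complex [C, L, M] harmonic coefficients
    W:      real    [L, C, C] per-degree mixing (SO(3)-equivariant part)
    B:      [C, L, M]         learned correction term; with B = 0 this
            reduces to the strictly equivariant SFNO filter, while a
            nonzero B can encode absolute-position effects (orography,
            boundaries). Illustrative sketch of the GSNO idea only.
    """
    return np.einsum("loi,ilm->olm", W, coeffs) + B
```

With B = 0 the layer is the plain SFNO filter; a fixed nonzero B pins features to absolute locations on the sphere, which strict SO(3) equivariance forbids.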
7. Applications and Impact in Scientific Forecasting
SFNOs have found application in accelerating global weather and climate simulation, particularly in the context of ensemble forecasting for extreme and rare event characterization. Demonstrated on ERA5 and shallow-water equations data, SFNOs provide state-of-the-art skill on standard physical metrics (e.g., anomaly correlation, spectrum fidelity), stable year-long rollouts, and practical utility for massive ensemble generation in the study of internal climate variability and extremes (Mahesh et al., 2024, Bonev et al., 2023).
A plausible implication is that SFNOs and their generalizations will serve as foundational architectures for efficient, high-resolution spherical operator learning, enabling both theoretical advances and tangible progress in machine-learning-based scientific modeling.