Fourier Neural Operators: Theory & Applications

Updated 18 November 2025

Fourier Neural Operators (FNOs) are neural architectures that learn mappings between infinite-dimensional function spaces using spectral convolutions via the Fourier transform.
They enable mesh-invariant surrogate modeling for parametric PDEs and have demonstrated robust performance in quantum dynamics, fluid flows, and other scientific applications.
Despite strong empirical results, FNOs exhibit a spectral bias toward low-frequency features, which has prompted hybrid architectures and frequency-aware loss functions designed to mitigate this limitation.

A Fourier Neural Operator (FNO) is a neural architecture for learning nonlinear mappings between infinite-dimensional function spaces, where each layer combines a spectral convolution computed via the Fourier transform with a pointwise affine transform and nonlinearity. FNOs are most widely applied as mesh-invariant surrogates for the solution operators of parametric partial differential equations (PDEs), but have demonstrated strong performance across many operator learning tasks in scientific domains. Recent theoretical and empirical research addresses the mathematical underpinnings, spectral limitations, computational scaling, and extensions of the FNO framework, establishing it as a principal architecture for operator regression in scientific machine learning.

1. Mathematical Formulation and Layer Architecture

The Fourier Neural Operator is constructed by iterating a sequence of "Fourier layers" between pointwise lift/projection transforms. Formally, for input function $I$ and output $O$ on a domain $D$, the architecture is

$\begin{aligned} &\text{Lift:} \qquad I\mapsto V_0, \\ &\text{Iterate:} \qquad V_{\ell+1}(x) = \sigma \left( W_\ell V_\ell(x) + (K_\ell V_\ell)(x) \right), \quad \ell=0,\ldots,L-1, \\ &\text{Project:} \qquad V_L \mapsto O, \end{aligned}$

where:

$W_\ell$ is a learnable channelwise linear map, applied pointwise in $x$,
$\sigma$ is a nonlinearity (typically ReLU or GELU; complex-valued activations for quantum tasks),
$K_\ell$ is a "spectral convolution," defined via

$(K_\ell V)(x) = \mathcal{F}^{-1} \big[ \, R_\ell \odot \mathcal{F}(V) \, \big](x),$

with $R_\ell$ a tensor of learned complex weights in Fourier space and $\odot$ denoting multiplication mode by mode: at each retained wavenumber $k$, $R_\ell(k)$ is a matrix that mixes the channels of $\mathcal{F}(V)(k)$. In practice, only low-wavenumber modes ($|k|<M$ for a cutoff $M$) are retained, enforcing an inductive bias toward smooth, global structure (Shah et al., 5 Sep 2024, Qin et al., 10 Apr 2024). Each layer thus combines a local linear operation with a nonlocal, translation-invariant integral operator implemented as a spectral filter.
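The layer structure above can be sketched in NumPy for a 1D periodic domain. This is a minimal, untrained illustration under stated assumptions, not a reference implementation: the function names (`spectral_conv_1d`, `fourier_layer`, `init_params`, `fno_forward`), the random initialization scale, the tanh approximation of GELU, and the use of single linear maps for the lift and projection are all hypothetical choices made here. Using `np.fft.rfft(..., norm="forward")` scales the coefficients by $1/N$, so the same weights $R_\ell$ can be applied on any grid.

```python
import numpy as np


def spectral_conv_1d(v, R):
    """(K v)(x) = F^{-1}[R . F(v)](x) on a periodic 1D grid.

    v: real array (N, d), d channels sampled at N equispaced points.
    R: complex array (M, d, d), one channel-mixing matrix per retained
       wavenumber k = 0, ..., M-1; all modes k >= M are discarded.
    """
    N, M = v.shape[0], R.shape[0]
    # norm="forward" divides by N, so the coefficients do not depend on N
    v_hat = np.fft.rfft(v, axis=0, norm="forward")      # (N//2 + 1, d)
    out_hat = np.zeros_like(v_hat)
    # mode-by-mode matrix product over channels; higher modes stay zero
    out_hat[:M] = np.einsum("kij,kj->ki", R, v_hat[:M])
    return np.fft.irfft(out_hat, n=N, axis=0, norm="forward")


def gelu(z):
    # tanh approximation of GELU
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))


def fourier_layer(v, W, R):
    # V_{l+1}(x) = sigma(W_l V_l(x) + (K_l V_l)(x)); W_l acts on channels pointwise
    return gelu(v @ W.T + spectral_conv_1d(v, R))


def init_params(d_in, d_out, width, modes, n_layers, seed=0):
    # hypothetical random initialization, for illustration only
    rng = np.random.default_rng(seed)
    s = 1.0 / width
    layers = []
    for _ in range(n_layers):
        W = s * rng.normal(size=(width, width))
        R = s * (rng.normal(size=(modes, width, width))
                 + 1j * rng.normal(size=(modes, width, width)))
        layers.append((W, R))
    return {"P": rng.normal(size=(width, d_in)),
            "Q": s * rng.normal(size=(d_out, width)),
            "layers": layers}


def fno_forward(a, params):
    v = a @ params["P"].T                 # lift: (N, d_in) -> (N, width)
    for W, R in params["layers"]:         # iterate Fourier layers
        v = fourier_layer(v, W, R)
    return v @ params["Q"].T              # project: (N, width) -> (N, d_out)


# usage: map a 1-channel input function sampled on 64 points
params = init_params(d_in=1, d_out=1, width=8, modes=6, n_layers=4)
x = 2 * np.pi * np.arange(64) / 64
u = fno_forward(np.sin(x)[:, None], params)
print(u.shape)  # (64, 1)
```

Because the weights act on Fourier coefficients rather than on grid values, the same `params` can be evaluated on a finer grid without retraining; the full network's outputs then agree only approximately, since the pointwise GELU creates harmonics that alias differently at each resolution.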

2. Theoretical Properties and Universality

FNOs are universal approximators of continuous operators between appropriate function spaces. For a continuous operator $G : H^s(\mathbb{T}^d;\mathbb{R}^{d_a}) \to H^{s'}(\mathbb{T}^d;\mathbb{R}^{d_u})$ on the $d$-dimensional torus $\mathbb{T}^d$, any compact set $\mathcal{K} \subset H^s(\mathbb{T}^d;\mathbb{R}^{d_a})$, and any $\epsilon > 0$, there exists an FNO $N$ such that $\sup_{a \in \mathcal{K}} \|G(a) - N(a)\|_{H^{s'}} \leq \epsilon$ (Kovachki et al., 2021). This universality relies on (i) spectral approximation in $H^s$ (projection onto low modes), (ii) the ability of the pointwise σ-layers to approximate local nonlinearities, and (iii) the efficient emulation of classic pseudo-spectral PDE solvers by learning spectral kernels directly from data.

For PDE solution operators with sufficient regularity, the FNO achieves $\epsilon$-accuracy with a parameter count that grows only sub-linearly or polynomially in $1/\epsilon$ (up to logarithmic factors):

For elliptic PDEs (e.g., Darcy flow), $O(\epsilon^{-d/k} \log(1/\epsilon))$ ,
For dissipative Navier–Stokes, $O(\epsilon^{-(1+d/r)} \log(1/\epsilon))$ for $r$ the spatial regularity of the solution (Kovachki et al., 2021).
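Since the constants hidden in these bounds are not specified, only ratios between them are meaningful. The short sketch below evaluates how much the Navier–Stokes bound grows when the target error is halved; the function name `ns_size_bound` and the values $d = 2$, $r = 2$ are hypothetical choices for illustration.

```python
import math


def ns_size_bound(eps, d, r):
    """Navier-Stokes size bound eps**-(1 + d/r) * log(1/eps), without its
    unspecified constant (d: spatial dimension, r: solution regularity)."""
    return eps ** (-(1.0 + d / r)) * math.log(1.0 / eps)


# hypothetical example values d = 2, r = 2: effect of halving the target error
growth = ns_size_bound(0.01, d=2, r=2) / ns_size_bound(0.02, d=2, r=2)
print(f"halving eps multiplies the bound by about {growth:.2f}")  # about 4.71
```

The factor is slightly above $2^{1+d/r} = 4$ because of the logarithmic term.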

3. Spectral Bias, Resolution Invariance, and Mitigation

FNOs are subject to a well-characterized spectral bias: they excel at capturing the dominant, low-frequency features of operator targets while underrepresenting high-frequency (non-dominant) modes (Qin et al., 10 Apr 2024, Kalimuthu et al., 5 Apr 2025). This bias stems from:

Explicit mode truncation, typically retaining only the lowest $M$ modes per dimension,
Nonlinear activations, which couple the retained frequencies and generate higher harmonics but do not overcome the overall preference for slow, global dynamics.
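The effect of mode truncation alone can be seen in a few lines of NumPy. In this sketch the signal, the grid size $N = 256$ and the cutoff $M = 8$ are arbitrary choices: a $k = 2$ component passes through the cutoff unchanged, while a $k = 20$ component is removed entirely, which is the information a truncated spectral convolution cannot propagate through its spectral branch.

```python
import numpy as np


def truncate_modes(u, M):
    """Keep only the Fourier modes k < M of a real periodic signal."""
    u_hat = np.fft.rfft(u)
    u_hat[M:] = 0.0
    return np.fft.irfft(u_hat, n=u.size)


N = 256
x = 2 * np.pi * np.arange(N) / N
low = np.sin(2 * x)            # k = 2, below the cutoff
high = 0.5 * np.sin(20 * x)    # k = 20, above the cutoff
u = low + high

u_trunc = truncate_modes(u, M=8)
print(np.max(np.abs(u_trunc - low)))  # round-off level: the k = 2 part survives
print(np.max(np.abs(u_trunc - u)))    # 0.5 up to round-off: the k = 20 part is lost
```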

Mitigation strategies include:

Residual ensemble architectures (e.g., SpecB-FNO), where additional FNOs are trained sequentially to model the residuals dominated by high-frequency content, reducing high-$k$ error by up to 80% on challenging PDEs (Qin et al., 10 Apr 2024).
Hybrid branches for localized features (e.g., LOGLO-FNO), combining patchwise local spectral convolutions and high-frequency propagation modules, as well as frequency-aware loss functions that explicitly penalize spectral error both in mid and high Fourier bins (Kalimuthu et al., 5 Apr 2025).
Inclusion of CNN-based local feature extractors (Conv-FNO/UNet-FNO), concatenating local spatial features with global spectral features, achieving 2–4× error reductions on turbulence and pattern-formation tasks, especially when data are scarce (Liu et al., 22 Mar 2025, Liu-Schiaffini et al., 26 Feb 2024).
Translation-equivariant attention and residual connections, to improve frequency transfer and enhance stability across depth (Zhao et al., 2023, Kim, 29 Jul 2025).
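A frequency-aware loss of the kind listed above can be sketched as a relative $L^2$ term plus per-band spectral errors. This is a hypothetical construction for illustration, not the loss used in LOGLO-FNO or SpecB-FNO: the function names, the equal-width bands and the weights `(1, 2, 4)` are assumptions, and in training the same computation would be written in an autodiff framework rather than NumPy.

```python
import numpy as np


def spectral_bin_errors(pred, target, n_bins=3):
    """Spectral error of a 1D periodic prediction in equal-width wavenumber
    bands (low / mid / high), relative to the target's spectral energy."""
    p_hat = np.fft.rfft(pred)
    t_hat = np.fft.rfft(target)
    edges = np.linspace(0, p_hat.size, n_bins + 1).astype(int)
    total = np.sum(np.abs(t_hat) ** 2)
    return np.array([np.sum(np.abs(p_hat[lo:hi] - t_hat[lo:hi]) ** 2) / total
                     for lo, hi in zip(edges[:-1], edges[1:])])


def frequency_aware_loss(pred, target, weights=(1.0, 2.0, 4.0)):
    # hypothetical weighting: higher bands are penalized more heavily,
    # on top of the usual relative L2 error
    l2 = np.sum((pred - target) ** 2) / np.sum(target ** 2)
    bins = spectral_bin_errors(pred, target, n_bins=len(weights))
    return l2 + float(np.dot(weights, bins))


N = 256
x = 2 * np.pi * np.arange(N) / N
target = np.sin(2 * x) + 0.2 * np.sin(100 * x)
pred = np.sin(2 * x)                        # misses the high-frequency part
bins = spectral_bin_errors(pred, target)    # low and mid bands ~0, error in high band
loss = frequency_aware_loss(pred, target)
print(bins, loss)
```

A plain $L^2$ loss sees the missing $k = 100$ component only once; the weighted band term makes that same error dominate the objective.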

4. Discretization Error, Computational Scaling, and Implementation

The discretization error of FNOs, which stems from replacing continuous convolutions with discrete FFTs, is tightly controlled: for an input of Sobolev regularity $s$ and grid size $N$, the aliasing error decays as $O(N^{-s})$ at each layer (Lanthaler et al., 3 May 2024). This algebraic rate holds both in the discrete $\ell^2$ grid norm and, via trigonometric interpolation, in the continuum $L^2$ norm, provided that smooth activation functions are used and periodic boundary conditions are enforced. The resulting practical guidelines are:

Choose $N$ such that $N^{-s}$ is commensurate with the desired test error,
Prefer smooth activations (GELU, Tanh) and avoid non-periodic features,
Employ adaptive grid refinement (adaptive subsampling) for efficient learning.
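The decay of the discretization error with $N$ can be observed numerically. The sketch below is a hedged illustration, not a reproduction of the analysis in Lanthaler et al.: it applies one fixed single-channel spectral convolution (random weights, $M = 8$, no activation) to samples of $|\sin x|^3$, a periodic function with limited smoothness, on grids of increasing size, and compares each result with a fine-grid reference at the shared points. The function names, the test function, the weights and the reference resolution are all assumptions; the observed rate reflects how fast this particular function's Fourier coefficients decay.

```python
import numpy as np


def spectral_conv_scalar(v, r):
    """Single-channel spectral convolution on a periodic grid: keep modes
    k < len(r), multiply mode k by r[k], and transform back."""
    v_hat = np.fft.rfft(v, norm="forward")   # grid-independent coefficients
    out_hat = np.zeros_like(v_hat)
    out_hat[: r.size] = r * v_hat[: r.size]
    return np.fft.irfft(out_hat, n=v.size, norm="forward")


def f(x):
    # periodic input whose third derivative jumps, so coefficients decay like k^-4
    return np.abs(np.sin(x)) ** 3


rng = np.random.default_rng(0)
r = rng.normal(size=8) + 1j * rng.normal(size=8)   # fixed "learned" weights, M = 8

N_ref = 1024
x_ref = 2 * np.pi * np.arange(N_ref) / N_ref
ref = spectral_conv_scalar(f(x_ref), r)

errors = {}
for N in (16, 32, 64, 128):
    x = 2 * np.pi * np.arange(N) / N
    out = spectral_conv_scalar(f(x), r)
    # compare against the fine-grid output at the shared grid points
    errors[N] = np.max(np.abs(out - ref[:: N_ref // N]))
    print(N, errors[N])
```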

Implementation is highly scalable: with explicit domain and parameter decomposition, FNOs have been applied to 3D+time problems with a grid size of 2.6 billion on 512 GPUs, achieving near-linear scaling and orders-of-magnitude speedups over conventional solvers (Grady II et al., 2022). For $N$ grid points and $d$ channels, the FFTs in each layer cost $O(d\,N\log N)$ and the pointwise channel mixing by $W_\ell$ adds $O(d^2 N)$; the parameter count grows only with the number of retained modes and the channel width, not with the input resolution.
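The claim that parameters depend on modes and width but not on resolution can be checked with simple bookkeeping. The helper below is a hypothetical, simplified count (one complex $d\times d$ matrix per retained mode, $M$ modes per spatial dimension, FFT cost counted up to constant factors); real implementations differ in how they treat negative wavenumbers, so the absolute numbers are only indicative.

```python
import math


def fourier_layer_counts(width, modes, dim, n_points):
    """Rough size of one Fourier layer (hypothetical simplified model).

    width:    number of channels d
    modes:    retained wavenumbers per spatial dimension (M)
    dim:      number of spatial dimensions
    n_points: total number of grid points N
    Returns (parameter count, operation count up to constant factors).
    """
    n_modes = modes ** dim
    params = 2 * width * width * n_modes + width * width   # complex R_l + real W_l
    fft_ops = width * n_points * math.log2(n_points)       # forward + inverse FFTs
    mixing_ops = width * width * (n_points + n_modes)      # W_l per point, R_l per mode
    return params, fft_ops + mixing_ops


for side in (64, 256):
    p, ops = fourier_layer_counts(width=32, modes=12, dim=2, n_points=side ** 2)
    print(f"{side}x{side} grid: {p} parameters, ~{ops:.2e} operations")
```

The parameter count is identical for both grids, while the operation count grows roughly in proportion to $N$ (plus the $\log N$ factor from the FFTs).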

5. Application Domains and Empirical Performance

FNOs have demonstrated strong empirical performance across a wide range of operator-regression tasks:

Quantum spin dynamics: Complex-valued FNOs can learn the discrete time-evolution operator for random quantum spin chains, either on the full $2^n$-dimensional wavefunction or on a reduced set of polynomially many observables (e.g. Pauli string expectation values), achieving fidelities $F\gtrsim0.99$ on in-domain predictions and sustaining $F\gtrsim0.9$ when extrapolating beyond the training time horizon, with a $>6\times$ speedup over exact unitary evolution (Shah et al., 5 Sep 2024).
Time-periodic quantum systems: FNOs accurately reconstruct effective Floquet Hamiltonians, observable trajectories, and operator growth, benefiting from discretization-invariance (zero-shot super-resolution in time) and polynomial computational scaling in system size (Qi et al., 8 Sep 2025).
Turbulent and self-gravitating fluid flows: FNOs capture multi-scale coupling in 3D and projected hydrodynamic/MHD turbulence, achieving normalized test errors $\delta_\mathrm{avg} \approx 0.17$–$0.25$ and orders-of-magnitude inference acceleration (Poletti et al., 31 Jul 2025).
Large-scale PDEs: FNOs trained on time-varying multiphase flows scale favorably to 100M+ degrees of freedom with sub-1% effective-modulus error, and runtimes comparable to FFT-based solvers (Nguyen et al., 16 Jul 2025).
Image classification: FNOs, as continuous neural operators, achieve discretization-invariant feature extraction with accuracy that is robust to extreme changes in input resolution, and they can be algebraically converted to and from CNNs (Kabri et al., 2023).

A representative summary of observed FNO accuracy and speedup is as follows:

Application Domain	Error Metric	FNO Accuracy	Speedup vs. Solver	Reference
Quantum spin evolution	Fidelity, F	$0.992$, extrap. $0.906$	$6.7\times$	(Shah et al., 5 Sep 2024)
Floquet quantum systems	Rel. RMSE	$2.4\times10^{-3}$ – $10^{-2}$	$\sim150\times$ (for $L=8$ )	(Qi et al., 8 Sep 2025)
3D multiphase PDE surrogates	Rel. $L^2$	$0.3$–$0.4$	$10^2$–$10^3\times$	(Grady II et al., 2022)
Structural dynamics	Energy ratio	$\sim1.00$ (linear regime)	$10^2$ – $10^5\times$	(Haghi et al., 11 Nov 2025)
Image classification (MNIST)	Top-1 accuracy	$88$–$89$%	Comparable to CNN	(Kabri et al., 2023)
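For reference, two of the error metrics in the table can be computed as below. This sketch rests on an assumption: it takes fidelity to be the squared overlap $|\langle\psi|\phi\rangle|^2$ of normalized pure states and the relative $L^2$ error to be the discrete norm ratio on a shared grid; the cited works may use different conventions (for example, the unsquared overlap or errors averaged over time steps).

```python
import numpy as np


def fidelity(psi, phi):
    """Squared overlap |<psi|phi>|^2 of two pure states, after normalization."""
    psi = psi / np.linalg.norm(psi)
    phi = phi / np.linalg.norm(phi)
    return float(np.abs(np.vdot(psi, phi)) ** 2)   # vdot conjugates psi


def relative_l2(pred, target):
    """Discrete relative L2 error ||pred - target|| / ||target||."""
    return float(np.linalg.norm(pred - target) / np.linalg.norm(target))


print(fidelity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))      # 0.5 up to round-off
print(relative_l2(np.array([1.0, 2.0]), np.array([1.0, 1.0])))  # 1/sqrt(2), about 0.707
```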

6. Limitations, Controversies, and Open Directions

Despite their efficacy, key limitations of FNOs arise from their spectral bias and the fixed global mode truncation. In nonlinear regimes with significant high-frequency content, FNOs can manifest artificial dissipation, spectral aliasing, and broadband error that persist regardless of training dataset size (Haghi et al., 11 Nov 2025, Qin et al., 10 Apr 2024). Specifically:

For strongly nonlinear systems that generate harmonics beyond the learned spectral bandwidth, FNOs fail to conserve energy and rapidly lose phase coherence.
For complex PDEs with rapid coefficient variation or local singularities, default FNOs struggle; enhancements via local kernels, hierarchical attention, or frequency-aware losses are required (Zhao et al., 2023, Liu et al., 22 Mar 2025, Liu-Schiaffini et al., 26 Feb 2024).

Recent theoretical work has refined the critical initialization and hyperparameter scaling laws for FNOs (Kim, 29 Jul 2025, Li et al., 24 Jun 2025). It shows that stability across depth and spectral range is controlled by the layerwise variance and the spectral cutoff (e.g. a per-mode variance scaling of $1/\sqrt{d \log K}$), and that under Maximal Update Parametrization (μP) hyperparameters transfer across model scales without re-tuning.

Ongoing open research directions include:

Dynamically adaptive mode truncation and learnable spectral cutoffs,
Multi-scale and wavelet-based operator layers,
Integrating physics-informed or conservation constraint layers,
Efficient extensions to non-periodic, non-Euclidean, or highly anisotropic domains.

7. Practical Guidelines for Usage and Deployment

Select the spectral cutoff $M$ to match the bandwidth of the target operator's outputs; too small an $M$ leads to irrecoverable loss of high-frequency content.
For tasks requiring high-frequency or localized feature capture, combine FNOs with local kernels (CNN/patchwise Fourier) and/or translation-equivariant attention.
When training on computational clusters, exploit domain decomposition and FFT-based parallel primitives for scalable FNO implementation (Grady II et al., 2022).
Use spectral or spectrogram-based loss functions to penalize both $L^2$ and frequency-domain errors, especially in settings with potential mode truncation artifacts (Haghi et al., 11 Nov 2025, Kalimuthu et al., 5 Apr 2025).
Employ adaptive resolution sampling and validation loss monitoring to optimize training effort per desired error, especially in operator surrogates for large-scale PDEs (Lanthaler et al., 3 May 2024).
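One hedged way to act on the first guideline is to estimate the output bandwidth from training targets and pick the smallest $M$ that captures a chosen fraction of their spectral energy. This heuristic is an assumption, not a procedure from the cited works; the function name `cutoff_for_energy` and the 99% threshold are illustrative, and a 1D periodic grid is assumed.

```python
import numpy as np


def cutoff_for_energy(samples, fraction=0.99):
    """Smallest cutoff M such that modes k < M hold `fraction` of the mean
    spectral energy of the real periodic signals in `samples` (shape (n, N))."""
    n_points = samples.shape[-1]
    energy = np.mean(np.abs(np.fft.rfft(samples, axis=-1)) ** 2, axis=0)
    # one-sided spectrum: every bin except k = 0 (and the Nyquist bin for even N)
    # stands for a +k/-k pair, so it counts twice in Parseval's identity
    energy[1:] *= 2.0
    if n_points % 2 == 0:
        energy[-1] /= 2.0
    cumulative = np.cumsum(energy) / np.sum(energy)
    # first index whose cumulative share reaches `fraction`; M is one past it
    return int(np.searchsorted(cumulative, fraction) + 1)


N = 128
x = 2 * np.pi * np.arange(N) / N
targets = np.stack([np.sin(3 * x) + 0.5 * np.sin(10 * x),
                    np.cos(3 * x) + 0.5 * np.cos(10 * x)])
print(cutoff_for_energy(targets))        # 11: modes 3 and 10 must both be kept
print(cutoff_for_energy(targets, 0.5))   # 4: mode 3 alone holds 80% of the energy
```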

Collectively, Fourier Neural Operators provide a robust, theoretically grounded, and computationally efficient framework for operator learning over function spaces, with strong empirical performance across quantum, fluid, and structural dynamics, as well as image-based and high-dimensional surrogate modeling contexts (Shah et al., 5 Sep 2024, Grady II et al., 2022, Kovachki et al., 2021, Nguyen et al., 16 Jul 2025, Kabri et al., 2023, Qi et al., 8 Sep 2025). The architecture continues to be a primary choice for mesh- and resolution-invariant surrogate models where the target operator retains smooth, global structure, though its limitations in systems dominated by nonlinearity and high-frequency content are increasingly well understood and actively addressed in ongoing research.