Fourier Neural Operators: Theory & Applications

Updated 18 November 2025
  • Fourier Neural Operators (FNOs) are neural architectures that learn mappings between infinite-dimensional function spaces using spectral convolutions via the Fourier transform.
  • They enable mesh-invariant surrogate modeling for parametric PDEs and have demonstrated robust performance in quantum dynamics, fluid flows, and other scientific applications.
  • Despite strong empirical results, FNOs exhibit a spectral bias toward low-frequency features, prompting hybrid approaches and frequency-aware loss functions to mitigate limitations.

A Fourier Neural Operator (FNO) is a neural architecture for learning nonlinear mappings between infinite-dimensional function spaces, where each layer performs a spectral convolution via the Fourier transform, combined with a pointwise nonlinearity and affine transform. FNOs are most widely applied as mesh-invariant surrogates for the solution operators of parametric partial differential equations (PDEs), but have demonstrated strong performance across many operator learning tasks in scientific domains. Recent theoretical and empirical research addresses the mathematical underpinnings, spectral limitations, computational scaling, and extensions of the FNO framework, establishing it as a principal architecture for operator-regression in scientific machine learning.

1. Mathematical Formulation and Layer Architecture

The Fourier Neural Operator is constructed by iterating a sequence of "Fourier layers" between pointwise lift/projection transforms. Formally, for input function $I$ and output $O$ on a domain $D$, the architecture is

$$
\begin{aligned}
&\text{Lift:} && I \mapsto V_0, \\
&\text{Iterate:} && V_{\ell+1}(x) = \sigma\left( W_\ell V_\ell(x) + (K_\ell V_\ell)(x) \right), \quad \ell = 0, \ldots, L-1, \\
&\text{Project:} && V_L \mapsto O,
\end{aligned}
$$

where:

  • $W_\ell$ is a learnable channelwise linear map, applied pointwise in $x$,
  • $\sigma$ is a nonlinearity (typically ReLU or GELU; complex-valued activations for quantum tasks),
  • $K_\ell$ is a "spectral convolution," defined via

$$
(K_\ell V)(x) = \mathcal{F}^{-1}\big[\, R_\ell \odot \mathcal{F}(V) \,\big](x),
$$

with $R_\ell$ a tensor of learned weights in Fourier space, and $\odot$ denoting entrywise multiplication. In practice, only low-wavenumber modes ($|k| \le k_{\max}$ for cutoff $k_{\max}$) are used, enforcing an inductive bias toward smooth, global structure (Shah et al., 2024, Qin et al., 2024). Each layer thus combines local linear operations and a nonlocal, translation-invariant integral operator implemented as a spectral filter.
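The layer update described above can be sketched in a few lines of NumPy for a periodic 1D grid. This is a minimal illustration under stated assumptions, not the implementation used in the cited works: the names `spectral_conv`, `fourier_layer`, and `n_modes` are hypothetical, tanh stands in for the nonlinearity, and the weights are random rather than trained.

```python
import numpy as np

def spectral_conv(v, R):
    """Apply (K v)(x) = F^{-1}[R * F(v)](x) on a periodic 1D grid.

    v: real array of shape (channels_in, n_grid)
    R: complex array of shape (channels_in, channels_out, n_modes)
    """
    n_grid = v.shape[-1]
    n_modes = R.shape[-1]
    v_hat = np.fft.rfft(v, axis=-1)  # shape (channels_in, n_grid // 2 + 1)
    out_hat = np.zeros((R.shape[1], v_hat.shape[-1]), dtype=complex)
    # Mix channels mode by mode on the retained low modes; higher modes stay zero.
    out_hat[:, :n_modes] = np.einsum("iok,ik->ok", R, v_hat[:, :n_modes])
    return np.fft.irfft(out_hat, n=n_grid, axis=-1)

def fourier_layer(v, W, R):
    """One layer sigma(W v(x) + (K v)(x)); tanh is an assumed stand-in for sigma."""
    return np.tanh(W @ v + spectral_conv(v, R))

# Hypothetical sizes and random (untrained) weights for illustration only.
rng = np.random.default_rng(0)
channels, n_modes = 4, 8
W = rng.normal(size=(channels, channels)) / np.sqrt(channels)
R = (rng.normal(size=(channels, channels, n_modes))
     + 1j * rng.normal(size=(channels, channels, n_modes))) / channels

def sample_input(n_grid):
    """Band-limited test input: channel j holds sin((j + 1) x)."""
    x = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    return np.stack([np.sin((j + 1) * x) for j in range(channels)])

coarse = fourier_layer(sample_input(64), W, R)
fine = fourier_layer(sample_input(128), W, R)
```

Because the weights are attached to a fixed set of Fourier modes rather than to grid points, the same `W` and `R` apply on both grids. For this band-limited input the two outputs agree at the shared grid points, which is the mesh-invariance property in miniature.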

2. Theoretical Properties and Universality

FNOs are universal approximators of continuous operators between appropriate function spaces. For an operator $\mathcal{G}$ that is continuous on a compact set $K$ of input functions, and for any $\varepsilon > 0$, there exists an $L$-layer FNO $\mathcal{N}_\theta$ such that $\sup_{a \in K} \|\mathcal{G}(a) - \mathcal{N}_\theta(a)\| \le \varepsilon$ (Kovachki et al., 2021). This universality relies on (i) spectral approximation in the underlying function space (projection to low modes), (ii) expressivity of the nonlinear $\sigma$-layers on local nonlinearities, and (iii) the efficient emulation of classic pseudo-spectral PDE solvers by learning spectral kernels directly from data.

For PDE solution operators with sufficient regularity, the FNO achieves $\varepsilon$-accuracy with parameter count growing sub-linearly or polynomially in $\varepsilon^{-1}$:

  • For elliptic PDEs (e.g., Darcy flow), a bound of this type is established,
  • For dissipative Navier–Stokes, the bound depends on the spatial regularity of the solution (Kovachki et al., 2021).

3. Spectral Bias, Resolution Invariance, and Mitigation

FNOs are subject to a well-characterized spectral bias: they excel at capturing the dominant, low-frequency features of operator targets, while underrepresenting high-frequency (non-dominant) modes (Qin et al., 2024, Kalimuthu et al., 5 Apr 2025). This bias is enforced by:

  • Explicit mode truncation, typically retaining only the lowest $k_{\max}$ modes per dimension,
  • Nonlinear activations coupling recovered frequencies, but still favoring slow, global dynamics.
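The effect of explicit mode truncation can be seen on a toy signal. The sketch below is an illustration only; the helper `truncate_modes`, the signal, and the cutoff of 8 modes are assumptions chosen for the example. It keeps the lowest Fourier modes and measures what is lost.

```python
import numpy as np

def truncate_modes(u, n_modes):
    """Keep the lowest n_modes Fourier modes of a periodic 1D signal, zero the rest."""
    u_hat = np.fft.rfft(u)
    u_hat[n_modes:] = 0.0
    return np.fft.irfft(u_hat, n=u.shape[-1])

n_grid = 256
x = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
low = np.sin(2 * x)           # dominant low-frequency component
high = 0.5 * np.sin(20 * x)   # weaker high-frequency component
u = low + high

u_trunc = truncate_modes(u, n_modes=8)
# Relative L2 error caused by truncation alone (about 0.447 for this signal).
rel_err = np.linalg.norm(u - u_trunc) / np.linalg.norm(u)
```

The truncated signal reproduces the low-frequency component exactly and discards the high-frequency one entirely, so no amount of training on the retained weights can recover it within a single spectral convolution.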

Mitigation strategies include:

  • Residual ensemble architectures (e.g., SpecB-FNO), where additional FNOs are trained sequentially to model the residuals dominated by high-frequency content, reducing high-wavenumber error by up to 80% on challenging PDEs (Qin et al., 2024).
  • Hybrid branches for localized features (e.g., LOGLO-FNO), combining patchwise local spectral convolutions and high-frequency propagation modules, as well as frequency-aware loss functions that explicitly penalize spectral error in both mid and high Fourier bins (Kalimuthu et al., 5 Apr 2025).
  • Inclusion of CNN-based local feature extractors (Conv-FNO/UNet-FNO), concatenating local spatial features with global spectral features, achieving 2–4× error reductions on turbulence and pattern-formation tasks, especially when data are scarce (Liu et al., 22 Mar 2025, Liu-Schiaffini et al., 2024).
  • Translation-equivariant attention and residual connections, to improve frequency transfer and enhance stability across depth (Zhao et al., 2023, Kim, 29 Jul 2025).
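A frequency-aware loss of the kind mentioned in the list above can be sketched as a weighted sum of squared Fourier-coefficient errors over low, mid, and high bins. This is a generic illustration, not the loss defined in the cited papers: the function name `binned_spectral_loss`, the bin edges, and the weights are all hypothetical choices.

```python
import numpy as np

def binned_spectral_loss(pred, target, bin_edges=(0, 4, 16), weights=(1.0, 2.0, 4.0)):
    """Weighted sum of squared Fourier-coefficient errors over frequency bins.

    bin_edges gives the first mode index of each bin (low, mid, high); the last
    bin runs to the highest resolved mode. Edges and weights are assumptions.
    """
    diff = pred - target
    err_hat = np.fft.rfft(diff, axis=-1) / diff.shape[-1]  # grid-size-independent scaling
    power = np.abs(err_hat) ** 2
    edges = list(bin_edges) + [power.shape[-1]]
    loss = 0.0
    for weight, start, stop in zip(weights, edges[:-1], edges[1:]):
        loss += weight * power[..., start:stop].sum()
    return float(loss)

n_grid = 128
x = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
target = np.sin(x)
pred_low_err = target + 0.1 * np.sin(2 * x)    # error in the low bin
pred_high_err = target + 0.1 * np.sin(30 * x)  # same-size error in the high bin

loss_low = binned_spectral_loss(pred_low_err, target)
loss_high = binned_spectral_loss(pred_high_err, target)
```

Both predictions have the same pointwise mean squared error, but the high-frequency error is penalized four times as heavily under these weights. In practice such a term would typically be added to a standard physical-space loss rather than replace it.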

4. Discretization Error, Computational Scaling, and Implementation

The discretization error of FNOs, stemming from replacing continuous convolutions by discrete FFTs, is tightly controlled: for an input of Sobolev regularity $s$ and grid size $N$, the aliasing error decays algebraically in $N$ at each layer, at a rate set by $s$ (Lanthaler et al., 2024). This algebraic rate holds both in the discrete grid norm and in the corresponding continuum norm via trigonometric interpolation, so long as smooth activation functions are used and periodic boundary conditions are enforced. Sharp practical guidelines are:

  • Choose the grid size $N$ such that the resulting discretization error is commensurate with the desired test loss,
  • Prefer smooth activations (GELU, Tanh) and avoid non-periodic features,
  • Employ adaptive grid refinement (adaptive subsampling) for efficient learning.
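The aliasing mechanism behind this discretization error can be illustrated with a single sine wave. In the sketch below (helper names are hypothetical), a mode with wavenumber 9 is sampled on a grid too coarse to resolve it and becomes indistinguishable from wavenumber 1; on a finer grid it is resolved correctly.

```python
import numpy as np

def sample_sine(wavenumber, n_grid):
    """Sample sin(wavenumber * x) on a uniform periodic grid over [0, 2*pi)."""
    x = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    return np.sin(wavenumber * x)

def dominant_mode(samples):
    """Index of the largest-magnitude Fourier mode of a real periodic signal."""
    return int(np.argmax(np.abs(np.fft.rfft(samples))))

coarse_mode = dominant_mode(sample_sine(9, n_grid=8))   # aliased to wavenumber 1
fine_mode = dominant_mode(sample_sine(9, n_grid=64))    # resolved as wavenumber 9
```

On the 8-point grid the samples of sin(9x) coincide with those of sin(x), so the discrete FFT attributes the energy to the wrong mode. Smoother inputs carry less energy in such unresolved modes, which is why the error shrinks as regularity or grid size increases.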

Implementation is highly scalable: by explicit domain and parameter decomposition, FNOs have been used for 3D+time problems with a grid size of 2.6 billion on 512 GPUs, with near-linear scaling and orders-of-magnitude speedup over conventional solvers (II et al., 2022). Per-layer cost is quasi-linear in the number of grid points $N$, since each FFT costs $O(N \log N)$ per channel, and the parameter count grows only with the number of retained modes and channel width, not the full input resolution.
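A back-of-the-envelope count makes the last point concrete. The sketch below assumes a layer with a square channel map and one complex spectral weight per retained mode and channel pair; the function names are hypothetical, and biases and symmetry-based savings are ignored.

```python
import math

def fno_layer_param_count(channels, n_modes, n_dims=1):
    """Real-valued parameters in one Fourier layer of the form sigma(W v + K v).

    Assumes a square pointwise map W (channels x channels) and a complex
    spectral weight tensor with n_modes retained modes per dimension.
    The grid size does not appear anywhere in the count.
    """
    pointwise = channels * channels
    spectral = 2 * channels * channels * n_modes ** n_dims  # 2 reals per complex weight
    return pointwise + spectral

def fft_cost_estimate(channels, n_grid_points):
    """Rough operation count for a forward plus inverse FFT: about 2 * C * N * log2(N)."""
    return 2 * channels * n_grid_points * math.log2(n_grid_points)

params_2d = fno_layer_param_count(channels=32, n_modes=12, n_dims=2)
cost_ratio = fft_cost_estimate(32, 8192) / fft_cost_estimate(32, 4096)
```

Under these assumptions a 2D layer with 32 channels and 12 modes per dimension has 295,936 parameters at any resolution, while doubling the number of grid points raises the estimated FFT cost by slightly more than a factor of two.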

5. Application Domains and Empirical Performance

FNOs have demonstrated strong empirical performance across a wide range of operator-regression tasks:

  • Quantum spin dynamics: Complex-valued FNOs can learn the discrete time-evolution operator for random quantum spin chains, either on the full, exponentially large wavefunction or on a reduced set of polynomially many observables (e.g. Pauli string expectation values). Fidelity is reported both for within-domain predictions and for extrapolation beyond the training time, together with a speedup over exact unitary evolution (Shah et al., 2024).
  • Time-periodic quantum systems: FNOs accurately reconstruct effective Floquet Hamiltonians, observable trajectories, and operator growth, benefiting from discretization-invariance (zero-shot super-resolution in time) and polynomial computational scaling in system size (Qi et al., 8 Sep 2025).
  • Turbulent and self-gravitating fluid flows: FNOs capture multi-scale coupling in 3D and projected hydrodynamic/MHD turbulence, with reported normalized test errors accompanied by orders-of-magnitude inference acceleration (Poletti et al., 31 Jul 2025).
  • Large-scale PDEs: FNOs trained on time-varying multiphase flows scale favorably to 100M+ degrees of freedom with sub-1% effective-modulus error, and runtimes comparable to FFT-based solvers (Nguyen et al., 16 Jul 2025).
  • Image classification: FNOs, as continuous neural operators, achieve discretization-invariant feature extraction, with accuracy robust to extreme input resolution changes; they can also be algebraically converted to and from CNNs (Kabri et al., 2023).

A representative summary of observed FNO accuracy and speedup is as follows:

| Application Domain | Error Metric | FNO Accuracy | Speedup vs. Solver | Reference |
|---|---|---|---|---|
| Quantum spin evolution | Fidelity, F | | | (Shah et al., 2024) |
| Floquet quantum systems | Rel. RMSE | | | (Qi et al., 8 Sep 2025) |
| 3D multiphase PDE surrogates | Rel. error | | | (II et al., 2022) |
| Structural dynamics | Energy ratio | (linear regime) | | (Haghi et al., 11 Nov 2025) |
| Image classification (MNIST) | Top-1 accuracy | | Comparable to CNN | (Kabri et al., 2023) |

6. Limitations, Controversies, and Open Directions

Despite their efficacy, key limitations of FNOs arise from their spectral bias and the fixed global mode truncation. In nonlinear regimes with significant high-frequency content, FNOs can manifest artificial dissipation, spectral aliasing, and broadband error that are irreducible regardless of training dataset size (Haghi et al., 11 Nov 2025, Qin et al., 2024). Specifically:

  • For strongly nonlinear systems generating harmonics beyond the learned spectral bandwidth, FNOs fail to conserve energy and rapidly degrade in phase coherence.
  • For complex PDEs with rapid coefficient variation or local singularities, default FNOs struggle; enhancements via local kernels, hierarchical attention, or frequency-aware losses are required (Zhao et al., 2023, Liu et al., 22 Mar 2025, Liu-Schiaffini et al., 2024).
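The first failure mode can be reproduced in a toy setting: a pointwise nonlinearity applied to a band-limited signal creates harmonics, and any harmonic above the retained band is lost at the next truncation. The sketch below uses a cubic nonlinearity and a cutoff of 8 modes, both chosen purely for illustration and not taken from the cited studies.

```python
import numpy as np

n_grid = 128
x = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)

u = np.sin(3 * x)   # band-limited input: a single mode at k = 3
w = u ** 3          # cubic nonlinearity: sin^3(3x) = (3 sin(3x) - sin(9x)) / 4

# Amplitude per wavenumber (valid for modes other than k = 0 and Nyquist).
amplitude = 2.0 * np.abs(np.fft.rfft(w)) / n_grid

n_modes = 8         # assumed spectral cutoff: modes k = 0..7 are retained
energy = amplitude ** 2
lost_fraction = energy[n_modes:].sum() / energy.sum()
```

The cubic term moves a quarter of the amplitude to k = 9, outside the retained band, so 10% of the signal energy is discarded by truncation in this example. Repeated over many layers or time steps, such losses act like the artificial dissipation described above.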

Recent theoretical work has refined the critical initialization and hyperparameter scaling laws for FNOs (Kim, 29 Jul 2025, Li et al., 24 Jun 2025). It shows that stability across depth and spectral range is controlled by the layerwise variance and the spectral cutoff (e.g. through the per-mode variance), and that hyperparameters can be transferred across model scales without re-tuning via Maximal Update Parametrization.

Ongoing open research directions include:

  • Dynamically adaptive mode truncation and learnable spectral cutoffs,
  • Multi-scale and wavelet-based operator layers,
  • Integrating physics-informed or conservation constraint layers,
  • Efficient extensions to non-periodic, non-Euclidean, or highly anisotropic domains.

7. Practical Guidelines for Usage and Deployment

  • Select spectral cutoff $k_{\max}$ to match the bandwidth of target operator outputs; insufficient $k_{\max}$ leads to irrecoverable loss of high-frequency content.
  • For tasks requiring high-frequency or localized feature capture, combine FNOs with local kernels (CNN/patchwise Fourier) and/or translation-equivariant attention.
  • When training on computational clusters, exploit domain decomposition and FFT-based parallel primitives for scalable FNO implementation (II et al., 2022).
  • Use spectral or spectrogram-based loss functions to penalize both physical-space and frequency-domain errors, especially in settings with potential mode truncation artifacts (Haghi et al., 11 Nov 2025, Kalimuthu et al., 5 Apr 2025).
  • Employ adaptive resolution sampling and validation loss monitoring to optimize training effort per desired error, especially in operator surrogates for large-scale PDEs (Lanthaler et al., 2024).
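One simple way to act on the first guideline is to inspect the spectrum of the training targets and pick the smallest cutoff that captures a chosen share of their energy. The heuristic below is an assumption for illustration, not a procedure from the cited works; the function name `modes_for_energy` and the 99% threshold are hypothetical.

```python
import numpy as np

def modes_for_energy(targets, fraction=0.99):
    """Smallest number of low modes holding `fraction` of the mean spectral energy.

    targets: real array of shape (n_samples, n_grid) of periodic 1D outputs.
    """
    n_grid = targets.shape[-1]
    power = np.abs(np.fft.rfft(targets, axis=-1)) ** 2
    power[..., 1:] *= 2.0          # count the conjugate (negative-frequency) modes
    if n_grid % 2 == 0:
        power[..., -1] /= 2.0      # the Nyquist mode has no separate conjugate
    mean_power = power.mean(axis=0)
    cumulative = np.cumsum(mean_power) / mean_power.sum()
    return int(np.searchsorted(cumulative, fraction) + 1)

n_grid = 128
x = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
u = np.sin(2 * x) + 0.3 * np.sin(10 * x) + 0.01 * np.sin(40 * x)
targets = np.stack([u, 0.5 * u])   # two toy samples with the same spectral shape

n_modes_99 = modes_for_energy(targets, fraction=0.99)
n_modes_90 = modes_for_energy(targets, fraction=0.90)
```

For this toy data, 3 modes (k = 0..2) already hold 90% of the energy, while reaching 99% requires 11 modes so that the k = 10 component is included. The remaining k = 40 component would still be truncated, which is the trade-off the guideline asks the user to make deliberately.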

Collectively, Fourier Neural Operators provide a robust, theoretically grounded, and computationally efficient framework for operator learning over function spaces, with strong empirical performance across quantum, fluid, and structural dynamics, as well as image-based and high-dimensional surrogate modeling contexts (Shah et al., 2024, II et al., 2022, Kovachki et al., 2021, Nguyen et al., 16 Jul 2025, Kabri et al., 2023, Qi et al., 8 Sep 2025). The architecture continues to be a primary choice for mesh- and resolution-invariant surrogate models where the target operator retains smooth, global structure, though its limitations in nonlinearity- and high-frequency-dominated systems are increasingly well understood and actively addressed in ongoing research.
