
Fourier Neural Operators (FNOs)

Updated 17 December 2025
  • Fourier Neural Operators (FNOs) are neural architectures that learn mappings between function spaces, enabling resolution-independent approximation of PDE solutions.
  • They employ Fourier layers with spectral truncation to filter high-frequency noise, achieving significant computational speed-ups over traditional numerical methods.
  • FNOs scale to high-dimensional problems, support hybrid extensions for capturing fine-scale features, and offer theoretical guarantees on generalization and discretization errors.

Fourier Neural Operators (FNOs) are neural architectures designed for learning mappings between function spaces, most notably solution operators of partial differential equations (PDEs). Introduced as a mesh-invariant, spectral-domain parameterization, FNOs can rapidly approximate entire families of PDE solutions and are resolution-independent, delivering orders-of-magnitude speedup compared to classical numerical methods and outperforming preceding operator-learning neural architectures (Li et al., 2020, II et al., 2022, Duruisseaux et al., 1 Dec 2025).

1. Mathematical Formulation and Operator Learning

The central aim of the FNO framework is to learn a mapping (operator) $\mathcal{G}_\theta: \mathcal{A} \rightarrow \mathcal{U}$, where $\mathcal{A}, \mathcal{U} \subset L^2(\Omega)$ are spaces of input and output functions (e.g., PDE coefficients to solution functions) (Li et al., 2020, II et al., 2022). The FNO is characterized by:

  • Lifting: The input function $a(x)$ is mapped pointwise to a higher-dimensional feature space via an affine transformation.
  • Fourier Layers:

$$u_{t+1}(x) = \sigma\left[ W u_t(x) + \mathcal{F}^{-1}\left( R(k) \cdot \mathcal{F}[u_t](k) \right)(x) \right]$$

where $W$ is a trainable channel-mixing matrix, $\mathcal{F}, \mathcal{F}^{-1}$ are the forward and inverse discrete Fourier transforms, $R(k)$ are trainable spectral filters, and $\sigma$ is a pointwise nonlinearity (e.g., ReLU or GeLU). The spectral filter is typically truncated, i.e., $R(k) = 0$ for $|k|$ above a cutoff.

  • Projection: After $K$ layers, a final pointwise linear map produces the output function.
  • Loss: Training minimizes the relative $L^2$ error:

$$L = \frac{\|u_{\text{pred}} - u_{\text{true}}\|_2}{\|u_{\text{true}}\|_2}$$

Crucially, discretization invariance is achieved: once trained on a fixed grid, the operator can be evaluated at any resolution, enabling "zero-shot" super-resolution (Li et al., 2020).
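
The update rule above amounts to a truncated spectral convolution plus a pointwise linear term. Below is a minimal 1-D sketch in PyTorch, together with the relative $L^2$ loss; the module name, tensor shapes, and initialization are illustrative assumptions rather than the reference implementation of Li et al. (2020).

```python
# Minimal 1-D sketch of a single Fourier layer and the relative L2 loss.
import torch
import torch.nn as nn

class FourierLayer1d(nn.Module):
    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes                               # spectral cutoff m
        self.W = nn.Conv1d(channels, channels, 1)        # pointwise channel mixing W
        scale = 1.0 / channels
        # Trainable spectral filters R(k), kept only on the lowest `modes` modes.
        self.R = nn.Parameter(scale * torch.randn(channels, channels, modes,
                                                  dtype=torch.cfloat))

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, channels, n_grid)
        u_hat = torch.fft.rfft(u)                        # F[u](k)
        out_hat = torch.zeros_like(u_hat)
        m = min(self.modes, u_hat.shape[-1])
        # R(k) . F[u](k) on the retained modes; higher modes are truncated to zero.
        out_hat[..., :m] = torch.einsum("bix,iox->box", u_hat[..., :m], self.R[..., :m])
        spectral = torch.fft.irfft(out_hat, n=u.shape[-1])   # F^{-1}(...)
        return torch.nn.functional.gelu(self.W(u) + spectral)

def relative_l2(pred: torch.Tensor, true: torch.Tensor) -> torch.Tensor:
    # Relative L2 training loss ||u_pred - u_true||_2 / ||u_true||_2.
    return torch.norm(pred - true) / torch.norm(true)
```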

2. Spectral Truncation, Regularization, and Discretization Error

FNOs employ low-rank (mode) truncation to restrict the trainable Fourier filters to the lowest $m$ modes per dimension, imposing a smoothness prior that filters out high-frequency noise and acts as an implicit regularizer. The choice of $m$ controls the bias-variance trade-off: too small impairs expressivity, too large increases overfitting and cost (II et al., 2022, Duruisseaux et al., 1 Dec 2025). Each Fourier layer thus costs $O(m^d N \log N)$ as opposed to $O(N^d \log N)$ when all modes are retained.

Recent work has quantified the aliasing and discretization errors induced by grid evaluations and mode truncation (Lanthaler et al., 3 May 2024). If the input has Sobolev regularity $s$ and activation functions are $C^s$-smooth, the $L^2$ grid error decays as $O(N^{-s})$, provided the activation and positional encoding preserve smoothness. Discretization error thus becomes negligible compared to the model approximation error when $N$ is set such that $N^{-s} \lesssim \varepsilon$, for target error $\varepsilon$.
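
As an illustrative reading of this rate (the numbers are not taken from the cited work): for inputs with Sobolev regularity $s = 2$ and a target error $\varepsilon = 10^{-4}$, the condition $N^{-s} \lesssim \varepsilon$ gives $N \gtrsim \varepsilon^{-1/s} = 100$ grid points per dimension before the grid error stops dominating the overall error budget.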

3. Model-Parallel Scalability and Algorithmic Structure

To scale FNOs to billions of variables, model-parallel FNOs distribute the spatial grid and network tensors across many workers/GPUs (II et al., 2022). The distributed fast Fourier transforms (DFFTs) are implemented via all-to-all communication patterns that realign the data layout so that local FFTs can be taken along each axis in turn, followed by local application of the trainable filters and the inverse transforms. The model parameters corresponding to pointwise operations are redundantly stored or broadcast; all communication is compatible with backpropagation and automatic-differentiation frameworks.
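
The communication pattern can be illustrated with a pencil-decomposed 2-D FFT. The sketch below uses mpi4py and NumPy; the row-slab layout, variable names, and grid size are assumptions for illustration and do not reproduce the cited model-parallel implementation.

```python
# Sketch of a distributed 2-D FFT: local FFT along one axis, all-to-all to
# re-layout the data, then local FFT along the other axis.
# Assumes an N x N global grid split row-wise across ranks, with N % size == 0.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 256
rows = N // size                               # rows owned by this rank
u_local = np.random.rand(rows, N)              # local slab of the global field

# 1) Local FFT along the axis that is fully resident on this rank.
u_hat = np.fft.fft(u_local, axis=1)

# 2) All-to-all exchange: split the columns into `size` blocks and trade them
#    so that every rank ends up holding a full column pencil of the spectrum.
send = np.ascontiguousarray(
    u_hat.reshape(rows, size, N // size).transpose(1, 0, 2))
recv = np.empty_like(send)
comm.Alltoall(send, recv)
u_pencil = recv.reshape(N, N // size)          # all rows, one column block

# 3) Local FFT along the remaining axis; the trainable spectral filters R(k)
#    would be applied to the (truncated) locally held modes at this point,
#    followed by the inverse transforms in reverse order.
u_hat_full = np.fft.fft(u_pencil, axis=0)
```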

On systems such as NERSC Perlmutter (512 A100 GPUs), strong- and weak-scaling experiments with up to $2.6$ billion variables achieved over $80\%$ parallel efficiency. Inference times are reduced from hundreds or thousands of seconds (for standard CPU-based PDE solvers) to $\sim 1$ second, yielding $100\times$–$1386\times$ speed-ups for real-world 4D multiphase CO$_2$ subsurface simulations.

4. Generalization, Capacity, and Learning Theory

The generalization properties of FNOs are governed by the architecture (mode count, depth, width) and by the $L_{p,q}$ group norms of the parameter tensors, which control the Rademacher complexity and, thus, generalization error bounds (Kim et al., 2022).

For an FNO hypothesis class parameterized so that the $L_{p,q}$ capacities of all layers are bounded, the empirical excess risk is tightly controlled. Architectural hyperparameters impact the bound via:

  • Mode cutoff $k_{\max}$: Higher values increase expressivity but raise capacity.
  • Depth $D$: Amplifies the effect of the mode cutoff through a power law, $k_{\max}^{D/p^*}$.
  • Group-norm balancing: Small $(p, q) \approx 1$–$2$ emphasizes weight scale; large $(p, q) = \infty$ captures sensitivity to architecture and mode count.

Empirical studies demonstrate high correlation ($\geq 0.9$) between the generalization gap and the group capacity for suitable $p, q$. A key practical implication is to scale $k_{\max}$ with the sample size $m$ as $k_{\max}^{D/p^*} \lesssim m^{1/2}$ to control overfitting.
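
As an illustrative reading of this scaling rule (the numbers are not from the cited work): for a depth $D = 4$ FNO with $p^* = 2$ and $m = 10^4$ training samples, the condition $k_{\max}^{D/p^*} = k_{\max}^{2} \lesssim m^{1/2} = 100$ suggests retaining no more than roughly $k_{\max} \approx 10$ modes per dimension.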

5. Applications and Limitations

FNOs have been established as state-of-the-art surrogates in large-scale parametric PDE systems, covering:

  • Fluid dynamics (Burgers', Navier–Stokes, Reynolds–averaged, turbulence).
  • Subsurface multiphase porous flow (CO$_2$ plumes, Sleipner benchmark).
  • Quantum time-evolution (Floquet, operator growth) (Qi et al., 8 Sep 2025).
  • Micromechanics/homogenization, where FNOs mimic FFT-based solvers for periodic cell problems, achieving universal operator approximation subject only to material-contrast constraints, with rigorous error bounds and grid independence (Nguyen et al., 16 Jul 2025).

In these settings, FNOs yield typical relative $L^2$ errors of $1$–$2\%$ across mesh resolutions, and speed advantages of $10^2$–$10^3\times$. Their ability to generalize to unseen resolutions enables downstream tasks such as Bayesian inversion and uncertainty quantification with drastic computational savings.

Nonetheless, FNOs inherit a fundamental smoothness bias from Fourier-mode truncation, which inhibits recovery of non-dominant/high-frequency features in solutions with sharp gradients, strong nonlinearities, or stochastic forcing. This is intrinsic to their spectral parameterization and regularization scheme.

6. Extensions, Hybrid Architectures, and Practical Guidelines

Recent work has focused on addressing FNO limitations by:

  • Hybridization with local convolutional or differential/integral kernels, such as the Differential/Integral FNO (Liu-Schiaffini et al., 26 Feb 2024) and Conv-FNO (Liu et al., 22 Mar 2025), to inject locality and capture fine-scale features that global Fourier layers miss.
  • Model-parallel and distributed implementations for large-scale, high-dimensional problems, with benchmarked weak and strong scaling efficiency (II et al., 2022).
  • Spectral-adaptive and incremental curriculum approaches (e.g., iFNO (George et al., 2022)) which incrementally grow the spectral support and data resolution to optimize learning dynamics and mitigate overfitting.
  • Frequency-aware losses to counter FNO's low-frequency bias and preserve energy at all relevant bands.
  • Universal architecture constructions, particularly for micromechanics with FFT-based solvers, yielding mesh-independent surrogates guaranteed to be as expressive as classical approaches (Nguyen et al., 16 Jul 2025).

Recommended best practices from recent large-scale implementations include:

  • Select the spectral cutoff $m$ to retain $\geq 95\%$ of the energy in the training data spectrum (see the sketch after this list).
  • Tune the number of layers (typically $3$–$6$) and channel width (e.g., $16$–$128$) for the PDE's complexity.
  • Employ smooth activation functions ($C^s$ with $s > d/2$) and periodic positional encoding to control discretization error.
  • Use explicit or implicit regularization (mode truncation, spectral norm penalty), but recognize expressivity is ultimately limited by spectral support.
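
The first guideline can be checked directly from the training data's spectrum. Below is a minimal 1-D sketch, assuming the training samples are stacked in a NumPy array; the function name and threshold are illustrative, not part of any cited implementation.

```python
# Pick the smallest per-dimension mode cutoff m whose lowest modes carry at
# least `energy_frac` of the average spectral energy of the training data.
import numpy as np

def choose_mode_cutoff(samples: np.ndarray, energy_frac: float = 0.95) -> int:
    coeffs = np.fft.rfft(samples, axis=-1)           # one-sided spectrum per sample
    energy = np.mean(np.abs(coeffs) ** 2, axis=0)    # average energy per mode
    cumulative = np.cumsum(energy) / energy.sum()
    return int(np.searchsorted(cumulative, energy_frac)) + 1

# Example: smooth signals concentrate energy in low modes, so m stays small.
x = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
data = np.stack([np.sin(k * x) + 0.1 * np.sin(3 * k * x) for k in range(1, 17)])
print(choose_mode_cutoff(data))                      # number of lowest modes to keep
```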

7. Outlook and Future Directions

FNOs mark a major advance in mesh-free, scalable learning of solution operators for parametric and stochastic PDEs. Future research is directed at:

  • Overcoming spectral smoothness bias by combining global and local operators.
  • Extending FNOs to non-rectangular geometries and complex boundary conditions.
  • Adapting spectral support dynamically for multiscale problems.
  • Incorporating uncertainty quantification and inverse design in a manner consistent with FNO's operator-theoretic framework.
  • Bridging FNOs with classical solver theory for provably optimal surrogates in scientific computing (Duruisseaux et al., 1 Dec 2025, Nguyen et al., 16 Jul 2025).

FNOs' scalability, resolution invariance, and strong theoretical and empirical foundations have made them a cornerstone in scientific machine learning for computational physics, engineering, and beyond (Li et al., 2020, II et al., 2022, Duruisseaux et al., 1 Dec 2025, Qi et al., 8 Sep 2025).
