
Fourier Neural Operators (FNOs)

Updated 17 December 2025
  • Fourier Neural Operators (FNOs) are neural architectures that learn mappings between function spaces, enabling resolution-independent approximation of PDE solutions.
  • They employ Fourier layers with spectral truncation to filter high-frequency noise, achieving significant computational speed-ups over traditional numerical methods.
  • FNOs scale to high-dimensional problems, support hybrid extensions for capturing fine-scale features, and offer theoretical guarantees on generalization and discretization errors.

Fourier Neural Operators (FNOs) are neural architectures designed for learning mappings between function spaces, most notably solution operators of partial differential equations (PDEs). Introduced as a mesh-invariant, spectral-domain parameterization, FNOs can rapidly approximate entire families of PDE solutions and are resolution-independent, delivering orders-of-magnitude speedup compared to classical numerical methods and outperforming preceding operator-learning neural architectures (Li et al., 2020, II et al., 2022, Duruisseaux et al., 1 Dec 2025).

1. Mathematical Formulation and Operator Learning

The central aim of the FNO framework is to learn a mapping (operator) $\mathcal{G}_\theta: \mathcal{A} \rightarrow \mathcal{U}$, where $\mathcal{A}, \mathcal{U} \subset L^2(\Omega)$ are spaces of input and output functions (e.g., PDE coefficients to solution functions) (Li et al., 2020, II et al., 2022). The FNO is characterized by:

  • Lifting: The input function $a(x)$ is mapped pointwise to a higher-dimensional feature space via an affine transformation.
  • Fourier Layers:

$$u_{t+1}(x) = \sigma\left[ W u_t(x) + \mathcal{F}^{-1}\left( R(k) \cdot \mathcal{F}[u_t](k) \right)(x) \right]$$

where $W$ is a trainable channel-mixing matrix, $\mathcal{F}, \mathcal{F}^{-1}$ are the forward and inverse discrete Fourier transforms, $R(k)$ are trainable spectral filters, and $\sigma$ is a pointwise nonlinearity (e.g., ReLU or GeLU). The spectral filter is typically truncated, i.e., $R(k) = 0$ for $|k|$ above a cutoff.

  • Projection: After $K$ layers, a final pointwise linear map produces the output function.
  • Loss: Training minimizes the relative $L^2$ error:

$$L = \frac{\|u_{\text{pred}} - u_{\text{true}}\|_2}{\|u_{\text{true}}\|_2}$$

Crucially, discretization invariance is achieved: once trained on a fixed grid, the operator can be evaluated at any resolution, enabling "zero-shot" super-resolution (Li et al., 2020).
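
The update rule above amounts to a truncated spectral convolution plus a pointwise linear term. Below is a minimal 1-D sketch in PyTorch, together with the relative $L^2$ loss; the module name, tensor shapes, and initialization are illustrative assumptions rather than the reference implementation of Li et al. (2020).

```python
# Minimal 1-D sketch of a single Fourier layer and the relative L2 loss.
import torch
import torch.nn as nn

class FourierLayer1d(nn.Module):
    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes                               # spectral cutoff m
        self.W = nn.Conv1d(channels, channels, 1)        # pointwise channel mixing W
        scale = 1.0 / channels
        # Trainable spectral filters R(k), kept only on the lowest `modes` modes.
        self.R = nn.Parameter(scale * torch.randn(channels, channels, modes,
                                                  dtype=torch.cfloat))

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, channels, n_grid)
        u_hat = torch.fft.rfft(u)                        # F[u](k)
        out_hat = torch.zeros_like(u_hat)
        m = min(self.modes, u_hat.shape[-1])
        # R(k) . F[u](k) on the retained modes; higher modes are truncated to zero.
        out_hat[..., :m] = torch.einsum("bix,iox->box", u_hat[..., :m], self.R[..., :m])
        spectral = torch.fft.irfft(out_hat, n=u.shape[-1])   # F^{-1}(...)
        return torch.nn.functional.gelu(self.W(u) + spectral)

def relative_l2(pred: torch.Tensor, true: torch.Tensor) -> torch.Tensor:
    # Relative L2 training loss ||u_pred - u_true||_2 / ||u_true||_2.
    return torch.norm(pred - true) / torch.norm(true)
```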

2. Spectral Truncation, Regularization, and Discretization Error

FNOs employ low-rank (mode) truncation to restrict the trainable Fourier filters to the lowest $m$ modes per dimension, imposing a smoothness prior that filters out high-frequency noise and acts as an implicit regularizer. The choice of $m$ controls the bias-variance trade-off: too small impairs expressivity, too large increases overfitting and cost (II et al., 2022, Duruisseaux et al., 1 Dec 2025). Each Fourier layer thus costs $O(m^d N \log N)$ as opposed to $O(N^d \log N)$ when all modes are retained.

Recent work has quantified the aliasing and discretization errors induced by grid evaluations and mode truncation (Lanthaler et al., 3 May 2024). If the input has Sobolev regularity $s$ and activation functions are $C^s$-smooth, the $L^2$ grid error decays as $O(N^{-s})$, provided the activation and positional encoding preserve smoothness. Discretization error thus becomes negligible compared to the model approximation error when $N$ is set such that $N^{-s} \lesssim \varepsilon$, for target error $\varepsilon$.
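
As an illustrative reading of this rate (the numbers are not taken from the cited work): for inputs with Sobolev regularity $s = 2$ and a target error $\varepsilon = 10^{-4}$, the condition $N^{-s} \lesssim \varepsilon$ gives $N \gtrsim \varepsilon^{-1/s} = 100$ grid points per dimension before the grid error stops dominating the overall error budget.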

3. Model-Parallel Scalability and Algorithmic Structure

To scale FNOs to billions of variables, model-parallel FNOs distribute the spatial grid and network tensors across many workers/GPUs (II et al., 2022). The distributed fast Fourier transforms (DFFTs) are implemented via all-to-all communication patterns that realign the data layout so that local FFTs can be taken along each axis in turn, followed by local application of the trainable filters and the inverse transforms. The model parameters corresponding to pointwise operations are redundantly stored or broadcast; all communication is compatible with backpropagation and automatic-differentiation frameworks.
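
The communication pattern can be illustrated with a pencil-decomposed 2-D FFT. The sketch below uses mpi4py and NumPy; the row-slab layout, variable names, and grid size are assumptions for illustration and do not reproduce the cited model-parallel implementation.

```python
# Sketch of a distributed 2-D FFT: local FFT along one axis, all-to-all to
# re-layout the data, then local FFT along the other axis.
# Assumes an N x N global grid split row-wise across ranks, with N % size == 0.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 256
rows = N // size                               # rows owned by this rank
u_local = np.random.rand(rows, N)              # local slab of the global field

# 1) Local FFT along the axis that is fully resident on this rank.
u_hat = np.fft.fft(u_local, axis=1)

# 2) All-to-all exchange: split the columns into `size` blocks and trade them
#    so that every rank ends up holding a full column pencil of the spectrum.
send = np.ascontiguousarray(
    u_hat.reshape(rows, size, N // size).transpose(1, 0, 2))
recv = np.empty_like(send)
comm.Alltoall(send, recv)
u_pencil = recv.reshape(N, N // size)          # all rows, one column block

# 3) Local FFT along the remaining axis; the trainable spectral filters R(k)
#    would be applied to the (truncated) locally held modes at this point,
#    followed by the inverse transforms in reverse order.
u_hat_full = np.fft.fft(u_pencil, axis=0)
```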

On systems such as NERSC Perlmutter (512 A100 GPUs), strong- and weak-scaling experiments with up to $2.6$ billion variables achieved over $80\%$ parallel efficiency. Inference times are reduced from hundreds or thousands of seconds (for standard CPU-based PDE solvers) to $\sim 1$ second, yielding $100\times$–$1386\times$ speed-ups for real-world 4D multiphase CO$_2$ subsurface simulations.

4. Generalization, Capacity, and Learning Theory

The generalization properties of FNOs are governed by the architecture (mode count, depth, width) and by the $L_{p,q}$ group norms of the parameter tensors, which control the Rademacher complexity and, thus, generalization error bounds (Kim et al., 2022).

For an FNO hypothesis class parameterized so that the $L_{p,q}$ capacities of all layers are bounded, the empirical excess risk is tightly controlled. Architectural hyperparameters impact the bound via:

  • Mode cutoff $k_{\max}$: Higher values increase expressivity but raise capacity.
  • Depth $D$: Amplifies the effect of the mode cutoff through a power law, $k_{\max}^{D/p^*}$.
  • Group-norm balancing: Small $(p, q) \approx 1$–$2$ emphasizes weight scale; large $(p, q) = \infty$ captures sensitivity to architecture and mode count.

Empirical studies demonstrate high correlation ($\geq 0.9$) between the generalization gap and the group capacity for suitable $p, q$. A key practical implication is to scale $k_{\max}$ with the sample size $m$ as $k_{\max}^{D/p^*} \lesssim m^{1/2}$ to control overfitting.
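
As an illustrative reading of this scaling rule (the numbers are not from the cited work): for a depth $D = 4$ FNO with $p^* = 2$ and $m = 10^4$ training samples, the condition $k_{\max}^{D/p^*} = k_{\max}^{2} \lesssim m^{1/2} = 100$ suggests retaining no more than roughly $k_{\max} \approx 10$ modes per dimension.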

5. Applications and Limitations

FNOs have been established as state-of-the-art surrogates in large-scale parametric PDE systems, covering:

  • Fluid dynamics (Burgers', Navier–Stokes, Reynolds–averaged, turbulence).
  • Subsurface multiphase porous flow (CO$_2$ plumes, Sleipner benchmark).
  • Quantum time-evolution (Floquet, operator growth) (Qi et al., 8 Sep 2025).
  • Micromechanics/homogenization, where FNOs mimic FFT-based solvers for periodic cell problems, achieving universal operator approximation subject only to material-contrast constraints, with rigorous error bounds and grid independence (Nguyen et al., 16 Jul 2025).

In these settings, FNOs yield typical relative $L^2$ errors of $1$–$2\%$ across mesh resolutions, and speed advantages of $10^2$–$10^3\times$. Their ability to generalize to unseen resolutions enables downstream tasks such as Bayesian inversion and uncertainty quantification with drastic computational savings.

Nonetheless, FNOs inherit a fundamental smoothness bias from Fourier-mode truncation, which inhibits recovery of non-dominant/high-frequency features in solutions with sharp gradients, strong nonlinearities, or stochastic forcing. This is intrinsic to their spectral parameterization and regularization scheme.

6. Extensions, Hybrid Architectures, and Practical Guidelines

Recent work has focused on addressing FNO limitations by:

  • Hybridization with local convolutional or differential/integral kernels, such as the Differential/Integral FNO (Liu-Schiaffini et al., 26 Feb 2024) and Conv-FNO (Liu et al., 22 Mar 2025), to inject locality and capture fine-scale features that global Fourier layers miss.
  • Model-parallel and distributed implementations for large-scale, high-dimensional problems, with benchmarked weak and strong scaling efficiency (II et al., 2022).
  • Spectral-adaptive and incremental curriculum approaches (e.g., iFNO (George et al., 2022)) which incrementally grow the spectral support and data resolution to optimize learning dynamics and mitigate overfitting.
  • Frequency-aware losses to counter FNO's low-frequency bias and preserve energy at all relevant bands.
  • Universal architecture constructions, particularly for micromechanics with FFT-based solvers, yielding mesh-independent surrogates guaranteed to be as expressive as classical approaches (Nguyen et al., 16 Jul 2025).

Recommended best practices from recent large-scale implementations include:

  • Select the spectral cutoff $m$ to retain $\geq 95\%$ of the energy in the training data spectrum (see the sketch after this list).
  • Tune the number of layers (typically $3$–$6$) and channel width (e.g., $16$–$128$) for the PDE's complexity.
  • Employ smooth activation functions ($C^s$ with $s > d/2$) and periodic positional encoding to control discretization error.
  • Use explicit or implicit regularization (mode truncation, spectral norm penalty), but recognize expressivity is ultimately limited by spectral support.
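
The first guideline can be checked directly from the training data's spectrum. Below is a minimal 1-D sketch, assuming the training samples are stacked in a NumPy array; the function name and threshold are illustrative, not part of any cited implementation.

```python
# Pick the smallest per-dimension mode cutoff m whose lowest modes carry at
# least `energy_frac` of the average spectral energy of the training data.
import numpy as np

def choose_mode_cutoff(samples: np.ndarray, energy_frac: float = 0.95) -> int:
    coeffs = np.fft.rfft(samples, axis=-1)           # one-sided spectrum per sample
    energy = np.mean(np.abs(coeffs) ** 2, axis=0)    # average energy per mode
    cumulative = np.cumsum(energy) / energy.sum()
    return int(np.searchsorted(cumulative, energy_frac)) + 1

# Example: smooth signals concentrate energy in low modes, so m stays small.
x = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
data = np.stack([np.sin(k * x) + 0.1 * np.sin(3 * k * x) for k in range(1, 17)])
print(choose_mode_cutoff(data))                      # number of lowest modes to keep
```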

7. Outlook and Future Directions

FNOs mark a major advance in mesh-free, scalable learning of solution operators for parametric and stochastic PDEs. Future research is directed at:

  • Overcoming spectral smoothness bias by combining global and local operators.
  • Extending FNOs to non-rectangular geometries and complex boundary conditions.
  • Adapting spectral support dynamically for multiscale problems.
  • Incorporating uncertainty quantification and inverse design in a manner consistent with FNO's operator-theoretic framework.
  • Bridging FNOs with classical solver theory for provably optimal surrogates in scientific computing (Duruisseaux et al., 1 Dec 2025, Nguyen et al., 16 Jul 2025).

FNOs' scalability, resolution invariance, and strong theoretical and empirical foundations have made them a cornerstone in scientific machine learning for computational physics, engineering, and beyond (Li et al., 2020, II et al., 2022, Duruisseaux et al., 1 Dec 2025, Qi et al., 8 Sep 2025).
