Error Bounds for Fourier Neural Operators
- Fourier Neural Operators are neural architectures that learn mappings between infinite-dimensional function spaces, particularly solution operators of PDEs.
- Error bounds for FNOs quantify approximation, discretization, and statistical errors, with decay rates tied to the number of Fourier modes, the grid size, and the sample complexity.
- Rigorous error analyses combining spectral theory and capacity control guide network design to achieve accurate scientific operator learning.
Fourier Neural Operators (FNOs) are a principal class of neural operator architectures for approximating mappings between infinite-dimensional function spaces, particularly those governed by partial differential equations (PDEs). A central theoretical question concerns quantitative error bounds: how well can an FNO approximate a target operator, and how do approximation, discretization, and generalization errors scale with network size, regularity, sample size, and discretization parameters? The study of error bounds for FNOs thus integrates results from spectral theory, nonparametric learning, and operator approximation. Recent research has crystallized rigorous error analyses, including parametric approximation bounds, discretization/aliasing estimates, sample complexity lower bounds, and generalization error via capacity control.
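The core architectural unit referenced throughout is the Fourier layer: a spectral convolution truncated to a fixed number of modes plus a pointwise linear term, followed by a smooth activation. A minimal NumPy sketch of one such layer follows; the shapes, scalings, and the tanh-based GELU approximation are illustrative choices, not a reference implementation:

```python
import numpy as np

def fno_layer(v, R, W, k_max):
    """One FNO layer on a 1-D periodic grid: spectral convolution truncated to
    k_max modes, plus a pointwise linear term, then a GELU-like activation.
    v: (grid, channels) real; R: (k_max+1, ch, ch) complex; W: (ch, ch) real."""
    v_hat = np.fft.rfft(v, axis=0)                  # (grid//2 + 1, channels)
    out_hat = np.zeros_like(v_hat)
    # Spectral convolution: a learned matrix per retained mode, zeros elsewhere
    out_hat[:k_max + 1] = np.einsum('kio,ki->ko', R, v_hat[:k_max + 1])
    spectral = np.fft.irfft(out_hat, n=v.shape[0], axis=0)
    local = v @ W                                   # pointwise linear bypass
    z = spectral + local
    # tanh-based GELU approximation (smooth, non-polynomial activation)
    return 0.5 * z * (1.0 + np.tanh(0.7978845608 * (z + 0.044715 * z**3)))

rng = np.random.default_rng(0)
grid, ch, k_max = 64, 8, 6
v = rng.standard_normal((grid, ch))
R = (rng.standard_normal((k_max + 1, ch, ch))
     + 1j * rng.standard_normal((k_max + 1, ch, ch))) / ch
W = rng.standard_normal((ch, ch)) / np.sqrt(ch)
out = fno_layer(v, R, W, k_max)
```

Stacking such layers between a lifting and a projection map gives the full architecture whose error bounds are discussed below.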
1. Universal Approximation and Parametric Error Bounds
Fourier Neural Operators were shown to possess universal approximation capability on function spaces. Under suitable smoothness assumptions, for any continuous operator $\mathcal{G}$ and compact set $K$ of input functions, there exists an FNO $\mathcal{N}$ such that the sup-norm error $\sup_{a \in K} \|\mathcal{G}(a) - \mathcal{N}(a)\|$ is arbitrarily small, provided the architecture parameters (number of Fourier modes, channel width, depth) are chosen sufficiently large and the activation is smooth and non-polynomial (Kovachki et al., 2021). For practical PDE solution operators (e.g., Darcy flow, Navier–Stokes), the error decays algebraically in the number of retained Fourier modes $N$:
- Stationary Darcy flow: algebraic decay in $N$, at a rate tied to the Sobolev regularity $H^s$ of the coefficients, with network size growing only sublinearly in the reciprocal error.
- Navier–Stokes: analogous algebraic decay in $N$ for initial data in $H^s$.
Consequently, for operators admitting additional regularity and spectral decay, FNOs achieve error-vs-size scaling superior to that of generic neural architectures (Kovachki et al., 2021).
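The algebraic mode-truncation decay underlying these bounds can be observed directly: for a random function with Fourier coefficients decaying like $|k|^{-(s+1/2)}$ (hence lying in $H^s$), the $L^2$ tail error of keeping only modes $|k| \le N$ shrinks like $N^{-s}$. A small NumPy experiment, where the synthetic coefficient model is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 1024   # fine resolution
s = 2.0    # Sobolev regularity parameter
k = np.fft.rfftfreq(M, d=1.0 / M)   # integer frequencies 0 .. M/2
# Random coefficients with |c_k| ~ |k|^{-(s + 1/2)}, so the function lies in H^s
coeffs = rng.standard_normal(k.size) + 1j * rng.standard_normal(k.size)
coeffs[1:] *= k[1:] ** (-(s + 0.5))
coeffs[0] = 0.0

def truncation_error(N):
    """L2 error of keeping only modes |k| <= N (via Parseval)."""
    kept = coeffs.copy()
    kept[k > N] = 0.0
    return np.sqrt(np.sum(np.abs(coeffs - kept) ** 2))

errs = [truncation_error(N) for N in (8, 16, 32, 64)]
# Observed decay exponents between successive doublings of N (should be near s)
rates = [np.log2(errs[i] / errs[i + 1]) for i in range(3)]
```

Doubling the mode count repeatedly shows the error shrinking at roughly the rate $N^{-s}$ predicted by the regularity of the input.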
2. Discretization and Aliasing Error
As FNOs are implemented in discrete settings, the aliasing error arising from grid-based FFT computation must be quantified. On the $d$-dimensional torus with grid size $N$, and under Sobolev-regularity assumptions on the inputs and activations, the discretization (aliasing) error for FNOs with smooth activations decays algebraically in $N$, uniformly over all layers, with a constant that is polynomial in the network parameters (Lanthaler et al., 2024). The exponent of the decay reflects the input/activation regularity, and the same bound persists when the discrete outputs are interpolated back to the continuum.
Smooth activations (e.g. GELU) are essential: for ReLU activations, the convergence rate is limited by the low regularity of the activation itself (Lanthaler et al., 2024).
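The aliasing mechanism can be illustrated numerically: apply a smooth pointwise nonlinearity to a band-limited function on grids of increasing size and compare the retained Fourier coefficients against a near-continuum reference. With an analytic activation such as tanh (standing in here for GELU), the discrepancy decays rapidly with grid size; this toy setup is an illustration, not the estimate of Lanthaler et al.:

```python
import numpy as np

k_max = 4  # number of retained Fourier modes

def layer_modes(N):
    """Fourier coefficients |k| <= k_max of tanh(f) computed on an N-point grid.
    The nonlinearity creates high frequencies that alias back onto low modes."""
    x = 2 * np.pi * np.arange(N) / N
    f = np.sin(x) + 0.5 * np.cos(3 * x)        # band-limited input
    g_hat = np.fft.fft(np.tanh(f)) / N         # discrete Fourier coefficients
    return np.concatenate([g_hat[:k_max + 1], g_hat[-k_max:]])

ref = layer_modes(4096)                        # near-continuum reference
errs = [np.linalg.norm(layer_modes(N) - ref) for N in (16, 32, 64)]
```

Because the composed function is analytic and periodic, the aliased tail decays very quickly, and the retained coefficients converge rapidly as the grid is refined.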
3. Decomposition of Total Error: Truncation, Discretization, Statistical
Operator learning theory for FNOs identifies three principal sources of error (Subedi et al., 2024):
- Truncation error (finite Fourier expansion): an upper bound decaying algebraically in the truncation mode $K$.
- Discretization error (grid aliasing): an upper bound decaying algebraically in the grid size $m$.
- Statistical error (finite sample size): an upper bound decaying in the number $n$ of i.i.d. samples.
A combined excess-risk bound sums these three contributions, with constants depending on a bound on the norm of the input data and a bound on the operator weights. Lower bounds match the exponents in $K$ and $m$ (truncation/discretization) but not in $n$ (statistical), where a quadratic gap persists between the upper and lower bounds. This decomposition provides a precise framework for identifying the limiting factor in overall accuracy (Subedi et al., 2024).
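Assuming the three terms take the schematic form $K^{-s} + m^{-s} + n^{-1/2}$ (illustrative exponents and unit constants, not the exact bound of Subedi et al.), the decomposition tells how to budget resolution and data so that no single term dominates:

```python
import numpy as np

def total_bound(K, m, n, s=2.0, C=1.0):
    """Schematic three-term excess-risk bound:
    truncation K^{-s} + discretization m^{-s} + statistical n^{-1/2}."""
    return C * (K ** (-s) + m ** (-s) + n ** (-0.5))

def budget(eps, s=2.0):
    """Choose K, m, n so that each term contributes at most eps/3."""
    K = int(np.ceil((3 / eps) ** (1 / s)))
    m = int(np.ceil((3 / eps) ** (1 / s)))
    n = int(np.ceil((3 / eps) ** 2))
    return K, m, n

K, m, n = budget(0.1)
bound = total_bound(K, m, n)
```

Note the asymmetry: halving the target error only modestly increases the required modes and grid size (polynomially with exponent $1/s$), but quadruples the required sample size, which is why the statistical term typically dominates in practice.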
4. Approximation Rates via Symbol Learning and Fréchet Metrics
When the solution operator is a Fourier multiplier, network approximation rates for its symbol in suitable seminorms transfer directly to operator error. If a network family approximates the symbol at rate $\epsilon$ in Sobolev or Hörmander seminorms, then the composed FNO achieves output error of order $\epsilon$ uniformly over bounded input sets (Abdeljawad et al., 2024). In exponential spectral Barron spaces and Paley–Wiener spaces, this yields faster-than-algebraic rates: exponential-type rates for exponentially localized symbols, and spectral rates for bandlimited functions, depending on the underlying function space (Abdeljawad et al., 2024).
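The transfer mechanism is elementary Parseval reasoning: the operator output error is controlled by the sup-norm error of the symbol on the active frequency band. A NumPy check with a polynomial surrogate standing in for the network approximant; the symbol $1/(1+k^2)$ and the degree-6 fit are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
N, band = 256, 8
k = np.fft.fftfreq(N, d=1.0 / N)           # integer frequencies
sigma = 1.0 / (1.0 + k ** 2)               # target multiplier symbol

# Surrogate symbol: degree-6 polynomial in (k/band)^2, fitted on the active band
mask = np.abs(k) <= band
w = (k / band) ** 2                        # rescaled for well-conditioned fit
coef = np.polyfit(w[mask], sigma[mask], deg=6)
sigma_net = np.where(mask, np.polyval(coef, w), 0.0)

# Band-limited random input function
f_hat = np.zeros(N, dtype=complex)
f_hat[mask] = rng.standard_normal(mask.sum()) + 1j * rng.standard_normal(mask.sum())
f = np.fft.ifft(f_hat)

out_true = np.fft.ifft(np.fft.fft(f) * sigma)
out_net = np.fft.ifft(np.fft.fft(f) * sigma_net)

# Parseval: operator output error <= sup-norm symbol error * input norm
op_err = np.linalg.norm(out_true - out_net)
sym_err = np.abs(sigma - sigma_net)[mask].max()
```

Improving the symbol approximation (more network capacity, in the FNO setting) therefore improves the operator error at the same rate, which is exactly the transfer statement above.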
5. Universal Approximation of Derivatives and Operator Sensitivities
Derivative-informed FNOs (DIFNOs) extend error bounds to Fréchet derivatives of the learned operator. For a target operator $\mathcal{G}$ with continuous Fréchet derivative $D\mathcal{G}$, for any $\epsilon > 0$ and compact subset $K$ of inputs, there exists an FNO $\mathcal{N}$ such that both the operator error $\sup_{a \in K} \|\mathcal{G}(a) - \mathcal{N}(a)\|$ and the derivative error $\sup_{a \in K} \|D\mathcal{G}(a) - D\mathcal{N}(a)\|$ fall below $\epsilon$ (Yao et al., 16 Dec 2025). The analysis shows that truncation to $N$ Fourier modes produces algebraic decay in both operator and derivative error when the target is sufficiently differentiable, and the overall error can be made arbitrarily small by increasing the network dimensions.
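Derivative accuracy of this kind is typically validated with a finite-difference Taylor test: the remainder $\|\mathcal{G}(a + t h) - \mathcal{G}(a) - t\,D\mathcal{G}(a)h\|$ must shrink quadratically in $t$. A minimal check on a toy pointwise operator (not an FNO; purely illustrative of the test itself):

```python
import numpy as np

# Toy pointwise operator G(a) = a^2, with Fréchet derivative DG(a)h = 2*a*h
def G(a):
    return a ** 2

def DG(a, h):
    return 2 * a * h

rng = np.random.default_rng(2)
a = rng.standard_normal(64)   # base input function (discretized)
h = rng.standard_normal(64)   # perturbation direction

# Taylor remainder: should scale like t^2 if DG is the correct derivative
res = lambda t: np.linalg.norm(G(a + t * h) - G(a) - t * DG(a, h))
r1, r2 = res(1e-2), res(1e-3)
ratio = r1 / r2   # expect ~100 (two orders per decade in t)
```

The same test, applied to a trained DIFNO against the true solution operator, probes exactly the derivative error that the bound above controls.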
6. Generalization Bounds and Rademacher Complexity
The generalization error of FNOs has been characterized in terms of empirical Rademacher complexity, which depends on layerwise norms of the weights and the number of retained Fourier modes. The Rademacher complexity is bounded by a capacity term $\gamma$ built from the activation Lipschitz constant, the depth, the number of Fourier modes per layer, the grid size, and the product of layerwise weight norms (Kim et al., 2022). The generalization gap scales with this capacity, and empirical studies confirm the predicted correlation between generalization error and $\gamma$, as well as a direct dependence on the number of Fourier modes (Kim et al., 2022).
7. Sampling Complexity and the Theory-to-Practice Gap
Despite fast parametric approximation rates, FNOs are subject to a fundamental sampling-complexity limit. For any class of functions that can be approximated at rate $\alpha$ by FNOs of bounded complexity, the minimax risk for data-driven learning from $n$ input–output pairs decays at best at a slow algebraic rate in $n$ in the Bochner $L^2$-norm, independent of $\alpha$ (Grohs et al., 23 Mar 2025). In the uniform norm, no algebraic rate is achievable at all. This “theory-to-practice gap” restricts the practically attainable accuracy in the data-driven setting, despite the high expressivity of kernel-based operator classes (Grohs et al., 23 Mar 2025).
| Error Source | Rate (Upper Bound) | Main Dependence |
|---|---|---|
| Parametric approximation | algebraic in modes $N$ | number of modes $N$, regularity $s$, network size |
| Discretization (aliasing) | algebraic in grid size | grid size, input/activation Sobolev regularity |
| Truncation (spectral cut-off) | algebraic in cut-off $K$ | mode truncation $K$, norm bound on data |
| Statistical error | decaying in samples $n$ | number of i.i.d. samples $n$, boundedness of operator/data |
| Generalization gap | scales with capacity $\gamma$ | product of weight norms, number of modes, grid size, network depth |
| Minimax rate (learning) | slow algebraic in $n$ | number of samples $n$, output norm |
Further research is clarifying intermediate settings—such as non-periodic domains, ill-posed inverse problems, or non-smooth symbol classes—as well as closing the remaining statistical gap in sample complexity and exploring robust training in the presence of discretization, noise, and architectural sensitivity. The current bounds collectively give a quantitative foundation for the rigorous deployment and analysis of FNOs in high-dimensional scientific operator learning.