
Error Bounds for Fourier Neural Operators

Updated 10 February 2026
  • Fourier Neural Operators (FNOs) are neural architectures that map between infinite-dimensional function spaces, particularly for solving PDEs.
  • Error bounds for FNOs measure approximation, discretization, and statistical errors, with decay rates tied to Fourier modes, grid size, and sample complexity.
  • Rigorous error analyses combining spectral theory and capacity control guide network design to achieve accurate scientific operator learning.

Fourier Neural Operators (FNOs) are a principal class of neural operator architectures for approximating mappings between infinite-dimensional function spaces, particularly those governed by partial differential equations (PDEs). A central theoretical question concerns quantitative error bounds: how well can an FNO approximate a target operator, and how do approximation, discretization, and generalization errors scale with network size, regularity, sample size, and discretization parameters? The study of error bounds for FNOs thus integrates results from spectral theory, nonparametric learning, and operator approximation. Recent research has crystallized rigorous error analyses, including parametric approximation bounds, discretization/aliasing estimates, sample complexity lower bounds, and generalization error via capacity control.

1. Universal Approximation and Parametric Error Bounds

Fourier Neural Operators were shown to possess universal approximation capability on function spaces. Under suitable smoothness assumptions, for any continuous operator $G: H^s \to H^{s'}$ and compact set $K \subset H^s$, there exists an FNO $\mathcal{N}$ such that the sup-norm error $\sup_{a\in K}\|G(a)-\mathcal{N}(a)\|_{H^{s'}}$ is arbitrarily small, provided the architecture parameters (number of Fourier modes, channel width, depth) are chosen sufficiently large and the activation is smooth and non-polynomial (Kovachki et al., 2021). For practical PDE solution operators (e.g., Darcy flow, Navier–Stokes), the error decays algebraically in the number of Fourier modes $N$:

  • Stationary Darcy: $\sup_{a\in A^s_\lambda}\|G(a)-\mathcal{N}(a)\|_{H^1}\le C N^{-k}$ for $a$ of Sobolev regularity $s > d/2 + k$, $k > 1$, with network size sublinear in $1/\varepsilon$.
  • Navier–Stokes: analogously, with $N^{-r}$ decay for initial data in $H^r$, $r > d/2 + 2$.

Consequently, for operators admitting additional regularity and spectral decay, FNOs achieve error-vs-size scaling superior to that of generic neural architectures (Kovachki et al., 2021).
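The algebraic decay in the number of retained modes can be illustrated numerically. The following NumPy sketch uses a synthetic coefficient sequence $|c_k| \sim |k|^{-2}$ (an illustrative assumption, not taken from the cited papers) and measures, via Parseval, the $L^2$ error left after truncating the Fourier expansion at mode $K$:

```python
import numpy as np

def truncation_error(coeffs, wavenumbers, K):
    """L2 error of keeping only Fourier modes |k| <= K (via Parseval)."""
    dropped = np.abs(wavenumbers) > K
    return np.sqrt(np.sum(np.abs(coeffs[dropped]) ** 2))

# Synthetic target with algebraically decaying spectrum |c_k| ~ |k|^{-2},
# mimicking a function of limited Sobolev regularity (illustrative assumption).
n = 1024
k = np.fft.fftfreq(n, d=1.0 / n)  # integer wavenumbers -n/2 .. n/2-1
coeffs = 1.0 / np.maximum(np.abs(k), 1.0) ** 2

errs = [truncation_error(coeffs, k, K) for K in (8, 16, 32)]
# errors shrink algebraically, roughly like K^{-3/2} in L2 for this spectrum
```

Faster spectral decay of the target (higher regularity) directly improves the exponent, which is the mechanism behind the mode-count terms in the parametric bounds above.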

2. Discretization and Aliasing Error

As FNOs are implemented in discrete settings, the aliasing error from grid-based FFT computation must be quantified. On the $d$-torus $\mathbb{T}^d$ with grid size $N$ and under $H^s$-regularity for $s > d/2$, the discretization (aliasing) error for FNOs with smooth activations satisfies (Lanthaler et al., 2024):

$$\|v_t^N - v_t\|_{\ell^2(\mathcal{X}_N)} \le C\, N^{-s}$$

for all layers $t$, with $C$ polynomial in the network parameters. The exponent $s$ reflects the input and activation regularity. The bound persists for interpolated outputs:

$$\|v_t - \mathcal{I}_N[v_t^N]\|_{L^2(\mathbb{T}^d)} \le C' N^{-s}.$$

Smooth activations (e.g., GELU) are essential; for ReLU activations, the convergence rate is throttled by the limited regularity of the activation itself (Lanthaler et al., 2024).
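The aliasing mechanism behind the $N^{-s}$ bound can be observed directly: sampling a smooth periodic function on an $N$-point grid folds every mode $k \equiv 1 \pmod N$ onto the mode-1 DFT coefficient. A small sketch, where the coefficient decay $|c_k| = |k|^{-3}$ is an illustrative assumption:

```python
import numpy as np

P = 3.0      # assumed coefficient decay |c_k| = |k|^{-P} (illustrative)
KMAX = 4096  # highest mode present in the "continuum" function

def f(x):
    """Smooth periodic function with Fourier coefficients c_{±k} = k^{-P}."""
    out = np.zeros_like(x)
    for k in range(1, KMAX + 1):
        out += 2.0 * k ** (-P) * np.cos(2.0 * np.pi * k * x)
    return out

def dft_mode1(N):
    """Mode-1 coefficient from N equispaced samples.
    Aliasing folds every mode k ≡ 1 (mod N) onto this coefficient."""
    x = np.arange(N) / N
    return np.fft.fft(f(x))[1] / N

true_c1 = 1.0  # exact coefficient of e^{2πix} is 1^{-P} = 1
errs = [abs(dft_mode1(N) - true_c1) for N in (16, 32, 64)]
# aliasing error decays like N^{-P}: each doubling of N cuts it roughly 8x
```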

3. Decomposition of Total Error: Truncation, Discretization, Statistical

Operator learning theory for FNOs identifies three principal sources of error (Subedi et al., 2024):

  1. Truncation error (finite Fourier expansion): upper bound $O(K^{-2s})$ for truncation at mode $K$.
  2. Discretization error (grid aliasing): upper bound $O(N^{-s})$ for grid size $N$.
  3. Statistical error (finite sample size): upper bound $O(n^{-1/2})$ for $n$ i.i.d. samples.

A combined excess-risk bound is:

$$\mathcal{E}_n(\widehat{T}_K^N, T, \mu) \le 8 B^2 (C+1)^2 \left( \frac{1}{\sqrt{n}} + \frac{2^s \sqrt{\pi^d}}{N^s} + \frac{1}{K^{2s}} \right)$$

where $B$ bounds the $H^s$-norm of the data and $C$ bounds the operator weights. Lower bounds match the exponents for $K$ and $N$ (truncation/discretization) but not for $n$ (statistical), where a quadratic gap persists between the $O(n^{-1/2})$ upper and $O(n^{-1})$ lower bounds. This decomposition provides a precise framework for identifying the limiting factor in overall accuracy (Subedi et al., 2024).
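Such a decomposition lends itself to a back-of-the-envelope budget calculation: evaluating the three terms for a given configuration shows which one limits accuracy. A sketch with illustrative constants ($B = C = 1$):

```python
import math

def excess_risk_terms(n, N, K, s, d, B=1.0, C=1.0):
    """Evaluate the three error terms of the combined excess-risk bound
    (constants B, C set to 1 for illustration)."""
    terms = {
        "statistical": 1.0 / math.sqrt(n),                              # O(n^{-1/2})
        "discretization": 2.0 ** s * math.sqrt(math.pi ** d) / N ** s,  # O(N^{-s})
        "truncation": K ** (-2.0 * s),                                  # O(K^{-2s})
    }
    terms["total"] = 8.0 * B ** 2 * (C + 1.0) ** 2 * sum(
        terms[t] for t in ("statistical", "discretization", "truncation"))
    return terms

terms = excess_risk_terms(n=10_000, N=256, K=16, s=2, d=2)
dominant = max(("statistical", "discretization", "truncation"), key=terms.get)
# here n^{-1/2} = 0.01 dwarfs the grid and truncation terms, so more
# training samples (not a finer grid or more modes) would help most
```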

4. Approximation Rates via Symbol Learning and Fréchet Metrics

When the solution operator $G: H^s \to H^t$ is a Fourier multiplier, network approximation rates for its symbol $\sigma$ in suitable seminorms transfer to operator error. If a network family $\Sigma_N$ approximates $\sigma$ at rate $N^{-\alpha}$ in Sobolev or Hörmander seminorms, then the composed FNO achieves output error $\|G(u) - G_N(u)\|_{H^t} \le K N^{-\alpha} \|u\|_{H^s}$ for all $u \in H^s$ (Abdeljawad et al., 2024). In exponential spectral Barron spaces and Paley–Wiener spaces, this yields rates such as $O(e^{-c N^{\beta/d}})$ (for exponentially localized symbols) or $O(N^{-1/2})$ (for bandlimited functions), depending on the underlying function space (Abdeljawad et al., 2024).
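For a concrete multiplier, the transfer from symbol error to operator error is immediate because the operator is diagonal in the Fourier basis. A sketch using the symbol $\sigma(k) = 1/(1 + (2\pi k)^2)$ of $(I - \Delta)^{-1}$ on the unit torus, where a uniform perturbation $\varepsilon$ stands in for a learned symbol's approximation error:

```python
import numpy as np

def apply_multiplier(u, symbol):
    """Apply a Fourier multiplier operator: G(u) = F^{-1}[ sigma(k) · F[u] ]."""
    k = np.fft.fftfreq(len(u), d=1.0 / len(u))  # integer wavenumbers
    return np.fft.ifft(symbol(k) * np.fft.fft(u)).real

# Symbol of (I - Δ)^{-1} on the unit torus
sigma = lambda k: 1.0 / (1.0 + (2.0 * np.pi * k) ** 2)
# "Learned" symbol with a uniform perturbation eps (illustrative stand-in)
eps = 1e-3
sigma_hat = lambda k: sigma(k) + eps

x = np.arange(256) / 256.0
u = np.sin(2.0 * np.pi * x)
err = np.max(np.abs(apply_multiplier(u, sigma) - apply_multiplier(u, sigma_hat)))
# operator-output error is the symbol error scaled by the input magnitude
```

Because a pure sine excites only the $k = \pm 1$ modes, the exact solve returns $u / (1 + 4\pi^2)$, and the output error equals $\varepsilon$ times the input's amplitude, matching the $K N^{-\alpha}\|u\|_{H^s}$ transfer principle.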

5. Universal Approximation of Derivatives and Operator Sensitivities

Derivative-informed FNOs (DIFNOs) extend error bounds to Fréchet derivatives of the learned operator. For target operators $G \in C^1(X; Y)$ with continuous Fréchet derivative $DG$, for any $\varepsilon > 0$ and compact subset $K \subset X$, there exists an FNO $\mathcal{N}$ such that

$$\sup_{a\in K} \|G(a) - \mathcal{N}(a)\|_Y \le \varepsilon, \qquad \sup_{a\in K} \|DG(a) - D\mathcal{N}(a)\|_{HS(X_\delta, Y)} \le \varepsilon$$

with $X = H^s$, $Y = H^{s'}$, and $X_\delta = H^{s+\delta}$ for some $\delta \ge 0$ (Yao et al., 16 Dec 2025). This analysis demonstrates that truncation to $N$ modes produces $O(N^{-r})$ decay in both the operator and derivative errors when $G$ is $r$-times differentiable, and the overall error can be made arbitrarily small by increasing the network dimensions.
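The coupling between an operator and its Fréchet derivative can be sanity-checked for any surrogate with a directional finite-difference test. A minimal sketch using a toy pointwise-square operator as a stand-in for a learned model (not the DIFNO architecture itself):

```python
import numpy as np

def G(u):
    """Toy nonlinear operator (pointwise square), standing in for a surrogate."""
    return u ** 2

def DG(u, h):
    """Its Fréchet derivative acting on a perturbation h: DG(u)[h] = 2·u·h."""
    return 2.0 * u * h

rng = np.random.default_rng(0)
u = rng.standard_normal(64)
h = rng.standard_normal(64)

# Directional test: (G(u + t·h) - G(u)) / t → DG(u)[h] as t → 0
gaps = []
for t in (1e-2, 1e-4):
    fd = (G(u + t * h) - G(u)) / t
    gaps.append(np.max(np.abs(fd - DG(u, h))))
# for this quadratic operator the gap is exactly t·max(h²),
# so shrinking t by 100x shrinks the gap by 100x
```

A derivative-accurate surrogate passes this test at small $t$; a surrogate that matches outputs but not sensitivities will show a finite-difference gap that does not shrink with $t$.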

6. Generalization Bounds and Rademacher Complexity

The generalization error for FNOs has been characterized in terms of empirical Rademacher complexity, which depends on layerwise $(p,q)$-norms of the weights and on the number of retained Fourier modes. The Rademacher complexity for FNOs with capacity $\gamma_{p,q}(h)$ satisfies (Kim et al., 2022):

$$\mathcal{R}_m\big(\{h : \gamma_{p,q}(h) \le \gamma\}\big) \le \gamma\, L^D (NH)^{D \lfloor 1/p^* - 1/q \rfloor_+} H^{\lfloor 1/p^* - 1/q \rfloor_+} N^{d_u/p} \frac{1}{m} \sum_i \|a_i\|_{p^*}$$

where $L$ is the activation Lipschitz constant, $D$ the depth, $k_{\max}$ the number of Fourier modes per layer, and $N$ the grid size. The generalization gap scales with these quantities, and empirical studies confirm the predicted correlation between generalization error and network capacity $\gamma_{p,q}$, as well as a direct dependence on the number of Fourier modes (Kim et al., 2022).
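A capacity proxy in the spirit of $\gamma_{p,q}$, the product of layerwise $(p,q)$-norms, can be computed directly from a model's weight tensors. The following sketch is a simplification (plain dense matrices, $p = q = 2$, i.e. Frobenius norms); the exact quantity in the cited work also accounts for the retained-mode count and the FNO layer structure:

```python
import numpy as np

def layer_pq_norm(W, p=2.0, q=2.0):
    """(p,q)-norm of a weight matrix: q-norm over the p-norms of its rows."""
    row_p = np.sum(np.abs(W) ** p, axis=1) ** (1.0 / p)
    return np.sum(row_p ** q) ** (1.0 / q)

def capacity(weights, p=2.0, q=2.0):
    """Product of layerwise (p,q)-norms: a simplified proxy for the
    gamma_{p,q} capacity that controls the Rademacher bound."""
    out = 1.0
    for W in weights:
        out *= layer_pq_norm(W, p, q)
    return out

rng = np.random.default_rng(1)
weights = [0.1 * rng.standard_normal((8, 8)) for _ in range(3)]
gamma = capacity(weights)
# homogeneity: rescaling any one layer by c rescales the capacity by c
```

Tracking such a proxy during training gives a cheap, monotone-in-weight-scale indicator of the generalization gap the bound predicts.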

7. Sampling Complexity and the Theory-to-Practice Gap

Despite fast parametric approximation rates, FNOs are affected by a fundamental sampling-complexity limit. For any class of functions $G$ that can be approximated at rate $O(n^{-\alpha})$ by an FNO of complexity $n$, the minimax risk for data-driven learning from $N$ input–output pairs is at best $O(N^{-1/p})$ in the Bochner $L^p$-norm, independent of $\alpha$ (Grohs et al., 23 Mar 2025). In the uniform norm, no algebraic rate is achievable. This "theory-to-practice gap" restricts practically attainable accuracy in the data-driven setting, despite the high expressivity of kernel-based operator classes (Grohs et al., 23 Mar 2025).


| Error Source | Rate (Upper Bound) | Main Dependence |
| --- | --- | --- |
| Parametric approximation | $O(N^{-k})$ | Number of modes $N$, regularity $k$, network size |
| Discretization (aliasing) | $O(N^{-s})$ | Grid size $N$, input/activation Sobolev regularity $s$ |
| Truncation (spectral cut-off) | $O(K^{-2s})$ | Mode truncation $K$, data $H^s$-norm bound |
| Statistical error | $O(n^{-1/2})$ | Number of i.i.d. samples $n$, boundedness of operator/data |
| Generalization gap | $\propto \gamma_{p,q}$ | Product of weight norms, number of modes, grid size, network depth |
| Minimax rate (learning) | $O(N^{-1/p})$ | Number of samples $N$, output norm $L^p$ |

Further research is clarifying intermediate settings—such as non-periodic domains, ill-posed inverse problems, or non-smooth symbol classes—as well as closing the remaining statistical gap in sample complexity and exploring robust training in the presence of discretization, noise, and architectural sensitivity. The current bounds collectively give a quantitative foundation for the rigorous deployment and analysis of FNOs in high-dimensional scientific operator learning.
