Error Bounds for Fourier Neural Operators
- Fourier Neural Operators are neural architectures that learn mappings between infinite-dimensional function spaces, particularly solution operators of PDEs.
- Error bounds for FNOs quantify approximation, discretization, and statistical errors, with decay rates tied to the number of Fourier modes, the grid size, and the sample complexity.
- Rigorous error analyses combining spectral theory and capacity control guide network design to achieve accurate scientific operator learning.
Fourier Neural Operators (FNOs) are a principal class of neural operator architectures for approximating mappings between infinite-dimensional function spaces, particularly those governed by partial differential equations (PDEs). A central theoretical question concerns quantitative error bounds: how well can an FNO approximate a target operator, and how do approximation, discretization, and generalization errors scale with network size, regularity, sample size, and discretization parameters? The study of error bounds for FNOs thus integrates results from spectral theory, nonparametric learning, and operator approximation. Recent research has crystallized rigorous error analyses, including parametric approximation bounds, discretization/aliasing estimates, sample complexity lower bounds, and generalization error via capacity control.
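The core architectural unit referenced throughout is the Fourier layer: a spectral convolution truncated to a fixed number of modes plus a pointwise linear term, followed by a smooth activation. A minimal NumPy sketch of one such layer follows; the shapes, scalings, and the tanh-based GELU approximation are illustrative choices, not a reference implementation:

```python
import numpy as np

def fno_layer(v, R, W, k_max):
    """One FNO layer on a 1-D periodic grid: spectral convolution truncated to
    k_max modes, plus a pointwise linear term, then a GELU-like activation.
    v: (grid, channels) real; R: (k_max+1, ch, ch) complex; W: (ch, ch) real."""
    v_hat = np.fft.rfft(v, axis=0)                  # (grid//2 + 1, channels)
    out_hat = np.zeros_like(v_hat)
    # Spectral convolution: a learned matrix per retained mode, zeros elsewhere
    out_hat[:k_max + 1] = np.einsum('kio,ki->ko', R, v_hat[:k_max + 1])
    spectral = np.fft.irfft(out_hat, n=v.shape[0], axis=0)
    local = v @ W                                   # pointwise linear bypass
    z = spectral + local
    # tanh-based GELU approximation (smooth, non-polynomial activation)
    return 0.5 * z * (1.0 + np.tanh(0.7978845608 * (z + 0.044715 * z**3)))

rng = np.random.default_rng(0)
grid, ch, k_max = 64, 8, 6
v = rng.standard_normal((grid, ch))
R = (rng.standard_normal((k_max + 1, ch, ch))
     + 1j * rng.standard_normal((k_max + 1, ch, ch))) / ch
W = rng.standard_normal((ch, ch)) / np.sqrt(ch)
out = fno_layer(v, R, W, k_max)
```

Stacking such layers between a lifting and a projection map gives the full architecture whose error bounds are discussed below.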
1. Universal Approximation and Parametric Error Bounds
Fourier Neural Operators were shown to possess universal approximation capability on function spaces. Under suitable smoothness assumptions, for any continuous operator $\mathcal{G}$ and compact set $K$ of input functions, there exists an FNO $\mathcal{N}$ such that the sup-norm error $\sup_{a \in K} \|\mathcal{G}(a) - \mathcal{N}(a)\|$ is arbitrarily small, provided the architecture parameters (number of Fourier modes, channel width, depth) are chosen sufficiently large and the activation is smooth and non-polynomial (Kovachki et al., 2021). For practical PDE solution operators (e.g., Darcy flow, Navier–Stokes), the error decays algebraically in the number of retained Fourier modes $N$:
- Stationary Darcy flow: algebraic decay in $N$, at a rate tied to the Sobolev regularity $H^s$ of the coefficients, with network size growing only sublinearly in the reciprocal error.
- Navier–Stokes: analogous algebraic decay in $N$ for initial data in $H^s$.
Consequently, for operators admitting additional regularity and spectral decay, FNOs achieve error-vs-size scaling superior to that of generic neural architectures (Kovachki et al., 2021).
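The algebraic mode-truncation decay underlying these bounds can be observed directly: for a random function with Fourier coefficients decaying like $|k|^{-(s+1/2)}$ (hence lying in $H^s$), the $L^2$ tail error of keeping only modes $|k| \le N$ shrinks like $N^{-s}$. A small NumPy experiment, where the synthetic coefficient model is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 1024   # fine resolution
s = 2.0    # Sobolev regularity parameter
k = np.fft.rfftfreq(M, d=1.0 / M)   # integer frequencies 0 .. M/2
# Random coefficients with |c_k| ~ |k|^{-(s + 1/2)}, so the function lies in H^s
coeffs = rng.standard_normal(k.size) + 1j * rng.standard_normal(k.size)
coeffs[1:] *= k[1:] ** (-(s + 0.5))
coeffs[0] = 0.0

def truncation_error(N):
    """L2 error of keeping only modes |k| <= N (via Parseval)."""
    kept = coeffs.copy()
    kept[k > N] = 0.0
    return np.sqrt(np.sum(np.abs(coeffs - kept) ** 2))

errs = [truncation_error(N) for N in (8, 16, 32, 64)]
# Observed decay exponents between successive doublings of N (should be near s)
rates = [np.log2(errs[i] / errs[i + 1]) for i in range(3)]
```

Doubling the mode count repeatedly shows the error shrinking at roughly the rate $N^{-s}$ predicted by the regularity of the input.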
2. Discretization and Aliasing Error
As FNOs are implemented in discrete settings, the aliasing error arising from grid-based FFT computation must be quantified. On the $d$-dimensional torus with grid size $N$, and under Sobolev-regularity assumptions on the inputs and activations, the discretization (aliasing) error for FNOs with smooth activations decays algebraically in $N$, uniformly over all layers, with a constant that is polynomial in the network parameters (Lanthaler et al., 2024). The exponent of the decay reflects the input/activation regularity, and the same bound persists when the discrete outputs are interpolated back to the continuum.
Smooth activations (e.g. GELU) are essential: for ReLU activations, the convergence rate is limited by the low regularity of the activation itself (Lanthaler et al., 2024).
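The aliasing mechanism can be illustrated numerically: apply a smooth pointwise nonlinearity to a band-limited function on grids of increasing size and compare the retained Fourier coefficients against a near-continuum reference. With an analytic activation such as tanh (standing in here for GELU), the discrepancy decays rapidly with grid size; this toy setup is an illustration, not the estimate of Lanthaler et al.:

```python
import numpy as np

k_max = 4  # number of retained Fourier modes

def layer_modes(N):
    """Fourier coefficients |k| <= k_max of tanh(f) computed on an N-point grid.
    The nonlinearity creates high frequencies that alias back onto low modes."""
    x = 2 * np.pi * np.arange(N) / N
    f = np.sin(x) + 0.5 * np.cos(3 * x)        # band-limited input
    g_hat = np.fft.fft(np.tanh(f)) / N         # discrete Fourier coefficients
    return np.concatenate([g_hat[:k_max + 1], g_hat[-k_max:]])

ref = layer_modes(4096)                        # near-continuum reference
errs = [np.linalg.norm(layer_modes(N) - ref) for N in (16, 32, 64)]
```

Because the composed function is analytic and periodic, the aliased tail decays very quickly, and the retained coefficients converge rapidly as the grid is refined.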
3. Decomposition of Total Error: Truncation, Discretization, Statistical
Operator learning theory for FNOs identifies three principal sources of error (Subedi et al., 2024):
- Truncation error (finite Fourier expansion): an upper bound decaying algebraically in the truncation mode $K$.
- Discretization error (grid aliasing): an upper bound decaying algebraically in the grid size $m$.
- Statistical error (finite sample size): an upper bound decaying in the number $n$ of i.i.d. samples.
A combined excess-risk bound sums these three contributions, with constants depending on a bound on the norm of the input data and a bound on the operator weights. Lower bounds match the exponents in $K$ and $m$ (truncation/discretization) but not in $n$ (statistical), where a quadratic gap persists between the upper and lower bounds. This decomposition provides a precise framework for identifying the limiting factor in overall accuracy (Subedi et al., 2024).
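Assuming the three terms take the schematic form $K^{-s} + m^{-s} + n^{-1/2}$ (illustrative exponents and unit constants, not the exact bound of Subedi et al.), the decomposition tells how to budget resolution and data so that no single term dominates:

```python
import numpy as np

def total_bound(K, m, n, s=2.0, C=1.0):
    """Schematic three-term excess-risk bound:
    truncation K^{-s} + discretization m^{-s} + statistical n^{-1/2}."""
    return C * (K ** (-s) + m ** (-s) + n ** (-0.5))

def budget(eps, s=2.0):
    """Choose K, m, n so that each term contributes at most eps/3."""
    K = int(np.ceil((3 / eps) ** (1 / s)))
    m = int(np.ceil((3 / eps) ** (1 / s)))
    n = int(np.ceil((3 / eps) ** 2))
    return K, m, n

K, m, n = budget(0.1)
bound = total_bound(K, m, n)
```

Note the asymmetry: halving the target error only modestly increases the required modes and grid size (polynomially with exponent $1/s$), but quadruples the required sample size, which is why the statistical term typically dominates in practice.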
4. Approximation Rates via Symbol Learning and Fréchet Metrics
When the solution operator is a Fourier multiplier, network approximation rates for its symbol in suitable seminorms transfer directly to operator error. If a network family approximates the symbol at rate $\epsilon$ in Sobolev or Hörmander seminorms, then the composed FNO achieves output error of order $\epsilon$ uniformly over bounded input sets (Abdeljawad et al., 2024). In exponential spectral Barron spaces and Paley–Wiener spaces, this yields faster-than-algebraic rates: exponential-type rates for exponentially localized symbols, and spectral rates for bandlimited functions, depending on the underlying function space (Abdeljawad et al., 2024).
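The transfer mechanism is elementary Parseval reasoning: the operator output error is controlled by the sup-norm error of the symbol on the active frequency band. A NumPy check with a polynomial surrogate standing in for the network approximant; the symbol $1/(1+k^2)$ and the degree-6 fit are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
N, band = 256, 8
k = np.fft.fftfreq(N, d=1.0 / N)           # integer frequencies
sigma = 1.0 / (1.0 + k ** 2)               # target multiplier symbol

# Surrogate symbol: degree-6 polynomial in (k/band)^2, fitted on the active band
mask = np.abs(k) <= band
w = (k / band) ** 2                        # rescaled for well-conditioned fit
coef = np.polyfit(w[mask], sigma[mask], deg=6)
sigma_net = np.where(mask, np.polyval(coef, w), 0.0)

# Band-limited random input function
f_hat = np.zeros(N, dtype=complex)
f_hat[mask] = rng.standard_normal(mask.sum()) + 1j * rng.standard_normal(mask.sum())
f = np.fft.ifft(f_hat)

out_true = np.fft.ifft(np.fft.fft(f) * sigma)
out_net = np.fft.ifft(np.fft.fft(f) * sigma_net)

# Parseval: operator output error <= sup-norm symbol error * input norm
op_err = np.linalg.norm(out_true - out_net)
sym_err = np.abs(sigma - sigma_net)[mask].max()
```

Improving the symbol approximation (more network capacity, in the FNO setting) therefore improves the operator error at the same rate, which is exactly the transfer statement above.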
5. Universal Approximation of Derivatives and Operator Sensitivities
Derivative-informed FNOs (DIFNOs) extend error bounds to Fréchet derivatives of the learned operator. For a target operator $\mathcal{G}$ with continuous Fréchet derivative $D\mathcal{G}$, for any $\epsilon > 0$ and compact subset $K$ of inputs, there exists an FNO $\mathcal{N}$ such that both the operator error $\sup_{a \in K} \|\mathcal{G}(a) - \mathcal{N}(a)\|$ and the derivative error $\sup_{a \in K} \|D\mathcal{G}(a) - D\mathcal{N}(a)\|$ fall below $\epsilon$ (Yao et al., 16 Dec 2025). The analysis shows that truncation to $N$ Fourier modes produces algebraic decay in both operator and derivative error when the target is sufficiently differentiable, and the overall error can be made arbitrarily small by increasing the network dimensions.
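Derivative accuracy of this kind is typically validated with a finite-difference Taylor test: the remainder $\|\mathcal{G}(a + t h) - \mathcal{G}(a) - t\,D\mathcal{G}(a)h\|$ must shrink quadratically in $t$. A minimal check on a toy pointwise operator (not an FNO; purely illustrative of the test itself):

```python
import numpy as np

# Toy pointwise operator G(a) = a^2, with Fréchet derivative DG(a)h = 2*a*h
def G(a):
    return a ** 2

def DG(a, h):
    return 2 * a * h

rng = np.random.default_rng(2)
a = rng.standard_normal(64)   # base input function (discretized)
h = rng.standard_normal(64)   # perturbation direction

# Taylor remainder: should scale like t^2 if DG is the correct derivative
res = lambda t: np.linalg.norm(G(a + t * h) - G(a) - t * DG(a, h))
r1, r2 = res(1e-2), res(1e-3)
ratio = r1 / r2   # expect ~100 (two orders per decade in t)
```

The same test, applied to a trained DIFNO against the true solution operator, probes exactly the derivative error that the bound above controls.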
6. Generalization Bounds and Rademacher Complexity
The generalization error of FNOs has been characterized in terms of empirical Rademacher complexity, which depends on layerwise norms of the weights and the number of retained Fourier modes. The Rademacher complexity is bounded by a capacity term $\gamma$ built from the activation Lipschitz constant, the depth, the number of Fourier modes per layer, the grid size, and the product of layerwise weight norms (Kim et al., 2022). The generalization gap scales with this capacity, and empirical studies confirm the predicted correlation between generalization error and $\gamma$, as well as a direct dependence on the number of Fourier modes (Kim et al., 2022).
7. Sampling Complexity and the Theory-to-Practice Gap
Despite fast parametric approximation rates, FNOs are subject to a fundamental sampling-complexity limit. For any class of functions that can be approximated at rate $\alpha$ by FNOs of bounded complexity, the minimax risk for data-driven learning from $n$ input–output pairs decays at best at a slow algebraic rate in $n$ in the Bochner $L^2$-norm, independent of $\alpha$ (Grohs et al., 23 Mar 2025). In the uniform norm, no algebraic rate is achievable at all. This “theory-to-practice gap” restricts the practically attainable accuracy in the data-driven setting, despite the high expressivity of kernel-based operator classes (Grohs et al., 23 Mar 2025).
| Error Source | Rate (Upper Bound) | Main Dependence |
|---|---|---|
| Parametric approximation | algebraic in modes $N$ | number of modes $N$, regularity $s$, network size |
| Discretization (aliasing) | algebraic in grid size | grid size, input/activation Sobolev regularity |
| Truncation (spectral cut-off) | algebraic in cut-off $K$ | mode truncation $K$, norm bound on data |
| Statistical error | decaying in samples $n$ | number of i.i.d. samples $n$, boundedness of operator/data |
| Generalization gap | scales with capacity $\gamma$ | product of weight norms, number of modes, grid size, network depth |
| Minimax rate (learning) | slow algebraic in $n$ | number of samples $n$, output norm |
Further research is clarifying intermediate settings—such as non-periodic domains, ill-posed inverse problems, or non-smooth symbol classes—as well as closing the remaining statistical gap in sample complexity and exploring robust training in the presence of discretization, noise, and architectural sensitivity. The current bounds collectively give a quantitative foundation for the rigorous deployment and analysis of FNOs in high-dimensional scientific operator learning.