Neural Operators: FNO, AFNO, CNO, UNO
- Neural Operators are deep learning architectures that learn mappings between infinite-dimensional function spaces, enabling resolution-invariant surrogate modeling of PDEs.
- FNO, AFNO, CNO, and UNO use distinct approaches—from global Fourier transforms to adaptive and U-shaped designs—to efficiently capture complex physical dynamics.
- These operators facilitate mesh-free computation and zero-shot generalization, making them critical tools for scientific simulations and engineering applications.
Neural operators are deep learning architectures designed to learn mappings between infinite-dimensional function spaces, rather than finite-dimensional vectors. They have become central in scientific machine learning, particularly for surrogate modeling of partial differential equations (PDEs), scientific signal processing, and the emulation of physical and engineering systems. Unlike traditional neural networks, neural operators parameterize solution operators for PDEs in a discretization-invariant and mesh-free manner, enabling zero-shot generalization across resolutions and domains, fast surrogate inference, and efficient handling of high-dimensional input spaces. This article focuses on the principal families of neural operators, with detailed attention to the Fourier Neural Operator (FNO), Adaptive Fourier Neural Operator (AFNO), Convolutional Neural Operator (CNO), and U-shaped Neural Operator (UNO), as well as theoretical underpinnings and domain-specific advancements.
1. Foundations of Neural Operator Architectures
Neural operators target the approximation of nonlinear, nonlocal mappings between function spaces, $\mathcal{G}: \mathcal{A} \to \mathcal{U}$, where $\mathcal{A}$ and $\mathcal{U}$ are Banach or Hilbert spaces of real- or complex-valued functions defined on domains such as $D \subset \mathbb{R}^d$ (Duruisseaux et al., 1 Dec 2025, Xiao et al., 6 Oct 2025, Lanthaler et al., 2023). Classical PDE solvers obtain $\mathcal{G}(a)$ via mesh-based discretizations and numerical integration, but these are computationally demanding and lack generalization across mesh or parameter changes. Neural operators instead realize a parameterized family $\mathcal{G}_\theta$ capable of learning this mapping end-to-end from data, exhibiting:
- Resolution Independence: The learned operator acts on functional inputs and is deployable at unseen grid resolutions without retraining.
- Mesh-Free Parametrization: Architectures are agnostic to the mesh, relying on basis-expansion (e.g., Fourier, Chebyshev, wavelets) or operator-theoretic kernels for discretization invariance (Duruisseaux et al., 1 Dec 2025).
- Global Receptive Fields: Critical for long-range PDE couplings and physical phenomena such as turbulence, wave propagation, and complex boundary value problems.
The foundational universality proof demonstrates that for operator learning, both nonlocality and nonlinearity are essential: local nonlinear pointwise networks cannot approximate translation, and purely global linear networks are insufficient for nonlinear PDE solution operators (Lanthaler et al., 2023).
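As a concrete, deliberately minimal illustration of this result, the sketch below — plain NumPy, with all weights random and all names chosen for this example rather than taken from the cited papers — builds a layer whose only nonlocal ingredient is a learned multiple of the global average:

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU nonlinearity
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def ano_layer(v, W, b, w_avg):
    """Minimal nonlocal nonlinear layer in the spirit of ANO:
    pointwise linear map + a rank-1 global-average coupling, then GELU.
    v: (n_points, channels) samples of the input function."""
    nonlocal_term = w_avg * v.mean(axis=0, keepdims=True)  # single global average
    return gelu(v @ W + nonlocal_term + b)

rng = np.random.default_rng(0)
v = rng.standard_normal((64, 4))          # a function sampled on 64 points
W = rng.standard_normal((4, 4)) / 2.0
out = ano_layer(v, W, b=0.0, w_avg=0.5)   # (64, 4): nonlocal + nonlinear
```

Despite its simplicity, stacking such layers already combines the two ingredients — nonlocality and nonlinearity — that the universality result identifies as essential.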
2. Fourier Neural Operator (FNO): Architecture and Properties
The Fourier Neural Operator (FNO) introduced the use of parameterized global convolutions in the frequency domain as the principal neural operator for PDEs and function-to-function mapping (Duruisseaux et al., 1 Dec 2025, Xiao et al., 6 Oct 2025, Kim et al., 2022).
Architecture: Each FNO layer updates an input field $v_t$ via

$$v_{t+1}(x) = \sigma\Big( W v_t(x) + \mathcal{F}^{-1}\big( R_\theta \cdot \mathcal{F}(v_t) \big)(x) \Big),$$

where
- $W$ is a local (pointwise) linear operator or convolution,
- $\mathcal{F}$ / $\mathcal{F}^{-1}$ are (multichannel) Fast Fourier Transforms (FFT/IFFT) along spatial dimensions,
- $R_\theta$ is a learnable complex tensor, applying a channel-mixing linear map to each retained set of low-frequency Fourier modes, and
- $\sigma$ is a pointwise nonlinearity (e.g., GELU, ReLU).

Only a finite number of modes per dimension (typically on the order of $32$, well below the Nyquist frequency) is retained as parameters; all other modes are zeroed, conferring resolution invariance (Duruisseaux et al., 1 Dec 2025, Xiao et al., 6 Oct 2025). FNO stacks several such layers, with initial and final pointwise MLPs for input lifting and output projection.
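A single FNO-style layer can be sketched in NumPy as follows (1-D, untrained; the shapes, names, and random weights are illustrative choices of this sketch, not a reference implementation):

```python
import numpy as np

def fourier_layer_1d(v, R, W):
    """One FNO-style layer on a 1-D periodic grid.
    v: (n, c) real field samples; R: (k, c, c) complex spectral weights
    for the k lowest modes; W: (c, c) local pointwise linear map."""
    n, _ = v.shape
    k = R.shape[0]
    v_hat = np.fft.rfft(v, axis=0)                 # forward FFT per channel
    out_hat = np.zeros_like(v_hat)
    out_hat[:k] = np.einsum('kij,kj->ki', R, v_hat[:k])  # mix channels, low modes only
    spectral = np.fft.irfft(out_hat, n=n, axis=0)  # back to physical space
    return np.maximum(spectral + v @ W, 0.0)       # local branch + ReLU

rng = np.random.default_rng(1)
v = rng.standard_normal((128, 8))
R = (rng.standard_normal((16, 8, 8)) + 1j * rng.standard_normal((16, 8, 8))) / 8
W = rng.standard_normal((8, 8)) / 8
y = fourier_layer_1d(v, R, W)   # (128, 8)
```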
Properties:
- Resolution Invariance: Truncated Fourier expansion and a fixed spectral weight tensor $R_\theta$ allow deployment on any discretization finer than that used for training.
- Computational Complexity: Each Fourier layer costs $\mathcal{O}(n \log n)$ for the FFTs plus $\mathcal{O}(k_{\max}^d \, c^2)$ for the mode-wise channel mixing, for $n$ grid points, $k_{\max}$ retained modes per dimension, $c$ channels, and $d$ spatial dimensions.
- Universal Approximation: FNOs are universal for translation-invariant nonlinear operators, and universality requires only a minimal nonlocal ingredient (even single-mode averaging achieves universality, as in ANO) (Lanthaler et al., 2023).
- Generalization and Capacity: The capacity, as measured by group-norms of the spectral weights and the number of retained modes, tightly controls generalization error; increasing the mode count improves expressivity, but with diminishing returns and risk of overfitting (Kim et al., 2022).
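The resolution-invariance claim can be checked numerically. With a normalized FFT, the same truncated spectral weights applied at two different resolutions yield outputs that coincide (to machine precision) at shared grid points, provided the input is band-limited — a single-channel NumPy sketch with random illustrative weights:

```python
import numpy as np

def spectral_branch(v, R, k):
    """Truncated spectral convolution with resolution-independent
    normalization: the same k complex weights apply at any grid size n."""
    v_hat = np.fft.rfft(v, norm='forward')
    out_hat = np.zeros_like(v_hat)
    out_hat[:k] = R * v_hat[:k]
    return np.fft.irfft(out_hat, n=len(v), norm='forward')

rng = np.random.default_rng(2)
k = 8
R = rng.standard_normal(k) + 1j * rng.standard_normal(k)

f = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x)  # band-limited
out_coarse = spectral_branch(f(np.arange(64) / 64), R, k)
out_fine = spectral_branch(f(np.arange(256) / 256), R, k)
# the fine-grid output, subsampled at the coarse grid points, matches exactly
mismatch = np.max(np.abs(out_fine[::4] - out_coarse))   # ≈ machine precision
```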
FNOs have demonstrated superior performance for mesh-free, global surrogate modeling, including electromagnetic channel modeling for next-generation MIMO (Xiao et al., 6 Oct 2025), Navier–Stokes turbulence, and quantum system surrogacy (Qi et al., 8 Sep 2025).
3. Adaptive and Specialized Variants: AFNO, CNO, UNO
AFNO: Adaptive Fourier Neural Operator
AFNO introduces three principal modifications to FNO for greater adaptation and efficiency, especially for high-dimensional vision and heterogeneous physical domains (Guibas et al., 2021):
- Block-Diagonal Channel Mixing: Instead of full spectral mixing per mode, AFNO partitions the channels into blocks and mixes only within each block, dramatically reducing the parameter count.
- Shared Adaptive Filters: Spectral weights are produced per mode by a shared two-layer complex MLP ("token-mixing MLP"), enabling input-adaptive mixing.
- Soft-Thresholding in Frequency: Element-wise shrinkage (LASSO-like sparsification) prunes uninformative modes, enforcing sparse spectral representations and reducing overfitting and computational cost.
This yields a parameter count reduced by the block-diagonal factorization and $\mathcal{O}(n \log n)$ compute in the number of tokens, matching FNO scaling but at lower overhead, with per-sample adaptation of the mixing. Empirically, AFNO matches or outperforms local and attention-based architectures in few-shot segmentation and vision tasks at reduced cost.
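Two of these ingredients — block-diagonal channel mixing and frequency-domain soft-thresholding — can be sketched as follows (plain NumPy; the block MLP, its leaky complex activation, and all shapes are simplifications chosen for this sketch, not the AFNO reference implementation):

```python
import numpy as np

def soft_threshold(z, lam):
    """Complex soft-thresholding: shrink each coefficient's magnitude by lam,
    zeroing small (uninformative) modes -- LASSO-like sparsification."""
    mag = np.abs(z)
    return z * np.maximum(mag - lam, 0.0) / np.maximum(mag, 1e-12)

def afno_block_mix(v_hat, W1, W2):
    """Block-diagonal mixing with a two-layer MLP shared across modes.
    v_hat: (modes, blocks, block_size) complex tokens; weights are per block."""
    h = np.einsum('mbi,bij->mbj', v_hat, W1)
    h = np.where(h.real > 0, h, 0.25 * h)      # illustrative leaky activation
    return np.einsum('mbi,bij->mbj', h, W2)

rng = np.random.default_rng(3)
v_hat = rng.standard_normal((32, 4, 16)) + 1j * rng.standard_normal((32, 4, 16))
W1 = (rng.standard_normal((4, 16, 16)) / 4).astype(complex)
W2 = (rng.standard_normal((4, 16, 16)) / 4).astype(complex)
out = soft_threshold(afno_block_mix(v_hat, W1, W2), lam=0.1)
```

With 4 blocks of 16 channels, each mixing matrix is $16 \times 16$ rather than $64 \times 64$, which is the source of the parameter savings.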
CNO: Convolutional Neural Operator
CNO replaces global spectral convolution with a learnable spatially localized convolutional kernel, typically with small support, parameterized as CNN layers shared across the domain (Xiao et al., 6 Oct 2025). CNO architectures are suitable for heterogeneous or strongly non-periodic domains.
Trade-offs: CNO excels when local features dominate, but lacks the inherent global receptive field and mesh-free resolution transfer of FNO/AFNO. Deep stacks are required for long-range coupling.
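The locality trade-off is easy to see numerically: an impulse passed through stacked small-kernel convolutions spreads only linearly with depth, so coupling distant points requires deep stacks (a NumPy sketch with a hypothetical 5-point averaging kernel):

```python
import numpy as np

def local_conv_layer(v, kernel):
    """One CNO-style layer: a small local kernel shared across the domain, plus ReLU."""
    return np.maximum(np.convolve(v, kernel, mode='same'), 0.0)

impulse = np.zeros(101)
impulse[50] = 1.0
kernel = np.ones(5) / 5          # hypothetical 5-point local kernel
v = impulse
for _ in range(3):               # three stacked local layers
    v = local_conv_layer(v, kernel)
support = np.flatnonzero(v > 0)  # receptive field after 3 layers
print(support.min(), support.max())   # width 3*(5-1)+1 = 13 points
```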
UNO: U-shaped Neural Operator
UNO architectures leverage U-Net style encoder–decoder pathways, combining multi-scale feature extraction with spectral (Fourier, wavelet) convolutions at every scale (Xiao et al., 6 Oct 2025). This allows hierarchical, resolution-agnostic learning and enhanced robustness in multi-scale physical systems, at the cost of increased parameter count and tuning complexity. UNO enables partial adaptation to non-periodic and irregular domains.
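A minimal sketch of the spectral down-/up-sampling used along such encoder–decoder pathways (NumPy, single channel; `spectral_downsample`/`spectral_upsample` are names invented for this sketch): truncating the spectrum moves to a coarser grid, zero-padding it moves back, and a skip connection recombines the two paths.

```python
import numpy as np

def spectral_downsample(v, n_out):
    # encoder step: keep only the lowest Fourier modes, sample a coarser grid
    v_hat = np.fft.rfft(v, norm='forward')
    return np.fft.irfft(v_hat[: n_out // 2 + 1], n=n_out, norm='forward')

def spectral_upsample(v, n_out):
    # decoder step: zero-pad the spectrum back to the finer resolution
    v_hat = np.fft.rfft(v, norm='forward')
    padded = np.zeros(n_out // 2 + 1, dtype=complex)
    padded[: len(v_hat)] = v_hat
    return np.fft.irfft(padded, n=n_out, norm='forward')

v = np.sin(2 * np.pi * np.arange(128) / 128)   # band-limited test field
coarse = spectral_downsample(v, 32)            # encoder path at 1/4 resolution
restored = spectral_upsample(coarse, 128)      # decoder path back to full grid
skip = v + restored                            # U-Net style skip connection
```

In a full UNO, spectral convolutions and nonlinearities would act at each scale between these resampling steps.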
Comparison Table:
| Operator | Global Field | Mesh-Free | Adaptive Modes | Best For |
|---|---|---|---|---|
| FNO | ✓ | ✓ | No | Periodic/wavelike physics |
| AFNO | ✓ | ✓ | Yes | Heterogeneous/image domains |
| CNO | ✗ | ✗ | N/A | Local, non-periodic operators |
| UNO | ✓ | ✗ | Partially | Multi-scale phenomena |
4. Extensions, Expressivity, and Universality
Expressivity Beyond Spectral Bottlenecks:
- FNO and its variants are maximally efficient for translation-invariant (convolutional) or spectrally sparse operators, but can exhibit a "pure-spectral bottleneck" for general position-dependent or nonlinear operators, due to non-decaying Fourier tails and exponential parameter scaling with approximation accuracy (Lee et al., 20 Sep 2025).
- Kolmogorov–Arnold Neural Operators (KANO) alleviate this by using dual bases (spectral + spatial), permitting high-fidelity learning of position-dependent dynamics, polynomial scaling in error, full closed-form symbolic interpretability, and robust OOD generalization. KANO outperforms FNO by several orders of magnitude in empirical benchmarks, and uniquely allows direct extraction of symbolic operator coefficients (Lee et al., 20 Sep 2025).
Hybrid Approaches:
- Augmentation of FNO with local, resolution-invariant layers—e.g., differential and compactly supported integral operators—retains mesh-free super-resolution while substantially reducing smoothing and improving small-scale accuracy. FNOs enhanced with such localized branches achieve 34–72% error reduction on turbulent flow and geophysical PDEs (Liu-Schiaffini et al., 2024).
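The idea of pairing a truncated spectral convolution with a resolution-invariant local branch can be sketched as follows (NumPy; the centered-difference stencil stands in for the differential layers of the cited work, and all weights and names are illustrative):

```python
import numpy as np

def hybrid_layer(v, R, k, dx, w_spec=1.0, w_diff=0.1):
    """Truncated spectral convolution plus a local differential branch.
    The centered difference is scaled by 1/dx, so the same stencil
    represents d/dx at any sampling resolution (periodic grid assumed)."""
    v_hat = np.fft.rfft(v, norm='forward')
    out_hat = np.zeros_like(v_hat)
    out_hat[:k] = R * v_hat[:k]                          # global spectral branch
    spectral = np.fft.irfft(out_hat, n=len(v), norm='forward')
    diff = (np.roll(v, -1) - np.roll(v, 1)) / (2 * dx)   # local branch
    return w_spec * spectral + w_diff * diff

x = np.arange(128) / 128
v = np.sin(2 * np.pi * x)
rng = np.random.default_rng(6)
R = rng.standard_normal(8) + 1j * rng.standard_normal(8)
out = hybrid_layer(v, R, k=8, dx=1 / 128)
```

With `w_spec=0`, the layer reduces to the local branch, which recovers $2\pi\cos(2\pi x)$ for this input up to the $O(\mathrm{d}x^2)$ stencil error.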
Boundary-to-Domain Operator Learning:
- Lifting Product FNOs (LP-FNO) resolve the challenge of accepting boundary-only data and recovering bulk solutions by lifting boundary embeddings via lower-dimensional FNOs, forming their tensor product across the full spatial domain, and achieving robust zero-shot super-resolution, OOD generalization, and resolution independence unattainable by padding or purely domain-based FNOs (Kashi et al., 2024).
Invertible and Bi-Directional FNOs:
- iFNO introduces invertible Fourier coupling blocks in latent space, enabling a single parameter set to handle both forward and inverse problems. Coupling with a $\beta$-VAE improves regularization of the ill-posedness and allows uncertainty quantification; iFNO achieves lower errors and memory use than training separate forward and inverse FNOs (Long et al., 2024).
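The invertibility mechanism can be illustrated with a generic additive coupling block (plain NumPy; iFNO's actual blocks operate on Fourier-space latents, and the sub-network `tanh(x1 @ W)` here is a placeholder): the forward map inverts in closed form no matter what the sub-network computes.

```python
import numpy as np

def coupling_forward(x1, x2, W):
    """Additive coupling: invertible for any sub-network f(x1)."""
    return x1, x2 + np.tanh(x1 @ W)

def coupling_inverse(y1, y2, W):
    return y1, y2 - np.tanh(y1 @ W)

rng = np.random.default_rng(5)
x1, x2 = rng.standard_normal((64, 8)), rng.standard_normal((64, 8))
W = rng.standard_normal((8, 8))
y1, y2 = coupling_forward(x1, x2, W)
z1, z2 = coupling_inverse(y1, y2, W)
print(np.allclose(z1, x1), np.allclose(z2, x2))   # True True
```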
5. Practical Implementation, Training, and Applications
Parameterization and Training:
- FNO and derivatives typically employ 4–6 layers, hidden width 64–128 (channels), and 8–40 Fourier modes per spatial dimension. Training uses MSE or relative $L^2$ loss, the Adam optimizer, and moderate batch sizes (16–128) (Duruisseaux et al., 1 Dec 2025, Xiao et al., 6 Oct 2025, Liu-Schiaffini et al., 2024).
- A standard FNO can be implemented as a stack of spectral convolution blocks, each comprising FFT, learnable truncated spectral multipliers, IFFT, and a pointwise MLP, with MLP-based lifting and projection.
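Such a stack can be sketched end-to-end in a few dozen lines (NumPy, 1-D, untrained; `TinyFNO1d` and all hyperparameters are illustrative choices of this sketch): the same random weights run unchanged at two resolutions.

```python
import numpy as np

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

class TinyFNO1d:
    """Untrained 1-D FNO skeleton: lifting MLP -> spectral blocks -> projection.
    Weights are random; shapes and data flow follow the description above."""
    def __init__(self, c_in=1, c=32, c_out=1, k=16, n_layers=4, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.lift = rng.standard_normal((c_in, c)) / np.sqrt(c_in)
        self.R = [(rng.standard_normal((k, c, c))
                   + 1j * rng.standard_normal((k, c, c))) / c
                  for _ in range(n_layers)]
        self.W = [rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(n_layers)]
        self.proj = rng.standard_normal((c, c_out)) / np.sqrt(c)

    def __call__(self, a):                      # a: (n, c_in)
        v = a @ self.lift
        for R, W in zip(self.R, self.W):
            v_hat = np.fft.rfft(v, axis=0, norm='forward')
            out_hat = np.zeros_like(v_hat)
            out_hat[:self.k] = np.einsum('kij,kj->ki', R, v_hat[:self.k])
            spectral = np.fft.irfft(out_hat, n=v.shape[0], axis=0, norm='forward')
            v = gelu(spectral + v @ W)          # spectral + local branch
        return v @ self.proj

model = TinyFNO1d()
u = model(np.sin(2 * np.pi * np.arange(64) / 64)[:, None])        # n = 64 ...
u_fine = model(np.sin(2 * np.pi * np.arange(256) / 256)[:, None])  # ... and n = 256
```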
Resolution Invariance:
- Due to the spectral parameterization, FNO and most variants generalize across discretizations; LP-FNO explicitly demonstrates zero-shot super-resolution and robust performance even when trained only at coarse resolutions (Kashi et al., 2024). Local-augmented FNOs maintain this property for both differential and local integral layers.
Application Domains:
- FNO and its variants have shown state-of-the-art results in: electromagnetic field and channel modeling for massive MIMO and metasurfaces (Xiao et al., 6 Oct 2025), simulation of quantum dynamical systems (Floquet Hamiltonians, operator dynamics, quantum information spreading) (Qi et al., 8 Sep 2025), S-matrix phase learning with integrated regression–classification (Niarchos et al., 2024), PDE surrogate modeling, and uncertainty-aware inverse problems (Long et al., 2024).
6. Comparative Empirical Performance and Open Challenges
Selected head-to-head benchmarks:
- FNO vs. AFNO, CNO, UNO: FNO outperforms AFNO, CNO, and UNO in strictly periodic, spectral, and global operator regimes with the lowest parameter count; AFNO and UNO close the gap for visual and multiscale tasks or when adaptivity and hierarchical feature extraction are critical (Guibas et al., 2021, Xiao et al., 6 Oct 2025).
- Local-augmented FNOs: Addition of localized integral and differential kernels reduces errors by 34–72% on turbulent and geophysical PDEs compared to vanilla FNO (Liu-Schiaffini et al., 2024).
- FNO vs. KANO: For position-dependent operators or non-translationally invariant PDEs, KANO achieves lower error and symbolic recovery of operator structure, where FNO's error explodes with increasing operator complexity (Lee et al., 20 Sep 2025).
- iFNO: A single bi-directional operator achieves forward and inverse errors that match or halve those of separately trained models, and significantly reduces parameter count versus separate or prior invertible architectures (Long et al., 2024).
Open challenges:
- FNOs are most effective for spectrally sparse, periodic problems and may fail for highly non-stationary, position-dependent, or nonlocal operators. Generalization to complex topologies, boundaries, or unstructured meshes remains active research.
- Integration of adaptive basis selection, efficient hybridization with local methods, and domain-specific inductive biases (e.g., Hilbert analytic transforms in HNO (Pordanesh et al., 6 Aug 2025)) are promising avenues for further expressivity and efficiency.
- Hardware-centric optimizations, quantization, and hybrid operator frameworks for real-time and embedded applications in science and engineering remain underdeveloped (Xiao et al., 6 Oct 2025).
7. Theoretical Perspectives and Universality
Theoretical results demonstrate that nonlocality (even in the form of a single global average, as in ANO) and nonlinearity are sufficient for universal operator approximation on compact sets, and that the rank of the nonlocal kernel determines the minimal architectural ingredients for universality (Lanthaler et al., 2023). Pure spectral FNOs suffice for translation-invariant problems; for dense and rich operator classes, dual-basis (spatial+spectral) or adaptive basis architectures are necessary for tractability (Lee et al., 20 Sep 2025). In practice, a moderate number of Fourier modes (up to roughly $40$ per dimension in 2D, per empirical studies) plus nonlinear channel mixing achieves the best tradeoff between accuracy and capacity for fixed parameter budgets (Duruisseaux et al., 1 Dec 2025, Kim et al., 2022).
References:
- (Duruisseaux et al., 1 Dec 2025) Fourier Neural Operators Explained: A Practical Perspective
- (Xiao et al., 6 Oct 2025) Learning Function-to-Function Mappings: A Fourier Neural Operator for Next-Generation MIMO Systems
- (Liu-Schiaffini et al., 2024) Neural Operators with Localized Integral and Differential Kernels
- (Kashi et al., 2024) Learning the boundary-to-domain mapping using Lifting Product Fourier Neural Operators for partial differential equations
- (Lee et al., 20 Sep 2025) KANO: Kolmogorov-Arnold Neural Operator
- (Guibas et al., 2021) Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers
- (Lanthaler et al., 2023) Nonlocality and Nonlinearity Implies Universality in Operator Learning
- (Kim et al., 2022) Bounding the Rademacher Complexity of Fourier neural operators
- (Long et al., 2024) Invertible Fourier Neural Operators for Tackling Both Forward and Inverse Problems
- (Qi et al., 8 Sep 2025) Fourier Neural Operators for Time-Periodic Quantum Systems: Learning Floquet Hamiltonians, Observable Dynamics, and Operator Growth
- (Niarchos et al., 2024) Learning S-Matrix Phases with Neural Operators
- (Pordanesh et al., 6 Aug 2025) Hilbert Neural Operator: Operator Learning in the Analytic Signal Domain