Neural Operators: Function-to-Function Learning
- Neural operators are architectures that learn mappings between infinite-dimensional function spaces, particularly for approximating PDE solutions.
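- Models such as DeepONet, with its branch and trunk networks, and the Fourier Neural Operator, with its global Fourier-space convolutions, learn operators from sampled function data; FNO-type layers enable resolution-independent learning.
- Recent work establishes universal approximation results, quantitative error and sample-complexity bounds (including fundamental limits), and scalable architectures for multi-operator tasks.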
Neural operators, or operator networks, are architectures that learn mappings between infinite-dimensional function spaces, typically arising as solution operators of partial differential equations (PDEs) and related parametric models. Originating at the intersection of machine learning and scientific computing, neural operators extend standard deep neural networks, which are designed for finite-dimensional inputs and outputs, to learn surrogates for operators. This enables mesh-independent, resolution-agnostic, and efficient approximations of complex scientific processes. The field encompasses foundational models such as Deep Operator Networks (DeepONet), Fourier Neural Operators (FNO), and graph-based neural operators, as well as numerous advanced variants targeting geometric generalization, multi-task learning, and resolution independence.
1. Formal Foundations and the Operator-Learning Problem
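Neural operators are formulated as parametric maps between Banach (or Hilbert) spaces of functions, most commonly
$$\mathcal{G}_\theta : \mathcal{A} \to \mathcal{U}, \qquad \theta \in \Theta,$$
where $\mathcal{A} = \mathcal{A}(D;\mathbb{R}^{d_a})$ and $\mathcal{U} = \mathcal{U}(D';\mathbb{R}^{d_u})$ are spaces of functions on (possibly distinct) domains $D \subset \mathbb{R}^{d}$ and $D' \subset \mathbb{R}^{d'}$, such as $L^2(D)$ and $H^1(D)$, or $C(D)$ and $C(D')$. The classical motivating example is a PDE solution operator, $\mathcal{G}^\dagger : a \mapsto u$, with $a$ a coefficient field or boundary condition and $u$ the PDE solution (Kovachki et al., 2024, Kovachki et al., 2021).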
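Given a dataset of function pairs $\{(a_j, u_j)\}_{j=1}^{N}$ with $u_j = \mathcal{G}^\dagger(a_j)$, the goal is to train a neural approximation $\mathcal{G}_\theta$ such that $\mathcal{G}_\theta(a)$ accurately predicts $\mathcal{G}^\dagger(a)$ for a wide range of inputs $a$, typically drawn from a probability measure $\mu$ on $\mathcal{A}$. Crucially, the parameters $\theta$ must be independent of the discretization, and the learned map must generalize to any representation or mesh of the underlying function spaces (Berner et al., 12 Jun 2025).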
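To make the setup concrete, the following sketch builds a toy dataset of function pairs for the antiderivative operator $\mathcal{G}^\dagger(a)(x) = \int_0^x a(s)\,ds$, a standard toy example. It is a minimal illustration under stated assumptions. The random Fourier-series prior, the trapezoid quadrature, and the helper names (`sample_input_function`, `antiderivative`, `make_dataset`) are all hypothetical choices, not taken from the cited works. Because each input is drawn as a function rather than as a vector, the same seed yields the same underlying pairs at any grid resolution.

```python
import numpy as np

def sample_input_function(rng, n_terms=5):
    """Draw a random smooth periodic function a(x) as a short Fourier series.

    Returns a callable, so the same function can be sampled on any grid.
    """
    k = np.arange(1, n_terms + 1)
    c = rng.standard_normal(n_terms) / k**2  # decaying coefficients keep draws smooth
    s = rng.standard_normal(n_terms) / k**2

    def a(x):
        phase = 2 * np.pi * np.outer(x, k)  # shape (len(x), n_terms)
        return (c * np.cos(phase) + s * np.sin(phase)).sum(axis=1)

    return a

def antiderivative(a_vals, x):
    """Target operator G(a)(x) = int_0^x a(s) ds, via the cumulative trapezoid rule."""
    increments = 0.5 * (a_vals[1:] + a_vals[:-1]) * np.diff(x)
    return np.concatenate([[0.0], np.cumsum(increments)])

def make_dataset(n_pairs, n_points, seed=0):
    """Sample N = n_pairs function pairs (a_j, u_j) on a uniform grid of [0, 1]."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_points)
    funcs = [sample_input_function(rng) for _ in range(n_pairs)]
    A = np.stack([a(x) for a in funcs])  # shape (n_pairs, n_points)
    U = np.stack([antiderivative(a_vals, x) for a_vals in A])
    return x, A, U

# The same seed gives the same underlying functions at two resolutions.
x_c, A_c, U_c = make_dataset(8, 65)
x_f, A_f, U_f = make_dataset(8, 129)
```

Every second point of the 129-point grid coincides with the 65-point grid. At those points the inputs agree exactly. The targets agree up to quadrature error. A discretization-independent model should treat both versions as the same training data.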
2. Architectures: DeepONet, Fourier Neural Operator, and Beyond
The two canonical architectures are Deep Operator Networks (DeepONet) and Fourier Neural Operators (FNO):
- DeepONet: Introduced by Lu, Jin, and Karniadakis, DeepONet splits operator approximation into a branch network, which encodes the input function as a finite set of sensor readings, and a trunk network, which encodes the query location:
$$\mathcal{G}_\theta(a)(y) = \sum_{k=1}^{p} b_k\big(a(x_1), \dots, a(x_m)\big)\, t_k(y),$$
where $(a(x_1), \dots, a(x_m))$ is the input sampled at $m$ sensor points, and $b_k$ and $t_k$ are the $k$-th outputs of the branch and trunk nets, respectively (Goswami et al., 2022, Kovachki et al., 2024, Jha, 7 Mar 2025).
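- Fourier Neural Operator: FNO parameterizes layers via global convolutions in Fourier space. Each layer updates a feature field $v_t : D \to \mathbb{R}^{d_v}$ as
$$v_{t+1}(x) = \sigma\Big(W v_t(x) + \mathcal{F}^{-1}\big(R_\phi \cdot (\mathcal{F} v_t)\big)(x)\Big),$$
where $W$ is a pointwise linear map, $\sigma$ is a pointwise nonlinearity, $R_\phi$ are learned spectral multipliers acting on a truncated set of low-frequency modes, and $\mathcal{F}$ denotes the (discrete) Fourier transform. FNO supports mesh-independent transfer and super-resolution (Kovachki et al., 2021, Berner et al., 12 Jun 2025).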
- Graph, Wavelet, and Other Variants: Graph Neural Operators (GNO) extend kernel integration to non-Euclidean and unstructured domains using message-passing on graphs. Wavelet Neural Operators employ localized basis functions suitable for multi-resolution learning (Kovachki et al., 2024, Li et al., 2020).
- Multi-input and Multi-task Operators: Extensions such as MIONet generalize to operators on product spaces, admitting multiple functional arguments with tensor-product branch networks (Jin et al., 2022), and MODNO enables multi-operator learning by distributing input encoding and output bases to achieve efficiency gains (Zhang, 2024).
- Resolution-Independent Architectures: Recent work overcomes the constraint of fixed input sensor locations in DeepONet by introducing offline dictionary learning of continuous basis functions, resulting in architectures such as RI-DeepONet and RINO that operate on arbitrary, potentially irregular, point clouds (Bahmani et al., 2024).
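The unstacked DeepONet expansion $\mathcal{G}_\theta(a)(y) = \sum_k b_k(a(x_1), \dots, a(x_m))\, t_k(y)$ can be sketched as a NumPy forward pass. The sketch has untrained random weights and no training loop. The tanh activations, layer widths, initialization, and helper names (`init_mlp`, `mlp`, `deeponet`) are illustrative assumptions. The sketch shows the structural point: the branch net sees the input only through a fixed sensor vector, while the trunk net can be queried at arbitrary locations $y$.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random (W, b) pairs for layer widths `sizes`, with Glorot-style scaling."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / (m + n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, z):
    """tanh MLP; the final layer is linear."""
    for W, b in params[:-1]:
        z = np.tanh(z @ W + b)
    W, b = params[-1]
    return z @ W + b

def deeponet(branch, trunk, a_sensors, y):
    """Unstacked DeepONet: G(a)(y) = sum_k b_k(a(x_1), ..., a(x_m)) * t_k(y).

    a_sensors: (batch, m) input functions sampled at the same m fixed sensors.
    y:         (n_query, d) query locations; any set of points may be used.
    Returns:   (batch, n_query) predicted output values.
    """
    coeffs = mlp(branch, a_sensors)  # (batch, p) coefficients from the branch net
    basis = mlp(trunk, y)            # (n_query, p) basis functions from the trunk net
    return coeffs @ basis.T

# Usage with untrained random weights: m = 50 sensors, p = 32 basis functions.
rng = np.random.default_rng(0)
m, p = 50, 32
branch = init_mlp(rng, [m, 64, p])
trunk = init_mlp(rng, [1, 64, p])
a_sensors = rng.standard_normal((4, m))
y = np.linspace(0.0, 1.0, 100)[:, None]
out = deeponet(branch, trunk, a_sensors, y)
```

Querying a subset of the points in `y` returns the matching columns of `out`, because each query location passes through the trunk net independently. Changing the number or placement of sensors, by contrast, would change the branch input size, which is the limitation that resolution-independent variants address.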
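A single Fourier layer $v_{t+1} = \sigma(W v_t + \mathcal{F}^{-1}(R_\phi \cdot \mathcal{F} v_t))$ can be sketched in NumPy for a 1D periodic domain. This is one untrained layer without the lifting and projection maps of a full FNO. The GELU nonlinearity, channel count, mode cutoff `k_max`, and function names are illustrative assumptions. Because the spectral weights `R` act on Fourier modes rather than on grid points, the same parameters can be applied at any resolution.

```python
import numpy as np

def spectral_conv_1d(v, R):
    """Global convolution F^{-1}(R . F v) on a uniform periodic 1D grid.

    v: (n, c_in) real feature field; R: (k_max, c_in, c_out) complex multipliers
    applied to the k_max lowest Fourier modes (requires k_max < n // 2).
    """
    n, k_max = v.shape[0], R.shape[0]
    # norm="forward" scales by 1/n, so the coefficients of a band-limited field
    # do not depend on the grid size n.
    v_hat = np.fft.rfft(v, axis=0, norm="forward")               # (n//2 + 1, c_in)
    out_hat = np.zeros((n // 2 + 1, R.shape[2]), dtype=complex)
    out_hat[:k_max] = np.einsum("ki,kio->ko", v_hat[:k_max], R)  # truncate, mix channels
    return np.fft.irfft(out_hat, n=n, axis=0, norm="forward")     # (n, c_out)

def gelu(z):
    """tanh approximation of the GELU nonlinearity."""
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def fourier_layer(v, R, W):
    """v_{t+1} = sigma(W v_t + F^{-1}(R . F v_t)), with W applied pointwise."""
    return gelu(v @ W + spectral_conv_1d(v, R))

# The same (untrained, random) parameters applied at two resolutions.
rng = np.random.default_rng(0)
c, k_max = 2, 8
R = (rng.standard_normal((k_max, c, c)) + 1j * rng.standard_normal((k_max, c, c))) / c
W = rng.standard_normal((c, c)) / c

def features(n):
    x = np.arange(n) / n  # uniform periodic grid on [0, 1)
    return np.stack([np.sin(2 * np.pi * x), np.cos(4 * np.pi * x)], axis=1)

coarse = fourier_layer(features(64), R, W)
fine = fourier_layer(features(128), R, W)
# For this band-limited input, fine[::2] matches coarse up to round-off.
```

For this band-limited input, the 64-point and 128-point outputs coincide at shared grid points. This is the property behind mesh-independent transfer. If an input has energy above the coarse grid's Nyquist frequency, aliasing makes the agreement only approximate.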
3. Approximation Theory, Universality, and Complexity
Neural operators possess a rich theory of universal approximation, error rates, and complexity bounds:
- Universal Approximation: DeepONet, FNO, and related architectures are universal in the sense that, given sufficient network width and depth and an appropriate nonlinearity, they can approximate any continuous operator between separable Banach spaces uniformly on compact sets (Kovachki et al., 2024, Kovachki et al., 2021, Chen et al., 21 May 2026). In the FNO case, universality extends to uniform approximation over Sobolev spaces on periodic domains when the number of Fourier modes and the depth are sufficiently large (Theorem 3.3 in (Kovachki et al., 2024)).
- Curse of Parametric Complexity: For generic operators characterized only by $C^r$ or Lipschitz regularity, sample and parameter complexity can grow exponentially in the required accuracy, an "infinite-dimensional analogue of the curse of dimensionality." Polynomial approximation rates exist only for special structures (e.g., holomorphic solution maps) (Lanthaler et al., 2023, Kovachki et al., 2024).
- Sampling Complexity and the Theory-to-Practice Gap: Even when their parametric approximation rates are super-polynomial, neural operators cannot learn mappings from infinite-dimensional input data (e.g., functions) faster than Monte Carlo rates in the number of samples $N$. In particular, the best-possible sample convergence rate in a Bochner $L^p$-norm is bounded by $N^{-1/p}$, and convergence in the uniform ($L^\infty$) norm is sub-algebraic or impossible. This quantifies a fundamental theory-to-practice gap in deep learning for operator approximation (Theorem 3.2 and Theorem 6.4 in (Grohs et al., 23 Mar 2025)).
- Technical Advances: Adaptations such as the localized integral and differential kernels of (Liu-Schiaffini et al., 2024) combine global Fourier modeling with local stencils to better capture sharp features, reducing generalization error by 34–87% on benchmark PDEs.
| Theoretical Bound | Regime | Reference |
|---|---|---|
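| Best-possible sample rate $N^{-1/p}$ (Monte Carlo) | Bochner $L^p$-norm, infinite-dimensional input space | (Grohs et al., 23 Mar 2025) |
| Sub-algebraic (no algebraic rate $N^{-\alpha}$, $\alpha > 0$) | Uniform norm, $p = \infty$ | (Grohs et al., 23 Mar 2025) |
| Exponential in the required accuracy $\epsilon$ | Generic $C^r$ or Lipschitz operators | (Lanthaler et al., 2023) |
| Polynomial (best-$n$-term, holomorphic) | Regular parametric PDEs | (Kovachki et al., 2024) |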
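A worked equation gives a feel for the sampling bound of Grohs et al. That bound says the error in the Bochner $L^p$-norm cannot decay faster than $N^{-1/p}$ in the number of training pairs $N$. Assume, for illustration only, that the achievable error $\epsilon$ behaves like this bound with an unspecified constant $C$. Inverting the bound then gives a minimum sample budget:

```latex
\epsilon(N) \gtrsim C\,N^{-1/p}
\;\Longrightarrow\;
N \gtrsim N_{\min}(\epsilon) := \left(\frac{C}{\epsilon}\right)^{p},
\qquad
\frac{N_{\min}(\epsilon/10)}{N_{\min}(\epsilon)} = 10^{p}.
```

For $p = 2$, a tenfold reduction in error requires at least a hundredfold increase in training data. As $p \to \infty$, the required growth becomes unbounded, consistent with the absence of algebraic rates in the uniform norm.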
4. Algorithmic and Training Strategies
Practical implementation of neural operators incorporates several algorithmic considerations:
- Loss Functions: Standard training minimizes the empirical risk in the $L^2$ norm or a relative $L^2$ norm. Physics-informed variants impose PDE residual constraints as soft penalties, reducing the need for labeled data and improving out-of-distribution (OOD) generalization (Goswami et al., 2022, Zhang et al., 6 Nov 2025, Jha, 7 Mar 2025).
- Mesh-independence: Integral-kernel and Fourier/spectral layers are defined at the level of functions and converge under mesh refinement, so the same parameters can be used for training and inference at different discretizations or sensor sets (Kovachki et al., 2021, Berner et al., 12 Jun 2025).
- Optimization: Mini-batch training with Adam or AdamW and a learning-rate schedule is common. For time-dependent or recurrent operators, autoregressive training is standard (Bahmani et al., 2024, Jha, 7 Mar 2025).
- Multi-operator and Multi-task Training: Distributed setups (MODNO) share encoder branches across operators, training model components on full and local data to enable scalable multi-operator learning at no extra cost per operator. Data efficiency is improved for underrepresented operators due to shared representations (Zhang, 2024).
- Adaptation and Meta-learning: Permutation-invariant architectures, such as SetONet, enable rapid task adaptation for multi-task control settings, supporting a continuum from last-layer fine-tuning to full meta-training using MAML-type procedures (SeWell et al., 3 Apr 2026).
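The data loss and a physics-informed penalty can be sketched for a batch of fields on a uniform 1D grid. The relative $L^2$ misfit follows the loss described above. The physics term is illustrated with the 1D Poisson equation $-u'' = f$ and a finite-difference residual. Physics-informed operator networks often compute residuals by automatic differentiation instead. The PDE, the weight `lam`, and the function names are assumptions made for illustration.

```python
import numpy as np

def relative_l2_loss(pred, target):
    """Batch mean of ||pred - target|| / ||target|| over grid values.

    On a uniform grid the quadrature weight cancels in the ratio, so the loss
    approximates the continuous relative L2 error at any resolution.
    """
    num = np.linalg.norm(pred - target, axis=1)
    den = np.linalg.norm(target, axis=1)
    return float(np.mean(num / den))

def poisson_residual_penalty(u, f, h):
    """Mean squared residual of -u'' = f at interior grid points (central differences)."""
    u_xx = (u[:, 2:] - 2.0 * u[:, 1:-1] + u[:, :-2]) / h**2
    return float(np.mean((-u_xx - f[:, 1:-1]) ** 2))

def physics_informed_loss(pred, target, f, h, lam=0.1):
    """Data misfit plus a soft PDE-residual penalty; lam is a hypothetical weight."""
    return relative_l2_loss(pred, target) + lam * poisson_residual_penalty(pred, f, h)

# Check on an exact solution of -u'' = f with f = pi^2 sin(pi x).
x = np.linspace(0.0, 1.0, 201)
h = x[1] - x[0]
u = np.sin(np.pi * x)[None, :]
f = (np.pi**2 * np.sin(np.pi * x))[None, :]
```

For the exact solution, the residual penalty is at the level of the finite-difference truncation error. A prediction scaled by 1.1 gives a relative $L^2$ loss of 0.1. In unlabeled regions, the penalty term alone can still drive training, which is how physics-informed variants reduce the need for labeled data.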
5. Applications and Empirical Performance
Neural operators provide mesh-independent surrogates for complex physical models across disciplines:
- Parametric PDE Solving: Neural operators rapidly map varying coefficients, boundary conditions, or geometric parameters to PDE solutions in fluid dynamics (Navier–Stokes, Burgers), subsurface flow (Darcy), elasticity, and other settings. Neural operator surrogates routinely reduce solution time by orders of magnitude relative to conventional solvers on the finest grid (Zhang et al., 6 Nov 2025, Kovachki et al., 2024, Kovachki et al., 2021, Jha, 7 Mar 2025).
- Uncertainty Quantification (UQ) and Bayesian Inference: Neural operators accelerate UQ via fast surrogate evaluations in MCMC, with sub-1% posterior bias compared to full solvers (Jha, 7 Mar 2025).
- Operator Inversion and Multiphysics Composition: Neural operators integrate into inverse design, optimal control, and design optimization, supporting multi-query and multi-physics scenarios through compositional and transfer learning (Goswami et al., 2022, SeWell et al., 3 Apr 2026).
- Geometric Generalization: Multiscale kernel approaches generalize to variable and nonparametric domains by learning geometry-dependent integral kernels and handling point clouds, normals, and domain-specific features, attaining robust accuracy across unseen or topologically different geometries (Han et al., 2 Feb 2026).
- Function Interpolation: Tensorized FNOs (TFNOs) and similar architectures, when recast with a base space, can act as highly parameter-efficient interpolators for high-dimensional structured regression tasks, matching or outperforming MLPs and specialized function-network architectures (Niarchos et al., 8 May 2026).
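Neural operators help uncertainty quantification by replacing expensive solver calls inside a sampler with cheap surrogate evaluations. The sketch runs random-walk Metropolis against a Gaussian posterior. The linear map `surrogate` is a hypothetical placeholder for a trained neural operator, chosen so that the exact posterior mean is available for checking. The sampler, prior, noise level, and step size are illustrative assumptions, not the setup of the cited work.

```python
import numpy as np

def random_walk_metropolis(log_post, theta0, n_steps, step, seed=0):
    """Random-walk Metropolis sampler; each step costs one forward-map evaluation."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_post(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept with probability min(1, ratio)
            theta, lp = proposal, lp_prop
        chain[i] = theta
    return chain

def make_log_posterior(forward, y_obs, noise_std, prior_std):
    """Gaussian likelihood around forward(theta) plus an isotropic Gaussian prior."""
    def log_post(theta):
        r = forward(theta) - y_obs
        return -0.5 * (r @ r) / noise_std**2 - 0.5 * (theta @ theta) / prior_std**2
    return log_post

# Hypothetical stand-in for a trained neural operator surrogate: a cheap linear map.
x = np.linspace(0.0, 1.0, 20)
A = np.stack([np.sin(np.pi * x), np.sin(2 * np.pi * x)], axis=1)  # (20, 2)

def surrogate(theta):
    return A @ theta

theta_true = np.array([1.0, -0.5])
noise_std, prior_std = 0.1, 1.0
y_obs = A @ theta_true + noise_std * np.random.default_rng(1).standard_normal(20)

log_post = make_log_posterior(surrogate, y_obs, noise_std, prior_std)
chain = random_walk_metropolis(log_post, np.zeros(2), n_steps=20000, step=0.05)
posterior_mean = chain[2000:].mean(axis=0)  # discard burn-in
```

Each of the 20,000 steps calls the forward map once. That is why per-evaluation cost dominates: the same chain with a full PDE solver in the loop would be far more expensive. This toy check does not reproduce the bias figures reported in the cited work.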
6. Challenges, Limitations, and Research Directions
The field recognizes fundamental and practical limitations:
- Curse of Complexity: For generic, non-structured operator families, neural operator parameter and sample complexity remains intrinsically exponential in required accuracy. Addressing this curse requires exploiting problem-specific structures (holomorphy, low effective dimension, characteristic factorization) or developing tailored architectures (e.g. HJ-Net for Hamilton–Jacobi equations) (Lanthaler et al., 2023).
- Sampling Limits: No operator-learning algorithm based on pointwise samples can converge faster than the Monte Carlo rate $N^{-1/p}$ in the number of samples $N$, measured in the Bochner $L^p$-norm, when the input data are infinite-dimensional. In the uniform norm, no algebraic convergence is achievable for infinite-dimensional input spaces (Grohs et al., 23 Mar 2025).
- Generalization and OOD Risk: Quantifying and guaranteeing generalization, especially for out-of-distribution geometry, parameters, or initial conditions, is an ongoing theoretical and empirical challenge (Kovachki et al., 2024, Zhang et al., 6 Nov 2025).
- Adaptation and Scalability: Efficient multi-operator and multi-task adaptation, as well as scaling to extremely high-dimensional function spaces and time-dependent multiphysics, remain active research areas. Approaches combining meta-learning with permutation-invariant operator architectures are promising (Zhang, 2024, SeWell et al., 3 Apr 2026).
- Theory–Practice Translation: Despite provable universality, a gap persists between the approximation rates that networks of a given size can achieve in principle and the convergence rates attainable from finitely many samples; closing this gap in theory and practice is an active focus (Grohs et al., 23 Mar 2025).
- Numerical Stability and Physics Consistency: Incorporating physics-informed constraints, enforcing conservation, and adapting network design to honor domain symmetries or invariances is crucial for robust scientific deployment (Goswami et al., 2022, Jha, 7 Mar 2025).
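One simple way to enforce a conservation law as a hard constraint, rather than a soft penalty, is to project the network output onto the set of fields with a prescribed integral. The sketch assumes a 1D problem on a uniform grid whose total mass, a discrete integral, must match a known value such as the mass of the initial condition. The function name, the rectangle-rule quadrature, and the random stand-in for network output are illustrative assumptions, not a method taken from the cited works.

```python
import numpy as np

def enforce_conservation(u_pred, target_mass, h):
    """Shift each predicted field so that its discrete integral equals target_mass.

    u_pred:      (batch, n) predictions on a uniform grid with spacing h.
    target_mass: (batch,) required values of sum(u) * h (rectangle rule).
    The uniform shift is the smallest correction in the Euclidean norm, i.e. an
    orthogonal projection onto the affine constraint set.
    """
    n = u_pred.shape[1]
    current = u_pred.sum(axis=1) * h
    correction = (target_mass - current) / (n * h)
    return u_pred + correction[:, None]

# Example: fields whose discrete mass must match that of the initial condition.
rng = np.random.default_rng(0)
h = 1.0 / 50
u0 = rng.standard_normal((3, 50))
mass = u0.sum(axis=1) * h
u_pred = u0 + 0.3 * rng.standard_normal((3, 50))  # stand-in for raw network output
u_cons = enforce_conservation(u_pred, mass, h)
```

Used as a final layer, the projection makes every prediction satisfy the constraint exactly, and applying it twice changes nothing. A uniform shift may be inappropriate for quantities that must remain positive; those need a different correction, such as multiplicative rescaling.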
7. Software Ecosystem and Practical Tools
Efficient deployment, benchmarking, and model development are supported by robust open-source frameworks:
- NeuralOperator Library: Provides out-of-the-box implementations for FNO, GNO, TFNO, SFNO, DeepONet, and their combinatorial variants, supporting true resolution-agnostic learning with guaranteed discretization convergence. The framework incorporates modular layers, dataset support, and meta-algorithm integration, facilitating both applied work and research (Kossaifi et al., 2024).
- Mesh-Independence and API Design: APIs are designed to accept inputs and produce outputs on arbitrary discretizations, with internal abstractions enforcing mesh-independence throughout training and inference pipelines. Benchmarking, hyperparameter optimization, and physics-informed penalties are all supported within standard workflows.
- Performance: Benchmarks consistently show neural operators achieving sub-1% relative $L^2$ error on Darcy and Navier–Stokes problems, at inference speeds orders of magnitude faster than traditional solvers, with stable accuracy across unseen resolutions and geometries.
Neural operators represent a mature and theoretically well-founded paradigm for learning function-to-function maps, unifying mathematical universality, functional approximation, and scalable deep learning. Ongoing research continues to advance the state of the art in complexity management, geometric and resolution generalization, multi-operator scalability, and robust physics-informed learning (Kovachki et al., 2024, Grohs et al., 23 Mar 2025, Han et al., 2 Feb 2026, Zhang, 2024, Lanthaler et al., 2023, Kossaifi et al., 2024).
References