
Universal Approximation Theorem for Operators

  • Universal Approximation Theorem for Operators defines conditions for approximating any continuous operator between infinite-dimensional spaces using deep neural architectures.
  • The arbitrary-depth variants of the theorem are proved via truncated-input encoding with register and decoder mechanisms, achieving uniform approximation at minimal network width.
  • Depth in these networks generates exponential expressive power, underpinning architectures like DeepONets and Fourier Neural Operators in scientific machine learning.

A Universal Approximation Theorem (UAT) for operators provides rigorous conditions under which neural-network-like architectures can approximate any continuous (possibly nonlinear) operator between infinite-dimensional function spaces, uniformly on compact subsets. Such results generalize the classical UAT for neural networks—originally concerning functions $f : \mathbb{R}^d \to \mathbb{R}$—to the operator-valued context $G : V \to C(Y)$, where $V$ is a compact set of input functions and $Y$ is a compact subset of $\mathbb{R}^n$. The study of universal operator approximation is foundational for operator learning in scientific machine learning and underpins architectures such as DeepONets, Fourier Neural Operators (FNO), and their variants.

1. Formal Operator Neural Network and Problem Setting

Let $X$ be a compact subset of a Banach space, $V \subset C(X)$ a compact set of real-valued continuous functions (“inputs”), and $Y \subset \mathbb{R}^n$ a compact output (“coordinate”) domain. The operator of interest is a mapping $G: V \to C(Y)$, $u \mapsto (y \mapsto G(u)(y))$. The goal is to construct, for any $\epsilon > 0$, a neural architecture $F$ such that

$$\sup_{u \in V,\, y \in Y} \left|G(u)(y) - F(u(x_1), \ldots, u(x_m), y)\right| < \epsilon$$

for suitable sensor points $x_1, \ldots, x_m \in X$. An Operator Neural Network (ONN) is thus a fully-connected feedforward network $F: \mathbb{R}^{m+n} \to \mathbb{R}$, which may be required to have prescribed width and arbitrary depth, or vice versa, depending on the theorem variant (Yu et al., 2021).
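To make the data flow concrete, here is a minimal NumPy sketch of such an ONN; the helper `onn_forward` and all parameter choices are illustrative assumptions, not taken from the paper. The $m$ sensor samples $u(x_j)$ and the $n$ coordinates of $y$ are concatenated into one vector in $\mathbb{R}^{m+n}$ and pushed through a plain feedforward network with scalar output.

```python
import numpy as np

def onn_forward(sensor_values, y, weights, biases, sigma=np.tanh):
    """Evaluate a fully connected ONN F: R^{m+n} -> R on one (u, y) pair.

    sensor_values -- shape (m,): the samples u(x_1), ..., u(x_m)
    y             -- shape (n,): a query coordinate in Y
    weights, biases -- per-layer parameters; the final layer is linear
    """
    z = np.concatenate([np.asarray(sensor_values, dtype=float), np.atleast_1d(y)])
    for W, b in zip(weights[:-1], biases[:-1]):
        z = sigma(W @ z + b)                      # hidden layers
    return (weights[-1] @ z + biases[-1]).item()  # approximates G(u)(y)

# Illustration with random parameters: m = 10 sensors on [0, 1], n = 1.
m, n = 10, 1
xs = np.linspace(0.0, 1.0, m)            # sensor points x_1, ..., x_m
u_vals = np.sin(2 * np.pi * xs)          # samples of one input function u
rng = np.random.default_rng(0)
dims = [m + n, 5, 5, 5, 1]               # narrow (width-5) hidden layers; depth is free
weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
biases = [rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]
print(onn_forward(u_vals, np.array([0.3]), weights, biases))
```

In a trained model the parameters would be fit so that the supremum error above falls below $\epsilon$; here they are random and serve only to fix the shapes and data flow.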

A truncated-input ONN includes an additional input-processing layer that encodes the $m$ real inputs $u(x_j)$ as a single number with a finite decimal representation, a key technical device for the most stringent universal approximation results.

2. Main Universal Approximation Theorems for ONNs

2.1 Non-Polynomial Activations: Width Five

Let $\sigma: \mathbb{R} \to \mathbb{R}$ be continuous, non-polynomial, and $C^1$ at some $\alpha$ with $\sigma'(\alpha) \neq 0$. Then, for every $\epsilon > 0$, there exist $m \in \mathbb{N}$, sampling points $x_1, \ldots, x_m \in X$, and an ONN $F: \mathbb{R}^{m+n} \to \mathbb{R}$ of arbitrary depth and width 5 (with truncated inputs) such that

$$\sup_{u \in V,\, y \in Y} \left|G(u)(y) - F(u(x_1), \ldots, u(x_m), y)\right| < \epsilon.$$

This theorem asserts that the width can be held at 5 while the depth is allowed to be arbitrary, under mild smoothness conditions on $\sigma$ (Yu et al., 2021).

2.2 Non-Affine Polynomial Activations

For non-affine polynomial activation functions $\sigma$ (degree at least 2), the same result holds with width 6. If $\sigma'(\alpha) = 0$ but $\sigma''(\alpha) \neq 0$ at some $\alpha$, width 5 again suffices. Thus, a wide class of smooth activations admits universal operator approximation with strictly bounded width (Yu et al., 2021).

3. Construction Principles and Architectural Mechanisms

The realization of operator UAT in the “arbitrary-depth, bounded-width” setting exploits several key mechanisms:

  • Wide-shallow UAT (Chen & Chen, 1995): Any continuous operator on compacta can be uniformly approximated by a network expressible as a finite sum over terms factorized into sensor-activated (branch) and location-activated (trunk) subnetworks—formally the structure underlying DeepONet (Lu et al., 2019).
  • Truncated-input encoding: Each real input is truncated to $\kappa$ decimal digits. All sensor values are encoded into a single “register” number $r$ via

$$r = \sum_{j=1}^{m} 10^{-j\kappa} \left\lfloor 10^{\kappa} u(x_j) \right\rfloor$$

  • Register-compute and decoder maps: Decoders $\phi_j$ reconstruct the $j$th truncated block from $r$, so each $u(x_j)$ can be approximately extracted at each layer as needed. Implementations rely on $C^1$ activation regularity for “carry-through” identity approximation and multipliers. A worked encode/decode round trip is sketched after this list.
  • Minimal width: A width-5 architecture is organized as (1) a register neuron; (2) two decoder neurons; (3) one neuron for local affine computation; and (4) one “augmenter” neuron that performs the final summation. This minimal configuration achieves universality for all continuous nonlinear operators on compact sets for the prescribed class of activations (Yu et al., 2021).
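The register mechanism can be illustrated directly. The sketch below is an assumption-laden toy: the helper names are hypothetical, exact rational arithmetic via `fractions` is used to sidestep floating-point rounding, and sensor values are assumed for simplicity to lie in $[0, 1)$. It packs truncated sensor samples into one register number and shows how a decoder $\phi_j$ recovers the $j$th block.

```python
from fractions import Fraction
import math

def encode_register(sensor_values, kappa):
    """Pack truncated sensor values u(x_j) in [0, 1) into one register number r."""
    r = Fraction(0)
    for j, u_xj in enumerate(sensor_values, start=1):
        block = math.floor(10**kappa * u_xj)       # the kappa leading decimal digits of u(x_j)
        r += Fraction(block, 10**(j * kappa))      # place block j after the previous blocks
    return r

def decode_block(r, j, kappa):
    """phi_j: recover the truncated value of u(x_j) from the register r."""
    shifted = math.floor(r * 10**(j * kappa))      # shift block j into the integer part
    block = shifted % 10**kappa                    # strip blocks 1, ..., j-1
    return block / 10**kappa                       # truncated u(x_j), error < 10**(-kappa)

# Example: three sensor values, kappa = 4 digits each.
u_vals = [0.123456, 0.987654, 0.5]
r = encode_register(u_vals, kappa=4)
print(float(r))                                    # one register number carrying all sensors
print([decode_block(r, j, 4) for j in (1, 2, 3)])  # [0.1234, 0.9876, 0.5]
```

With exact arithmetic the decoded values agree with the truncated inputs to within $10^{-\kappa}$; in the network itself these encode/decode steps are approximated by narrow layers built from the activation $\sigma$, using the carry-through identity and multiplier constructions described above.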

4. Depth-Separation Results: ReLU Operator Networks

A rigorous depth-separation theorem demonstrates an exponential gap between deep constant-width and shallow wide ReLU ONNs:

  • For any $k \ge 1$, there exists a continuous operator $G_k: V \to C(Y)$ such that:
    1. It can be computed exactly by a ReLU ONN of depth $2k^3 + 8$ and width $O(1)$.
    2. Any ReLU ONN of depth $\le k$ with at most $2^k$ neurons incurs error at least $1/64$ for some input $u \in V$.

This adapts Telgarsky’s sawtooth-function construction to the operator-valued setting, showing that certain operators are not well-approximated by shallow ReLU networks of subexponential width (Yu et al., 2021).
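A minimal NumPy sketch of the underlying scalar phenomenon (Telgarsky's sawtooth, not the full operator-valued construction of the paper): composing a width-2 ReLU "hat" layer $k$ times yields a piecewise-linear function with $2^k$ linear pieces, whereas a single-hidden-layer ReLU network needs on the order of $2^k$ neurons to produce that many pieces.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    """One 'tent' map on [0, 1], written as a width-2 ReLU layer."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def sawtooth(x, k):
    """Compose the hat map k times: a depth-k, width-2 ReLU network on [0, 1]."""
    for _ in range(k):
        x = hat(x)
    return x

xs = np.linspace(0.0, 1.0, 2049)
for k in (1, 3, 6):
    teeth = sawtooth(xs, k)
    # Count linear pieces by counting sign changes of the slope: 2**k pieces.
    slopes = np.sign(np.diff(teeth))
    print(k, int(np.count_nonzero(np.diff(slopes))) + 1)
```

Counting slope sign changes confirms the $2^k$ pieces; the operator-valued theorem embeds this kind of oscillation into $G_k$, so that shallow ReLU ONNs cannot track it without exponentially many neurons.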

5. Architectural and Practical Implications

  • Constant-width (e.g., width 5 or 6) ONNs with deep architectures are universal for operator learning; “more depth, less width” architectures suffice.
  • Depth generates exponential expressive power: families of operators exist that deep, narrow ONNs can compute exactly but shallow, wide ones cannot approximate except at exponential cost in width.
  • In practical operator learning for scientific machine learning, deep, narrow ONNs are preferred for complex nonlinear operator classes.
  • The results provide explicit theoretical backing for DeepONets, FNOs, and related architectures, clarifying their universality and the role of depth-vs-width trade-offs in practice (Yu et al., 2021).

6. Relation to Other Operator UATs and Future Directions

The approach in (Yu et al., 2021) builds on, and extends, the classical operator UAT established by Chen & Chen (1995) and the first DeepONet architectures, which allow for arbitrary width and bounded depth (Lu et al., 2019). By combining truncation-based encoding with register-compute mechanisms and controlled decoder extraction, the width constraint is reduced to its theoretical minimum, and the impact of depth is sharply quantified.
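For contrast with the narrow-deep constructions above, the branch-trunk factorization behind the Chen & Chen theorem and DeepONet can be written in a few lines. The sketch below uses random, untrained parameters and hypothetical helper names, and evaluates the factorized template $\sum_k b_k(u(x_1), \ldots, u(x_m))\, t_k(y)$ approximating $G(u)(y)$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p, width = 10, 1, 8, 32   # sensors, coordinate dim, branch/trunk output dim, hidden width

def mlp(dims):
    """Random parameters for a small tanh MLP with the given layer sizes."""
    return [(rng.standard_normal((dims[i + 1], dims[i])), rng.standard_normal(dims[i + 1]))
            for i in range(len(dims) - 1)]

def forward(params, z):
    for W, b in params[:-1]:
        z = np.tanh(W @ z + b)
    W, b = params[-1]
    return W @ z + b            # final layer is linear

branch = mlp([m, width, p])     # acts on the sensor values u(x_1), ..., u(x_m)
trunk = mlp([n, width, p])      # acts on the query coordinate y

def deeponet(u_vals, y):
    """G(u)(y) ~ sum_k branch_k(u) * trunk_k(y): the wide-shallow factorized form."""
    return float(np.dot(forward(branch, u_vals), forward(trunk, np.atleast_1d(y))))

xs = np.linspace(0.0, 1.0, m)
u_vals = np.sin(2 * np.pi * xs)
print(deeponet(u_vals, 0.3))
```

Training would fit both subnetworks jointly; the Chen & Chen result guarantees that, with enough branch/trunk terms, this factorized form is already universal on compact sets, while the results of (Yu et al., 2021) show what additional structure is needed once the width is pinned to its minimum.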

Contemporary directions include analysis for other neural operator architectures (e.g., Enc-Dec schemes, FNOs, transformer-based operator approximators) and quantitative scaling laws for width, depth, and parameter-efficiency under varying operator regularity and input domain complexity. Open problems include optimal encoding schemes beyond decimal truncation, depth-width trade-offs for other activation classes, and extensions to random-input and measure-theoretic operator settings.


References:

(Yu et al., 2021) Arbitrary-Depth Universal Approximation Theorems for Operator Neural Networks.
(Lu et al., 2019) DeepONet: Learning Nonlinear Operators for Identifying Differential Equations Based on the Universal Approximation Theorem of Operators.
