Convolutional Neural Operator Overview
- Convolutional Neural Operators are neural architectures that use convolutional layers to approximate mappings between infinite-dimensional spaces while ensuring resolution invariance.
- They integrate techniques such as anti-aliasing filters, U-Net designs, and rigorous theoretical guarantees to achieve high accuracy and data efficiency in surrogate PDE tasks.
- CNOs enable efficient gradient-based inversion and transfer learning, significantly reducing computational time compared to traditional full-waveform inversion methods.
A Convolutional Neural Operator (CNO) is a class of neural operator architectures designed to learn mappings between infinite-dimensional function spaces using convolutional neural network (CNN) structures, with rigorous guarantees of resolution invariance and alias-free approximation. CNOs have proved effective for surrogate modeling of partial differential equation (PDE) solution operators, offering practical advantages in accuracy, efficiency, and data efficiency compared to global spectral or attention-based neural operators. Recent advances have further enhanced the flexibility of CNOs for transfer learning, inversion, and multiscale modeling.
1. Mathematical Definition and Theoretical Foundations
Formally, a CNO seeks to learn a mapping between function spaces—most commonly, from PDE coefficients or initial conditions to solutions—where input and output spaces are typically chosen as Sobolev spaces on a domain (Raonić et al., 2023). In the generic neural operator framework, the L-layer CNO architecture is defined by iteratively lifting the input to feature space, applying convolutional integral operators interleaved with pointwise nonlinearities, and finally projecting to the target function space:
G_θ(u) = (Q ∘ σ(W_L + K_L) ∘ ⋯ ∘ σ(W_1 + K_1) ∘ P)(u),

where P and Q are local, channel-wise linear maps ("lifting" and "projection"), each W_l is a local linear operator, and each K_l is a learnable integral kernel (often implemented as a spatial convolution). The nonlinearity σ is typically ReLU or GELU. CNOs crucially enforce translation equivariance (shift invariance) and employ anti-aliasing filters so that their continuous and discrete versions commute, a property termed continuous-discrete equivalence (CDE) (Raonić et al., 2023, Fan et al., 19 Dec 2025).
Rigorous approximation bounds demonstrate that suitably parameterized CNOs are universal approximators of operators between function spaces, achieving arbitrary accuracy for band-limited functions and maintaining control over spatial and parametric error as a function of regularity and network width/depth (Franco et al., 2022, Raonić et al., 2023). Convolutional layers' translation-equivariance and structured locality are shown to be theoretically equivalent, under certain constructions, to truncated local Fourier series interpolation.
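As a concrete illustration of the layer structure above, here is a minimal 1D NumPy sketch of a single neural-operator layer, v ↦ σ(Wv + K∗v + b). The dense per-channel-pair kernel, periodic (circular) convolution, and tanh nonlinearity are simplifying assumptions for this sketch, not the exact parameterization of any cited paper; note that the layer is exactly shift-equivariant, the property the anti-aliasing machinery preserves after discretization.

```python
import numpy as np

def cno_layer(v, kernel, W, b, activation=np.tanh):
    """One neural-operator layer: v -> sigma(W v + K * v + b).

    v      : (channels, n) function samples on a periodic 1D grid
    kernel : (channels, channels, n) convolution kernel per channel pair
    W      : (channels, channels) pointwise (local) linear map
    b      : (channels,) bias
    """
    # Local part: the same linear map applied independently at every grid point.
    local = W @ v
    # Nonlocal part: circular convolution via the FFT convolution theorem,
    # summing over input channels for each output channel.
    conv = np.real(np.fft.ifft(
        np.einsum('oin,in->on', np.fft.fft(kernel, axis=-1), np.fft.fft(v, axis=-1))
    ))
    return activation(local + conv + b[:, None])

rng = np.random.default_rng(0)
n, c = 64, 3
v = rng.standard_normal((c, n))
k = rng.standard_normal((c, c, n)) / n
W = rng.standard_normal((c, c))
b = np.zeros(c)
out = cno_layer(v, k, W, b)  # same grid in, same grid out
```

Because both the pointwise map and the circular convolution commute with grid shifts, shifting the input shifts the output identically, which is the translation equivariance claimed above.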
2. Architectural Design and Implementation
The practical instantiation of a CNO is typically a deep U-Net or encoder-decoder with residual connections and skip links, constructed as follows (Fan et al., 19 Dec 2025, Ma et al., 24 Sep 2025):
- Input lifting (P): A pointwise (1×1) convolution expands the input to a high-dimensional feature space.
- Encoder: Sequential blocks of 3×3 convolutions (with batch normalization and ReLU/GELU) and spatial downsampling via anti-aliased pooling. The encoder can use deep backbones such as ResNet-101 (Ma et al., 24 Sep 2025).
- Bottleneck: Central layers operate at the coarsest resolution, typically two 3×3 convolutions with a high channel count.
- Decoder: Symmetric upsampling path (e.g., bilinear upsampling followed by 3×3 convolutions) with skip connections from matching encoder stages (U-Net structure).
- Projection (Q): A pointwise (1×1) convolution projects the final features to the output function value at each spatial point.
To preserve band-limits and prevent aliasing, all downsampling operations employ explicit low-pass filtering, and upsampling relies on mathematically principled sinc-based filters or bilinear interpolation. Residual links and multi-scale skip connections are frequently used to facilitate learning of both global structure and high-frequency content. Channel and spatial attention modules may be embedded within skip connections to enhance feature selection, especially in complex inversion tasks (Ma et al., 24 Sep 2025).
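The anti-aliased downsampling underlying CDE can be sketched with an ideal FFT low-pass filter; this is a simplified stand-in for the sinc-based filters of actual CNO implementations, and assumes a periodic 1D grid:

```python
import numpy as np

def lowpass(v, cutoff):
    """Zero out all Fourier modes with |frequency| > cutoff (ideal low-pass)."""
    V = np.fft.fft(v)
    freqs = np.fft.fftfreq(v.size, d=1.0 / v.size)  # integer frequencies
    V[np.abs(freqs) > cutoff] = 0.0
    return np.real(np.fft.ifft(V))

def antialiased_downsample(v, factor):
    """Filter to below the coarse grid's Nyquist frequency, then subsample."""
    new_n = v.size // factor
    return lowpass(v, cutoff=new_n // 2 - 1)[::factor]

n = 128
x = np.linspace(0.0, 1.0, n, endpoint=False)
# Mode 3 survives downsampling by 4; mode 20 exceeds the coarse grid's
# Nyquist limit (16) and would alias into mode 12 under naive subsampling.
v = np.sin(2 * np.pi * 3 * x) + 0.5 * np.cos(2 * np.pi * 20 * x)
coarse = antialiased_downsample(v, factor=4)  # 32 samples, mode 20 removed
```

Naive strided subsampling of `v` would fold the high mode onto a spurious low one; filtering first leaves exactly the representable band, which is what makes the continuous and discrete operator versions agree.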
A more advanced variant, the Dilated Convolutional Neural Operator (DCNO), extends CNOs by interleaving Fourier layers (to capture low-frequency global structure) with dilated convolutional layers (to model local high-frequency details), achieving state-of-the-art Pareto optimality in cost-accuracy for multiscale PDEs (Xu et al., 2024).
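A dilated convolution, the local building block DCNO interleaves with Fourier layers, simply spaces its kernel taps a fixed number of grid points apart. A minimal circular 1D version:

```python
import numpy as np

def dilated_conv1d(v, kernel, dilation):
    """Circular 1D convolution whose taps sit `dilation` grid points apart,
    enlarging the receptive field without adding parameters."""
    out = np.zeros_like(v, dtype=float)
    for j, w in enumerate(kernel):
        out += w * np.roll(v, -j * dilation)
    return out

v = np.array([1.0, 2.0, 3.0, 4.0])
y = dilated_conv1d(v, kernel=np.array([1.0, 1.0]), dilation=2)
# Stacking layers with dilations 1, 2, 4, ... grows the receptive field
# exponentially with depth, which helps capture high-frequency local detail.
```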
3. Training Protocols and Data Regimes
CNOs are trained as end-to-end surrogates for operator learning tasks:
- Data generation: Training data are usually synthetic, with known solutions for forward PDE tasks, e.g., velocity model to seismic image (Ma et al., 24 Sep 2025), or initial condition to final state for time-evolution problems.
- Loss function: Standard objectives include mean squared error (MSE) or relative L¹/L² error computed on the function outputs. For inversion setups, a data fidelity term plus regularization such as total variation is employed (Ma et al., 24 Sep 2025).
- Optimization: Adam (or AdamW) with learning rate scheduling. Early stopping based on validation set error is common. Batch sizes are determined by available GPU resources.
- Resolution invariance: Due to structure-preserving design, a CNO trained at one grid resolution can be applied at another without retraining (Raonić et al., 2023).
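The loss components named above can be sketched as follows; the exact norms, reductions, and weightings used in the cited papers may differ:

```python
import numpy as np

def relative_l2(pred, target):
    """Mean relative L2 error over a batch of sampled output functions."""
    num = np.linalg.norm(pred - target, axis=-1)
    den = np.linalg.norm(target, axis=-1)
    return float(np.mean(num / den))

def total_variation(m):
    """Anisotropic discrete total variation of a 2D model, used as a
    regularizer in inversion-style objectives."""
    return float(np.abs(np.diff(m, axis=0)).sum()
                 + np.abs(np.diff(m, axis=1)).sum())
```

The relative error is scale-free, which is why it is the standard headline metric in operator-learning benchmarks; TV penalizes oscillation while permitting sharp interfaces, which suits blocky velocity models.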
For transfer learning between PDE regimes, CNOs have been adapted with three strategies: (1) fine-tuning only decoder weights, (2) low-rank adaptation (LoRA) to convolutional kernels, and (3) neuron linear transformation (NLT), where convolutional kernels are rescaled and shifted channel-wise. NLT achieves the highest surrogate accuracy and data efficiency for few-shot scenario adaptation, with empirical sub-1% relative error even when only 16 target-domain samples are available (Fan et al., 19 Dec 2025).
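One plausible reading of the NLT strategy (the exact parameterization in Fan et al. may differ) is a per-output-channel affine transform of the frozen kernels, so only two scalars per output channel are trained on the target domain:

```python
import numpy as np

def nlt_adapt(kernels, scale, shift):
    """Channel-wise affine adaptation of frozen convolution kernels:
    only 2 * C_out scalars per layer are trained on the target domain.

    kernels      : (C_out, C_in, kh, kw) frozen source-domain weights
    scale, shift : (C_out,) learnable per-output-channel parameters
    """
    return scale[:, None, None, None] * kernels + shift[:, None, None, None]

frozen = np.arange(24.0).reshape(2, 3, 2, 2)
adapted = nlt_adapt(frozen,
                    scale=np.array([2.0, 1.0]),
                    shift=np.array([1.0, 0.0]))
```

The tiny parameter count is what makes few-shot adaptation viable: with only a handful of target samples, a full fine-tune would overfit, while a channel-wise rescale-and-shift cannot.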
4. Inverse Problems and Differentiable Inversion
Once trained, the CNO acts as a differentiable surrogate for forward operators, enabling efficient, gradient-based inverse problem solutions. For example, in seismic velocity inversion (Ma et al., 24 Sep 2025):
- The CNO is embedded within an inversion loop where the velocity model m is iteratively updated to minimize
  J(m) = ‖G_θ(m) − d_obs‖² + λ · TV(m),
  where G_θ is the frozen CNO, d_obs is the observed RTM image, and λ is a TV regularization coefficient.
- Gradients ∇_m J are computed via automatic differentiation through the neural operator, sidestepping traditional adjoint-state solvers.
- In practice, high-frequency corrections to the background velocity can be efficiently injected, and deployment to field data is immediate due to mesh-independence and robust generalization.
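The inversion loop can be illustrated with a toy example in which the frozen surrogate is a plain linear map and the gradients are written out by hand; a real pipeline would instead differentiate through the trained CNO with automatic differentiation, and the smoothed TV used here is a standard differentiable approximation:

```python
import numpy as np

def invert(d_obs, A, lam=1e-3, eps=1e-6, lr=0.1, steps=500):
    """Gradient-descent inversion of a 1D model m through a frozen linear
    surrogate A, minimizing J(m) = ||A m - d_obs||^2 + lam * TV_eps(m).
    TV_eps(m) = sum_i sqrt((m[i+1]-m[i])^2 + eps) is smoothed total
    variation, so its gradient is everywhere well defined."""
    m = np.zeros_like(d_obs)
    for _ in range(steps):
        grad_data = 2.0 * A.T @ (A @ m - d_obs)
        dm = np.diff(m)
        g = dm / np.sqrt(dm**2 + eps)  # d TV_eps / d (differences)
        # Chain rule through the differencing: boundary and interior terms.
        grad_tv = np.concatenate([[-g[0]], g[:-1] - g[1:], [g[-1]]])
        m = m - lr * (grad_data + lam * grad_tv)
    return m

d_obs = np.concatenate([np.zeros(4), np.ones(4)])  # blocky "observed" target
m_rec = invert(d_obs, A=np.eye(8))
```

Even in this toy setting, the blocky target is recovered to within the small TV-induced bias; the point is that once the forward operator is differentiable, inversion reduces to ordinary first-order optimization.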
This framework enables inversion with orders-of-magnitude reduction in compute time compared to classical waveform inversion (e.g., 15-44 s per survey versus 2.5 hours for full-waveform inversion) (Ma et al., 24 Sep 2025).
5. Empirical Performance and Benchmarks
Extensive empirical studies demonstrate that CNOs achieve leading performance across a range of PDE benchmarks, including:
| Task | CNO rel. error (%) | FNO rel. error (%) | Other baseline (%) | Reference |
|---|---|---|---|---|
| Poisson (2D, multiscale) | 0.21 | 4.98 | 0.71 (U-Net) | (Raonić et al., 2023) |
| Navier-Stokes (2D, shear) | 2.76 | 3.57 | 3.54 (U-Net) | (Raonić et al., 2023) |
| Seismic RTM (field) | <5 (L₂ error, forward); inversion yields correlation ≳0.9, 15-44s runtime | - | - | (Ma et al., 24 Sep 2025) |
| Kuramoto-Sivashinsky | 0.18 | 0.97 | 1.40 (DeepONet) | (Fan et al., 19 Dec 2025) |
| Multiscale elliptic (DCNO) | 0.531 | 1.749 | 1.159 (U-NO), 0.556 (HANO) | (Xu et al., 2024) |
CNOs display robustness to out-of-distribution data, data efficiency (favorable test error scaling with sample size), and are resolution-invariant. Hybrid variants (e.g., DCNO) substantially outperform both pure spectral and pure convolutional models for multiscale phenomena, reducing both bias towards global low modes and truncation errors in high-frequency content (Xu et al., 2024).
6. Theoretical Guarantees and Implications
CNOs inherit rigorous approximation properties from their convolutional and band-limited operator foundations:
- For sufficiently regular solution manifolds (e.g., Sobolev-regular coefficient-to-solution maps), CNOs achieve uniform approximation error bounds in which the required depth, width, and channel count scale with the prescribed accuracy and the parametric regularity of the operator (Franco et al., 2022, Raonić et al., 2023).
- Analyses reveal a deep equivalence between convolutional blocks and local Fourier interpolation, with CNN depth scaling logarithmically in output resolution for fixed accuracy (Franco et al., 2022).
Universal approximation theorems for CNOs are proven under minimal conditions and demonstrate their capacity to approximate continuous, translation-invariant solution operators at arbitrary precision within the bandlimit (Raonić et al., 2023, Fan et al., 19 Dec 2025). The construction ensures commutativity of continuous and discrete forms in all layers.
7. Applications, Extensions, and Limitations
CNOs are being utilized for surrogate modeling, inverse problems, multiscale PDEs, and data-driven seismic inversion. Notable applications include:
- Seismic velocity model building with true field generalization, high inversion correlation, and significant speedup over traditional workflows (Ma et al., 24 Sep 2025).
- Multiscale PDE learning, including elliptic, Helmholtz, and Navier-Stokes systems, in both direct and inverse regimes (Xu et al., 2024).
- Few-shot PDE operator transfer via NLT kernels, using only a small number of adaptation parameters per layer (Fan et al., 19 Dec 2025).
Limitations include inherent restrictions to Cartesian domains (for current implementations), increased cost for 3D or nonuniform meshes, and open questions regarding curse-of-dimensionality in very high parametric regimes (Raonić et al., 2023). Ongoing research explores extensions to physics-informed formulations, hybrid spectral-convolutional architectures, and theoretical bounds for more general operator families.
References: (Ma et al., 24 Sep 2025, Raonić et al., 2023, Fan et al., 19 Dec 2025, Xu et al., 2024, Franco et al., 2022)