Operator Neural Networks (ONNs) Overview
- ONNs are neural architectures that replace standard linear operations with heterogeneous, learnable nonlinear operators, greatly increasing representational capacity.
- They encompass self-organized variants (Self-ONNs) and operator learning paradigms like DeepONet, which efficiently model complex mappings in scientific computing and pattern recognition.
- Empirical results demonstrate that ONNs yield significant improvements in tasks such as image restoration, fault diagnosis, and biomedical signal classification, offering robust generalization.
Operator Neural Networks (ONNs), also styled Operational Neural Networks in the patch-wise literature, are a class of neural architectures designed to generalize and surpass traditional convolutional and fully connected networks by introducing flexible, heterogeneous, and often learnable nonlinear operators at the neuron or connection level. ONNs enable the direct modeling and learning of complex local or function-to-function mappings ("operators") relevant in both scientific computing and pattern analysis. Several ONN paradigms have been developed, ranging from patch-wise Taylor-expansion–based generative neurons (Self-ONNs) and library-based heterogeneous neurons (classic ONNs) to function-space operator networks such as DeepONet, RBON, and their domain-specific and ensemble extensions.
1. Operator Neural Networks: Definitions and Generalization
ONNs extend conventional neural architectures by replacing the standard linear weight/activation pipeline with a more general triple: nodal operator (nonlinear function of the weight and input), pool operator (aggregation), and activation. In ONNs, the pre-activation of a neuron is typically

$$x_k = b_k + P_{i=1}^{N}\big(\Psi(w_{ik},\, y_i)\big), \qquad y_k = f(x_k),$$

where $\Psi$ is a (potentially nonlinear) nodal operator, $P$ is a pool operator (e.g., sum, median), and $f$ is the activation function (Kiranyaz et al., 2019). This generalization enables heterogeneous operator assignment at the level of each neuron or even each connection, vastly increasing representational capacity compared to the homogeneous, linear convolutional neurons found in CNNs. ONNs subsume CNNs as the special case $\Psi(w, y) = w \cdot y$ with summation pooling $P = \Sigma$.
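A minimal sketch of this neuron model follows; the operator choices, names, and shapes here are illustrative assumptions, not taken from any reference implementation:

```python
import numpy as np

def onn_neuron(y, w, b, nodal=np.multiply, pool=np.sum, f=np.tanh):
    """Sketch of one ONN neuron: x = b + P_i(psi(w_i, y_i)), output f(x).

    With nodal=multiply and pool=sum this reduces to a standard linear
    neuron, illustrating how ONNs subsume CNN/MLP neurons as a special case.
    """
    return f(b + pool(nodal(w, y)))

y = np.random.randn(9)  # e.g., a flattened 3x3 input patch
w = np.random.randn(9)
print(onn_neuron(y, w, 0.0))  # CNN-equivalent neuron
print(onn_neuron(y, w, 0.0, nodal=lambda w, y: np.sin(w * y),
                 pool=np.median))  # heterogeneous nodal/pool operators
```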
Broadly, ONNs fall into two main families:
- Patch-wise ONNs: Operate on grid-structured data (images, time series); generalize convolutional layers via flexible nodal/pool/activation operators, either chosen from a discrete "operator library" (Malik et al., 2020) or learned "on-the-fly" via Taylor expansions (Self-ONN) (Kiranyaz et al., 2020, Ince et al., 2021).
- Operator learning ONNs: Learn maps between function spaces, as in DeepONet or RBON, central in scientific machine learning for approximating PDE solution operators (Kobayashi et al., 2023, Kurz et al., 2024, Patel et al., 2022, Goswami et al., 2022).
2. Self-Organized ONNs (Self-ONNs) and Generative Neurons
Self-ONNs eliminate the operator library and associated search by equipping each connection with a learnable, locally nonlinear function parameterized by a truncated Taylor (Maclaurin) series:

$$\psi(w, y) = \sum_{q=1}^{Q} w_q\, y^q,$$

where $w_q$ are learned coefficients and $Q$ is the expansion order (Kiranyaz et al., 2020, Ince et al., 2021); with $Q = 1$ the neuron reduces to a standard convolutional neuron. Each operational or convolutional kernel is thus a tensor of shape (input channels, output channels, spatial dims, Q), allowing each connection to synthesize its optimal local operator during training.
Key properties:
- Maximum heterogeneity: Every connection, kernel element, and order has independent learnable parameters.
- Self-organization: No pre-specified function library; all operator coefficients are updated by standard backpropagation.
- Computational tractability: Forward propagation can be decomposed into Q parallel standard convolutions, making Self-ONNs efficiently vectorizable and compatible with existing GEMM/BLAS/conv backends (Malik et al., 2021).
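As a concrete illustration of the last point, a Self-ONN layer can be sketched as Q standard convolutions applied to successive powers of the input. This is a simplified PyTorch sketch; the tanh bounding, bias handling, and hyperparameters are assumptions for illustration, not a reference implementation:

```python
import torch
import torch.nn as nn

class SelfONNConv2d(nn.Module):
    """Sketch of a Self-ONN (generative-neuron) layer, assuming the
    Maclaurin form psi(w, y) = sum_q w_q * y**q with summation pooling.
    The forward pass decomposes into Q standard convolutions, one per
    power of the input, so existing conv backends can be reused.
    """
    def __init__(self, in_ch, out_ch, kernel_size, Q=3):
        super().__init__()
        # One conv per Taylor order q = 1..Q; its weights are the w_q tensors.
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2,
                      bias=(q == 0))  # a single shared bias on the first term
            for q in range(Q)
        )

    def forward(self, y):
        # Bounding the activation (e.g., tanh) keeps the powers y**q
        # well-conditioned during training.
        y = torch.tanh(y)
        return sum(conv(y ** (q + 1)) for q, conv in enumerate(self.convs))

x = torch.randn(1, 3, 32, 32)
layer = SelfONNConv2d(3, 16, kernel_size=3, Q=3)
print(layer(x).shape)  # torch.Size([1, 16, 32, 32])
```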
Empirically, Self-ONNs yield significant performance improvements over both classic ONNs and CNNs: up to 8% higher F₁ in severe bearing fault diagnosis (Ince et al., 2021), 1–3 dB better PSNR in restoration/denoising (Malik et al., 2020, Malik et al., 2021), and state-of-the-art compact biomedical classifiers (Devecioglu et al., 2021).
3. Operator Learning Architectures: DeepONet, RBON, and Extensions
Operator learning ONNs aim to learn mappings between function spaces (G: 𝒰 → 𝒱). The seminal DeepONet (Kobayashi et al., 2023, Goswami et al., 2022, Lee et al., 2023) approximates

$$G(u)(y) \approx \sum_{k=1}^{p} b_k\big(u(x_1), \ldots, u(x_m)\big)\, t_k(y),$$

where $b$ (branch net) encodes the input function (e.g., sampled at $m$ sensor points $x_1, \ldots, x_m$), and $t$ (trunk net) encodes the output location $y$. Universal approximation theorems guarantee convergence for continuous $G$ as $p, m \to \infty$ (Kobayashi et al., 2023, Sharma et al., 2024, Lee et al., 2023).
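A minimal unstacked DeepONet in this form might look like the following sketch; the layer widths, depths, and tanh activations are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Minimal unstacked DeepONet sketch.
    branch: encodes u sampled at m sensor points -> p coefficients b_k(u).
    trunk:  encodes the query location y         -> p basis values t_k(y).
    Output: G(u)(y) ~= sum_k b_k(u) * t_k(y) + bias.
    """
    def __init__(self, m, y_dim=1, p=64, width=128):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Linear(m, width), nn.Tanh(), nn.Linear(width, p))
        self.trunk = nn.Sequential(
            nn.Linear(y_dim, width), nn.Tanh(), nn.Linear(width, p))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, u_sensors, y):
        b = self.branch(u_sensors)  # (batch, p)
        t = self.trunk(y)           # (batch, p)
        return (b * t).sum(dim=-1, keepdim=True) + self.bias

model = DeepONet(m=100)
u = torch.randn(8, 100)   # 8 input functions sampled at 100 sensors
y = torch.rand(8, 1)      # one query point per function
print(model(u, y).shape)  # torch.Size([8, 1])
```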
Recent extensions include:
- Ensemble and Mixture-of-Experts DeepONet: Multiple trunks (global, local, or data-driven) are combined, allowing the network to simultaneously model global modes and localized features. Mixture-of-Experts variants use partitions of unity to blend local trunks, introducing spatial sparsity and improving steep-gradient resolution (Sharma et al., 2024).
- RBON/NRBON/F-RBON: The Radial Basis Operator Network replaces the branch and trunk nets by RBF layers, e.g., Gaussian units of the form

$$\varphi_k(x) = \exp\!\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right),$$

with K-means–clustered centers $c_k$ and spreads $\sigma_k$. Exact linear algebraic fitting gives near machine-precision accuracy and better out-of-distribution robustness compared to DeepONet and FNO (Kurz et al., 2024); a minimal fitting sketch appears after this list.
- Variationally Mimetic Operator Networks (VarMiON): Architectures reflecting the variational/Galerkin structure of the underlying PDE, splitting the network into basis-construction (trunk) and coefficient-assembly (branch) modules, which enhances data efficiency and interpretability (Patel et al., 2022).
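To make the RBON fitting idea concrete, here is a toy sketch of Gaussian RBF branch/trunk features combined bilinearly and fitted in closed form by linear least squares. All sizes, the synthetic target, and the random center subsampling (standing in for the paper's K-means) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_features(X, centers, spread):
    """Gaussian RBF layer: phi_k(x) = exp(-||x - c_k||^2 / (2 * spread^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * spread ** 2))

# Toy operator-learning data: n samples of (input function at m sensors,
# scalar query location y, target value G(u)(y)). The target is synthetic.
n, m, p = 200, 20, 16
U = rng.standard_normal((n, m))                 # sampled input functions
Y = rng.random((n, 1))                          # query locations
targets = np.sin(U.mean(1, keepdims=True) + Y)  # stand-in operator values

# Centers: the paper clusters with K-means; random subsampling stands in here.
cu = U[rng.choice(n, p, replace=False)]
cy = Y[rng.choice(n, p, replace=False)]
B = rbf_features(U, cu, spread=2.0)             # branch features (n, p)
T = rbf_features(Y, cy, spread=0.3)             # trunk features  (n, p)

# Bilinear model G(u)(y) ~= sum_{j,k} W[j,k] * B_j(u) * T_k(y): its weights
# solve a linear least-squares problem exactly, with no gradient descent.
Phi = (B[:, :, None] * T[:, None, :]).reshape(n, p * p)
W, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
print("train RMSE:", np.sqrt(np.mean((Phi @ W - targets) ** 2)))
```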
4. Training, Optimization, and Operator Search Strategies
Methods to assign or learn operator sets in ONNs include:
- Greedy Iterative Search (GIS): For classic ONNs with a finite operator library, GIS retrains the network several times per layer and greedily keeps the best-performing operator set. This approach is computationally demanding and restricts heterogeneity to the layer level (Kiranyaz et al., 2019).
- Synaptic Plasticity Monitoring (SPM): SPM evaluates the "plasticity" (change in weight variance) of each operator set assigned to neurons in random or biased runs, ranking operator sets based on their dynamic contribution to learning. This enables the construction of "elite" ONNs with high intra-layer heterogeneity and has been shown to yield further gains over GIS ONNs and CNNs on restoration, synthesis, and transformation tasks (Kiranyaz et al., 2020, Malik et al., 2020).
- Self-organization (Self-ONN): Backpropagation directly tunes all Taylor coefficients per connection; no outer search is required. Empirically, Self-ONN converges an order of magnitude faster than CNN or ONN (Kiranyaz et al., 2020).
- Two-step training for DeepONet: Decouples trunk (basis) and branch (coefficient) training, orthonormalizing the trunk basis for improved generalization and optimization stability (Lee et al., 2023).
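The second step of this two-step scheme can be illustrated with a frozen, stand-in trunk basis: orthonormalize it via QR, then obtain the optimal branch targets by a direct linear solve instead of joint gradient descent. This is a toy sketch; the cosine "trunk" and all sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

n_y, p, n_funcs = 64, 8, 32
ys = np.linspace(0, 1, n_y)[:, None]

# Stand-in for a trained trunk net evaluated on the query grid: (n_y, p).
T = np.cos(np.pi * ys * np.arange(1, p + 1))

# Orthonormalize the trunk basis (QR); this conditions the second step.
Qb, R = np.linalg.qr(T)

# Operator outputs G(u_i)(y) for each training function: (n_y, n_funcs).
S = T @ rng.standard_normal((p, n_funcs)) \
    + 0.01 * rng.standard_normal((n_y, n_funcs))

# Optimal branch targets in the orthonormal basis: C = Qb^T S, so that
# Qb @ C is the least-squares projection of the data onto the trunk span.
C = Qb.T @ S
# A branch net is then regressed onto C (one column per training function);
# at inference, predictions are Qb @ branch(u).
print("reconstruction RMSE:", np.sqrt(np.mean((Qb @ C - S) ** 2)))
```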
5. Performance Benchmarks and Applications
ONNs unlock improved expressivity, computational efficiency, and adaptability across domains:
- Image and signal processing: Self-ONNs and SPM-ONNs consistently outperform same-sized CNNs, deeper state-of-the-art networks (DnCNN), and classical baselines (BM3D), especially in compact, shallow settings and under severe noise models. For high-noise AWGN, 2-layer Self-ONNs approach or surpass BM3D with up to 1.3 dB PSNR gain (Malik et al., 2021). In biomedical contexts, compact Self-ONNs achieve 100% F₁ in glaucoma detection (ESOGU) and 99.1% F₁ in ECG peak detection, with orders-of-magnitude lower parameter counts and real-time inference (Devecioglu et al., 2021, Gabbouj et al., 2021).
- Operator learning for scientific computing: DeepONet and its variants (ensemble, PoU, RBON, VarMiON) demonstrate rapid convergence and superior generalization on PDE surrogate tasks (Darcy flow, lid-driven cavity, beam equation, Burgers equation, Allen-Cahn). Ensembles can yield 4× lower errors than standard DeepONet, and RBON achieves near machine-precision L² error both in- and out-of-distribution (Sharma et al., 2024, Kurz et al., 2024, Kobayashi et al., 2023, Lee et al., 2023, Patel et al., 2022).
- Physics–informed learning: Energy-dissipative DeepONet and variationally mimetic ONNs incorporate PDE structure (energy dissipation, weak formulation) into training for stronger inductive bias, guaranteed stability, and better OOD and limited-data performance (Zhang et al., 2023, Patel et al., 2022).
- Computational cost: Despite increased complexity per neuron (e.g., Q polynomial terms in Self-ONN), most implementations execute Q parallel convolutions, yielding practical inference speeds competitive with CNN baselines, often with dramatic parameter and data efficiency (Ince et al., 2021, Devecioglu et al., 2021; FastONN, Malik et al., 2020).
6. Theoretical Guarantees, Expressivity, and Limitations
- Universal approximation: DeepONet, RBON (incl. normalized and frequency versions), and classical ONNs are all provably universal approximators for nonlinear operators between function spaces, given sufficient network width/depth or RBF basis size (Kurz et al., 2024, Kobayashi et al., 2023, Kiranyaz et al., 2019); a representative formal statement appears after this list.
- Expressivity: ONNs embed nonlinear transformations at the kernel, connection, or neuron level. Self-ONNs realize a continuous, data-driven operator family, while classic ONNs depend on the chosen operator library (potentially suboptimal if the library is not rich enough). Empirical ablations suggest that moderately high expansion orders (Q ≳ 5) offer the best accuracy/complexity tradeoff, beyond which overfitting or diminishing returns set in (Malik et al., 2020).
- Robustness and generalization: DeepONet and RBON demonstrate strong zero-shot generalization; ensemble/MoE DeepONets and RBONs maintain low error on OOD tasks, while basic DeepONet can overfit (Kurz et al., 2024, Sharma et al., 2024).
- Limitations: The main drawbacks are parameter growth with Q/order, potential for overfitting, and, in some ONNs, reliance on operator libraries or empirical operator search (in GIS/ONN). Adaptive order selection and further operator-space regularization remain active topics. Taylor-based generative neurons may poorly approximate non-polynomial operator classes in some settings (Kiranyaz et al., 2020).
- Computation and implementation: With appropriate vectorization and GPU backends, Self-ONNs and FastONN achieve efficient batch inference (see Malik et al., 2020).
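For reference, the operator universal approximation theorem underlying the DeepONet-style guarantees (after Chen & Chen, 1995) can be stated in a commonly quoted form: for every continuous operator $G$ defined on a compact set $V \subset C(K_1)$, and every $\varepsilon > 0$, there exist integers $p, m$, sensor points $x_1, \ldots, x_m \in K_1$, and two-layer networks $b_k, t_k$ such that

$$\sup_{u \in V,\; y \in K_2} \left| G(u)(y) - \sum_{k=1}^{p} b_k\big(u(x_1), \ldots, u(x_m)\big)\, t_k(y) \right| < \varepsilon.$$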
7. Outlook and Algorithmic Variants
Active research directions include:
- Operator enrichment: Stacked or hybrid trunk and branch networks (POD, RBF, tails, interpretable modes) to better match multiscale or localized operator structure (Sharma et al., 2024, Kurz et al., 2024).
- Adaptive/sparse operator design: Partition-of-unity and mixture-of-experts methods for local attention, adaptivity, and scalability without a global parameter increase (Sharma et al., 2024).
- Physics-informed and mimetic architectures: VarMiON, EDE-DeepONet, and PINOs for improved stability, interpretability, and accuracy under physical constraints (Zhang et al., 2023, Patel et al., 2022, Goswami et al., 2022).
- Hyperparameter adaptation: Selection of the Taylor order Q per layer or neuron, regularization for overfitting control, and synaptic-plasticity-guided operator assignment during learning (Kiranyaz et al., 2020).
- Software frameworks: GPU-efficient implementations (FastONN) and modifiable operator-set libraries for rapid prototyping and experimentation (Malik et al., 2020).
ONNs thus represent a comprehensive and flexible neural modeling paradigm that spans locally nonlinear signal-processing networks to universal operator learners, combining theoretical guarantees, strong empirical performance, broad applicability, and extensible implementations (Kiranyaz et al., 2019, Kiranyaz et al., 2020, Ince et al., 2021, Kurz et al., 2024, Sharma et al., 2024, Kobayashi et al., 2023, Patel et al., 2022).