OP Neural Network: Data-Driven Activations
- OP Neural Networks are feed-forward models that learn individualized activation functions using an additive Gaussian process regression framework.
- They replace conventional nonlinear optimization with a closed-form linear solve, enhancing computational efficiency and reducing overfitting.
- The approach combines kernel methods and neural network expressivity to deliver robust performance in tasks like molecular potential energy surface fitting.
An "OP Neural Network" (OP NN) refers to a feed-forward neural network architecture in which the neuron activation functions are not fixed and shared but are instead learned individually for each neuron via an additive Gaussian process regression (GPR) framework. This approach, introduced in "Neural network with optimal neuron activation functions based on additive Gaussian process regression" (Manzhos et al., 2023), enables the automatic construction of neuron-specific, data-driven activations for single-hidden-layer neural networks, with training reduced to a linear solve rather than nonlinear optimization. The result is a method that robustly combines the representational power of neural networks with the flexibility and regularization of kernel methods.
1. Definition and Motivation
Let $\mathbf{x} \in \mathbb{R}^D$ be an input vector and consider a fully connected single-hidden-layer neural network with $N$ neurons. Conventionally, the output is

$$f(\mathbf{x}) = c_0 + \sum_{n=1}^{N} c_n\,\sigma\!\left(w_{0n} + \mathbf{w}_n^{\top}\mathbf{x}\right),$$

where all neurons share a fixed activation function $\sigma$ (e.g., sigmoid, tanh), and the weights $c_n$, $w_{0n}$, $\mathbf{w}_n$ are typically learned by nonlinear optimization. OP Neural Networks drop the assumption of a fixed $\sigma$ and instead construct each neuron's activation optimally for the given data via GPR. The resulting expansion is

$$f(\mathbf{x}) = \sum_{n=1}^{N} f_n(y_n), \qquad y_n = w_{0n} + \mathbf{w}_n^{\top}\mathbf{x},$$

where each $f_n$ is an individually learned, data-driven, univariate function of the scalar projection $y_n$.
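For orientation, here is a minimal Python sketch (hypothetical names, NumPy only) of the OP NN forward pass: the projections are fixed, and the only difference from a conventional single-hidden-layer network is that each neuron applies its own learned univariate function rather than a shared $\sigma$.

```python
import numpy as np

def opnn_forward(x, W, w0, activations):
    """Illustrative OP NN forward pass for a single input x of shape (D,).

    W (N, D) and w0 (N,) are the fixed linear projections; `activations` is a
    list of N callables, one learned univariate function f_n per neuron.
    """
    y = W @ x + w0                                   # y_n = w_{0n} + w_n^T x
    return sum(f(y_n) for f, y_n in zip(activations, y))

# A conventional network would instead apply one shared sigma to every neuron,
# e.g. c0 + c @ np.tanh(W @ x + w0).
```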
The motivation is that more flexible, neuron-specific activations can yield higher expressivity per neuron, reduce overfitting, and allow for compact, shallow models with strong approximation power, while avoiding the pitfalls (local minima, over-parametrization) of large, standard neural networks (Manzhos et al., 2023).
2. Additive Gaussian Process Regression Framework
Each activation $f_n$ is modeled as a GPR function of its scalar argument $y_n = w_{0n} + \mathbf{w}_n^{\top}\mathbf{x}$,

$$f_n(y_n) \sim \mathcal{GP}\!\left(0,\, k(y_n, y_n')\right),$$

with $k$ a chosen kernel (typically squared exponential or Matérn). The network output thus defines an additive GPR model on the function space:

$$f(\mathbf{x}) = \sum_{n=1}^{N} f_n(y_n).$$

For a set of training examples $(\mathbf{x}^{(i)}, t^{(i)})$, $i = 1, \dots, M$, one assembles the hidden variables $y_n^{(i)} = w_{0n} + \mathbf{w}_n^{\top}\mathbf{x}^{(i)}$ for all $i$ and $n$, and constructs the kernel Gram matrix

$$K_{ij} = \sum_{n=1}^{N} k\!\left(y_n^{(i)},\, y_n^{(j)}\right),$$

with $i, j = 1, \dots, M$. The vector of predictive coefficients $\mathbf{c}$ is computed by the closed-form solve

$$\mathbf{c} = \left(\mathbf{K} + \delta\,\mathbf{I}\right)^{-1}\mathbf{t},$$

where $\mathbf{t} = \left(t^{(1)}, \dots, t^{(M)}\right)^{\top}$ is the vector of observed outputs and $\delta$ is a diagonal regularization (noise) parameter. The learned activation function for neuron $n$ is

$$f_n(y) = \sum_{i=1}^{M} c_i\, k\!\left(y,\, y_n^{(i)}\right) = \mathbf{k}_n(y)^{\top}\mathbf{c},$$

where $\mathbf{k}_n(y) = \left(k(y, y_n^{(1)}), \dots, k(y, y_n^{(M)})\right)^{\top}$ is the kernel vector between $y$ and all training projections $y_n^{(i)}$ of neuron $n$.
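A compact NumPy sketch of this construction is given below. It is illustrative rather than the authors' reference code: the function names (`additive_gram`, `fit_coefficients`, `predict`), the squared-exponential kernel choice, and the default hyperparameter values are assumptions.

```python
import numpy as np

def sqexp_kernel(a, b, length=1.0, amp=1.0):
    """Squared-exponential kernel between scalar arguments a (m,) and b (M,)."""
    return amp * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def additive_gram(Y, length=1.0, amp=1.0):
    """Gram matrix K_ij = sum_n k(y_n^(i), y_n^(j)) for hidden variables Y (M, N)."""
    M, N = Y.shape
    K = np.zeros((M, M))
    for n in range(N):
        K += sqexp_kernel(Y[:, n], Y[:, n], length, amp)
    return K

def fit_coefficients(Y, t, delta=1e-8, length=1.0, amp=1.0):
    """Closed-form 'training': c = (K + delta I)^{-1} t."""
    K = additive_gram(Y, length, amp)
    return np.linalg.solve(K + delta * np.eye(len(t)), t)

def neuron_activation(y, n, Y, c, length=1.0, amp=1.0):
    """Learned activation f_n(y) = sum_i c_i k(y, y_n^(i)) evaluated at points y."""
    return sqexp_kernel(np.atleast_1d(y), Y[:, n], length, amp) @ c

def predict(X_new, W, w0, Y, c, length=1.0, amp=1.0):
    """OP NN output f(x) = sum_n f_n(w_{0n} + w_n^T x) for inputs X_new (m, D)."""
    Y_new = X_new @ W.T + w0
    out = np.zeros(len(X_new))
    for n in range(Y.shape[1]):
        out += neuron_activation(Y_new[:, n], n, Y, c, length, amp)
    return out
```

Note that a single coefficient vector $\mathbf{c}$ is shared by all neurons; what distinguishes neuron $n$ is the set of training projections $y_n^{(i)}$ entering its kernel vector.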
3. Architecture Construction and Training Procedure
The OP NN approach circumvents nonlinear parameter optimization. Instead, it proceeds as follows:
- Preprocess input data, typically standardizing to zero mean and unit variance.
- Choose the number of neurons $N$ and specify the linear projections $y_n = w_{0n} + \mathbf{w}_n^{\top}\mathbf{x}$. The weights $\mathbf{w}_n$ and biases $w_{0n}$ are not learned by backpropagation but are set by heuristic or structure: either all pairwise feature averages when the input dimension $D$ is small, or random projections (e.g., from a Sobol sequence) for arbitrary $N$.
- Construct kernel Gram matrices for all neurons across the dataset.
- Form the complete Gram matrix $\mathbf{K}$ and solve for the coefficient vector $\mathbf{c}$ by linear algebra.
- Define each $f_n$ via the kernel interpolation formula above.
- For prediction, evaluate $f(\mathbf{x}) = \sum_{n} f_n(y_n)$ at any new input $\mathbf{x}$.
No iterative gradient descent is required; all optimization is closed-form for a fixed choice of projections and kernel hyperparameters.
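The remaining ingredients of the procedure, input standardization and the fixed choice of projections, can be sketched as follows. This is again a hedged illustration: `standardize`, `pairwise_average_projections`, and `sobol_projections` are hypothetical helpers, and the Sobol option uses `scipy.stats.qmc`. The commented lines at the end show how these pieces feed the `fit_coefficients` / `predict` functions from the sketch in Section 2.

```python
import numpy as np
from scipy.stats import qmc   # quasi-random Sobol sequence

def standardize(X):
    """Scale inputs to zero mean and unit variance (returns the stats for reuse)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / sd, mu, sd

def pairwise_average_projections(D):
    """One neuron per feature pair a < b with y = (x_a + x_b) / 2 (small D only)."""
    W = [0.5 * (np.eye(D)[a] + np.eye(D)[b])
         for a in range(D) for b in range(a + 1, D)]
    return np.array(W), np.zeros(len(W))

def sobol_projections(N, D, seed=0):
    """N quasi-random projection directions in [-1, 1]^D, for arbitrary N."""
    u = qmc.Sobol(d=D, scramble=True, seed=seed).random(N)   # points in [0, 1)
    return 2.0 * u - 1.0, np.zeros(N)

# End-to-end pipeline (no gradient descent anywhere):
#   Xs, mu, sd = standardize(X_train)
#   W, w0 = sobol_projections(N=64, D=Xs.shape[1])
#   Y = Xs @ W.T + w0                  # hidden variables for all training points
#   c = fit_coefficients(Y, t_train)   # single linear solve
#   preds = predict((X_test - mu) / sd, W, w0, Y, c)
```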
4. Empirical Performance
The OP NN method demonstrates substantial gains in scientific regression tasks, notably molecular potential energy surface (PES) fitting for small molecules such as H₂O (three internal coordinates) and H₂CO (six internal coordinates):
- For H₂O, the OP NN attains lower test errors than both conventional sigmoid-activated NNs and additive GPR baselines with comparable numbers of neurons and training points (Manzhos et al., 2023).
- For H₂CO, an OP NN with a modest number of neurons matches the accuracy of a full GPR fit while retaining a direct neural-network evaluation form.
- Overfitting is sharply reduced compared to standard NNs, as the flexible activations adapt directly to the data and regularization is controlled by the diagonal parameter $\delta$ added to the Gram matrix.
A significant advantage is that the method does not require tuning of the projection weights $\mathbf{w}_n$ by backpropagation; either structural or randomized assignments of $\mathbf{w}_n$ suffice, provided $N$ is large enough to attain the desired representational capacity.
5. Computational Complexity and Regularization
Training an OP NN is dominated by kernel Gram matrix construction and the linear system solve, $\mathcal{O}(M^3)$ for $M$ training points, matching standard GPR cost. Prediction at a new input costs $\mathcal{O}(NM)$ (each of the $N$ neurons requires a kernel vector of length $M$), but this can be reduced by pruning neurons whose learned activations have low variance over the data. Typical sparsification can discard 50–80% of neurons without degrading test performance.
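The variance-based pruning mentioned above might look like the following sketch (the names and the relative-variance criterion are assumptions; it reuses `sqexp_kernel` and `fit_coefficients` from the Section 2 sketch): each neuron's learned activation is evaluated over the training projections, neurons with negligible variance are dropped, and the smaller additive system is re-solved.

```python
import numpy as np

def prune_neurons(Y, t, c, rel_tol=1e-3, length=1.0, amp=1.0):
    """Drop neurons whose learned activation barely varies over the training data."""
    variances = np.array([
        (sqexp_kernel(Y[:, n], Y[:, n], length, amp) @ c).var()   # Var[f_n(y_n)]
        for n in range(Y.shape[1])
    ])
    keep = np.flatnonzero(variances > rel_tol * variances.max())
    c_pruned = fit_coefficients(Y[:, keep], t, length=length, amp=amp)  # re-solve
    return keep, c_pruned
```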
The kernel regularization parameter $\delta$ and the kernel hyperparameters (amplitude, length scale, and, for Matérn kernels, smoothness) play central roles in controlling overfitting and accuracy. Automatic relevance determination, variance-based neuron pruning, and cross-validation of hyperparameters are straightforwardly incorporated into the training process.
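Because each candidate hyperparameter setting costs only one linear solve, selection can be done with an ordinary hold-out or cross-validation loop. The grid-search sketch below (hypothetical `holdout_select`, reusing `sqexp_kernel` and `fit_coefficients` from Section 2) illustrates the idea for $\delta$ and the kernel length scale.

```python
import numpy as np
from itertools import product

def holdout_select(Y, t, deltas, lengths, val_frac=0.2, seed=0):
    """Grid-search delta and length scale by hold-out RMSE; one solve per candidate."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(t))
    n_val = int(val_frac * len(t))
    val, tr = idx[:n_val], idx[n_val:]
    best, best_rmse = None, np.inf
    for delta, length in product(deltas, lengths):
        c = fit_coefficients(Y[tr], t[tr], delta=delta, length=length)
        pred = np.zeros(n_val)
        for n in range(Y.shape[1]):
            pred += sqexp_kernel(Y[val, n], Y[tr, n], length) @ c
        rmse = np.sqrt(np.mean((pred - t[val]) ** 2))
        if rmse < best_rmse:
            best, best_rmse = (delta, length), rmse
    return best, best_rmse
```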
6. Applications and Scope
OP NNs are well suited to applications where high-accuracy regression is required, the number of input features is moderate, and compact, interpretable, and robust models are advantageous. They have immediate application in computational chemistry (PES fitting), molecular property prediction, and scientific data modeling, but the approach transfers to any field where additive models with a learnable nonlinear response per neuron are beneficial.
By construction, OP NNs retain the universal approximation property of single-hidden-layer neural networks, as any function can be expanded in terms of flexible univariate responses to linear projections. The method is particularly advantageous when iterative retraining must be avoided, when global smoothness and robustness are prioritized, or when conventional NN overfitting is problematic (Manzhos et al., 2023).
7. Comparative Perspective
OP Neural Networks generalize the standard single-layer neural network by maximizing activation function heterogeneity under a data-driven Bayesian nonparametric prior. They provide a constructive, analytic solution for the hidden-layer response, bridge kernel methods and neural network methodology, and yield a highly expressive, stable, and interpretable model family. OP NN should not be confused with “Operational Neural Networks” or heterogeneously operator-based deep architectures, which address diversity via generalized nodal and pooling functions at the (sub)neuron level but do not optimize the nonlinear activation functions in the GPR sense (Kiranyaz et al., 2019, Malik et al., 2020).
The overall impact is to align neural network expressivity directly with the local curvature and structure of data, obtaining flexible, robust approximation with minimal manual tuning and without iterative, nonlinear optimization of weights (Manzhos et al., 2023).