
OP Neural Network: Data-Driven Activations

Updated 29 December 2025
  • OP Neural Networks are feed-forward models that learn individualized activation functions using an additive Gaussian process regression framework.
  • They replace conventional nonlinear optimization with a closed-form linear solve, enhancing computational efficiency and reducing overfitting.
  • The approach combines kernel methods and neural network expressivity to deliver robust performance in tasks like molecular potential energy surface fitting.

An "OP Neural Network" (OP NN) refers to a feed-forward neural network architecture in which the neuron activation functions are not fixed and shared but are instead learned individually for each neuron via an additive Gaussian process regression (GPR) framework. This approach, introduced in "Neural network with optimal neuron activation functions based on additive Gaussian process regression" (Manzhos et al., 2023), enables the automatic construction of neuron-specific, data-driven activations for single-hidden-layer neural networks, with training reduced to a linear solve rather than nonlinear optimization. The result is a method that robustly combines the representational power of neural networks with the flexibility and regularization of kernel methods.

1. Definition and Motivation

Let $x \in \mathbb{R}^D$ be an input and consider a fully connected single-hidden-layer neural network. Conventionally, the output is

$$F(x) = \sum_{n=1}^N C_n\, o(W_n x + b_n) + b_\text{out}$$

where all neurons share a fixed activation function $o(\cdot)$ (e.g., sigmoid, tanh), and the weights $\{W_n, b_n, C_n\}$ are typically learned by nonlinear optimization. OP Neural Networks drop the assumption of a fixed $o(\cdot)$ and instead construct each neuron's activation $\phi_n(\cdot)$ optimally for the given data via GPR. The resulting expansion is

$$F(x) = \sum_{n=1}^N \phi_n(y_n), \qquad y_n = W_n x + b_n$$

where each $\phi_n$ is an individually learned, data-driven, univariate function.
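As a concrete illustration, the forward pass of such a model reduces to summing learned univariate functions of fixed linear projections. The following is a minimal NumPy sketch, assuming the activations $\phi_n$ have already been learned and are available as Python callables; the function and variable names are illustrative, not from the original paper.

```python
import numpy as np

def op_nn_forward(x, W, b, phis):
    """Evaluate F(x) = sum_n phi_n(W_n x + b_n) for a single input x.

    x    : (D,) input vector
    W    : (N, D) fixed per-neuron projection vectors W_n
    b    : (N,) fixed biases b_n
    phis : list of N callables, the learned univariate activations phi_n
    """
    y = W @ x + b                                   # hidden variables y_n
    return sum(phi(y_n) for phi, y_n in zip(phis, y))
```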

The motivation is that more flexible, neuron-specific activations can yield higher expressivity per neuron, reduce overfitting, and allow for compact, shallow models with strong approximation power, while avoiding the pitfalls (local minima, over-parametrization) of large, standard neural networks (Manzhos et al., 2023).

2. Additive Gaussian Process Regression Framework

Each activation $\phi_n(\cdot)$ is modeled as a GPR function of its scalar argument $y_n$,

$$\phi_n(\cdot) \sim \mathrm{GP}\big(0,\, k_n(\cdot, \cdot)\big)$$

with a chosen kernel $k_n$ (typically squared exponential or Matérn). The network output thus defines an additive GPR model on the function space:

$$F(x) = \sum_{n=1}^N \phi_n(y_n)$$

For a set of $M$ training examples $(x^{(m)}, f^{(m)})$, $m=1,\dots,M$, one assembles the hidden variables $y^{(m)}_n = W_n x^{(m)} + b_n$ for all $n$ and $m$, and constructs the kernel Gram matrix

$$K = \sum_{n=1}^{N} K^{(n)}$$

with $[K^{(n)}]_{ij} = k_n(y^{(i)}_n, y^{(j)}_n)$. The vector of predictive coefficients $\alpha$ is computed by

$$\alpha = (K + \epsilon I)^{-1} f$$

where $f = [f^{(1)}, \dots, f^{(M)}]^T$ is the vector of observed outputs and $\epsilon$ is a diagonal regularization parameter. The learned activation function for neuron $n$ is

$$\phi_n(u) = k_n(u, Y_n)\, \alpha$$

where $Y_n = [y^{(1)}_n, \dots, y^{(M)}_n]$ and $k_n(u, Y_n)$ is the vector of kernel values between $u$ and all training projections in $Y_n$.
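The entire training step therefore amounts to assembling the additive Gram matrix and solving one linear system. Below is a minimal NumPy sketch of this solve under the formulas above, using a squared-exponential kernel with a shared length scale; the function names and default hyperparameter values are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sq_exp_kernel(u, v, length_scale=1.0, amplitude=1.0):
    """Squared-exponential kernel k(u, v) on scalar arguments (broadcasts over arrays)."""
    d = np.subtract.outer(u, v)
    return amplitude * np.exp(-0.5 * (d / length_scale) ** 2)

def fit_op_activations(X, f, W, b, eps=1e-6, length_scale=1.0):
    """Closed-form fit of all neuron activations phi_n by additive GPR.

    X : (M, D) training inputs      f : (M,) training targets
    W : (N, D) fixed projections    b : (N,) fixed biases
    Returns (alpha, Y) where Y[m, n] = W_n x^(m) + b_n.
    """
    Y = X @ W.T + b                                              # hidden variables
    # K = sum_n K^(n), with [K^(n)]_ij = k_n(y_n^(i), y_n^(j))
    K = sum(sq_exp_kernel(Y[:, n], Y[:, n], length_scale) for n in range(W.shape[0]))
    alpha = np.linalg.solve(K + eps * np.eye(len(f)), f)         # alpha = (K + eps I)^{-1} f
    return alpha, Y

def phi(u, n, alpha, Y, length_scale=1.0):
    """Learned activation phi_n(u) = k_n(u, Y_n) alpha; u may be a scalar or 1-D array."""
    return sq_exp_kernel(np.atleast_1d(u), Y[:, n], length_scale) @ alpha
```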

3. Architecture Construction and Training Procedure

The OP NN approach circumvents nonlinear parameter optimization. Instead, it proceeds as follows:

  1. Preprocess input data, typically standardizing to zero mean and unit variance.
  2. Choose the number of neurons $N$ and specify the linear projections $y_n = W_n x + b_n$. $W_n$ and $b_n$ are not learned by backpropagation but are instead set by heuristic or structure: either all $D(D-1)/2$ pairwise feature averages for small $D$, or random projections (e.g., from a Sobol sequence) with $b_n = 0$ for arbitrary $N$.
  3. Construct the kernel Gram matrices $K^{(n)}$ for all neurons across the dataset.
  4. Form the complete Gram matrix $K$ and solve for $\alpha$ by linear algebra.
  5. Define each $\phi_n$ via the kernel interpolation formula.
  6. For prediction, evaluate $F(x) = \sum_n \phi_n(W_n x + b_n)$ at any new $x$.

No iterative gradient descent is required; all optimization is closed-form for a fixed choice of projections and kernel hyperparameters.
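Putting the steps together, a toy end-to-end run (reusing fit_op_activations and phi from the sketch in Section 2) might look as follows. The random-projection construction shown here is just one of the options mentioned in step 2, and the synthetic data, seed, and hyperparameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (illustrative only)
D, M, N = 3, 200, 50
X = rng.uniform(-1.0, 1.0, size=(M, D))
X = (X - X.mean(axis=0)) / X.std(axis=0)           # step 1: standardize inputs
f = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]

# Step 2: fixed unit-norm random projections, b_n = 0
W = rng.normal(size=(N, D))
W /= np.linalg.norm(W, axis=1, keepdims=True)
b = np.zeros(N)

# Steps 3-5: closed-form fit (see fit_op_activations in Section 2)
alpha, Y = fit_op_activations(X, f, W, b, eps=1e-6)

# Step 6: prediction at new inputs
def predict(X_new):
    Y_new = X_new @ W.T + b
    return sum(phi(Y_new[:, n], n, alpha, Y) for n in range(N))

print(np.abs(predict(X) - f).max())                # residual on the training set
```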

4. Empirical Performance

The OP NN method demonstrates substantial gains in scientific regression tasks, notably molecular potential energy surface (PES) fitting for small molecules such as H₂O ($D=3$) and H₂CO ($D=6$):

  • For H₂O with $N=51$ and $M=1000$ training points, the OP NN attains test errors as low as $0.46\,\mathrm{cm}^{-1}$, outperforming both conventional sigmoid-activated NNs and additive GPR baselines at comparable $N$ (Manzhos et al., 2023).
  • For H₂CO, an OP NN with $N>500$ matches the accuracy of full GPR ($\sim 20\,\mathrm{cm}^{-1}$), but in a direct neural-network evaluation form.
  • Overfitting is sharply reduced compared to standard NNs, as the flexible activations adapt directly to the data, and regularization is controlled by the kernel diagonal parameter $\epsilon$.

A significant advantage is that the method does not require tuning of $W_n, b_n$ by backpropagation; either structural or randomized assignments of $W_n$ suffice, provided $N$ is large enough to attain the desired representational capacity.

5. Computational Complexity and Regularization

Training an OP NN is dominated by kernel matrix construction and the linear system solve ($O(M^3)$ for $M$ training points), matching standard GPR cost. Prediction at a new input $x$ is $O(NM)$ (summing $N$ dot products of length $M$), but this can be reduced by pruning neurons with low function variance over the data. Typical sparsification can discard 50-80% of neurons without degradation of test performance.
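One way to realize such pruning, continuing the sketch from Section 2, is to rank neurons by the variance of their learned activation over the training projections and keep only the top fraction; the keep fraction below is an illustrative assumption, not a value from the paper.

```python
import numpy as np

def prune_neurons(alpha, Y, keep_fraction=0.3, length_scale=1.0):
    """Return indices of neurons ranked by Var_m[ phi_n(y_n^(m)) ], keeping the top fraction.

    Neurons whose learned activation is nearly constant over the training
    projections contribute little to F(x) and can be dropped from the sum.
    """
    N = Y.shape[1]
    variances = np.array([np.var(phi(Y[:, n], n, alpha, Y, length_scale)) for n in range(N)])
    n_keep = max(1, int(keep_fraction * N))
    return np.argsort(variances)[::-1][:n_keep]
```

Prediction then simply restricts the sum $\sum_n \phi_n(\cdot)$ to the retained neurons.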

The kernel regularization parameter $\epsilon$ and kernel hyperparameters (amplitude $A$, length scale $\ell$, smoothness $\nu$) play central roles in controlling overfitting and accuracy, with typical $\epsilon \sim 10^{-6}$ to $10^{-4}$. Automatic relevance determination, variance-based neuron pruning, and cross-validation of hyperparameters are straightforwardly incorporated into the training process.
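As an example of such cross-validation, a simple hold-out grid search over $\epsilon$ and the kernel length scale could be implemented as follows, again reusing the helpers from Section 2; the grids, hold-out fraction, and seed are assumptions for illustration only.

```python
import numpy as np

def select_hyperparameters(X, f, W, b, eps_grid=(1e-6, 1e-5, 1e-4),
                           ls_grid=(0.5, 1.0, 2.0), holdout=0.2, seed=0):
    """Pick (eps, length_scale) that minimizes hold-out RMSE of the OP NN fit."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(f))
    n_val = int(holdout * len(f))
    val, tr = idx[:n_val], idx[n_val:]
    best_params, best_rmse = None, np.inf
    for eps in eps_grid:
        for ls in ls_grid:
            alpha, Y = fit_op_activations(X[tr], f[tr], W, b, eps=eps, length_scale=ls)
            Y_val = X[val] @ W.T + b
            pred = sum(phi(Y_val[:, n], n, alpha, Y, length_scale=ls)
                       for n in range(W.shape[0]))
            rmse = np.sqrt(np.mean((pred - f[val]) ** 2))
            if rmse < best_rmse:
                best_params, best_rmse = (eps, ls), rmse
    return best_params
```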

6. Applications and Scope

OP NNs are well suited to applications where high-accuracy regression is required, the number of features $D$ is moderate ($\lesssim 20$), and compact, interpretable, and robust models are advantageous. They have immediate application in computational chemistry (PES fitting), molecular property prediction, and scientific data modeling, but the approach can transfer to any field where additive models with learnable nonlinear responses per neuron are beneficial.

By construction, OP NNs retain the universal approximation property of single-hidden-layer neural networks, as any function can be expanded in terms of flexible univariate responses to linear projections. This method is particularly advantageous when iterative retraining must be avoided, global smoothness and robustness are prioritized, or where conventional NN overfitting is problematic (Manzhos et al., 2023).

7. Comparative Perspective

OP Neural Networks generalize the standard single-layer neural network by maximizing activation function heterogeneity under a data-driven Bayesian nonparametric prior. They provide a constructive, analytic solution for the hidden-layer response, bridge kernel methods and neural network methodology, and yield a highly expressive, stable, and interpretable model family. OP NN should not be confused with “Operational Neural Networks” or heterogeneously operator-based deep architectures, which address diversity via generalized nodal and pooling functions at the (sub)neuron level but do not optimize the nonlinear activation functions in the GPR sense (Kiranyaz et al., 2019, Malik et al., 2020).

The overall impact is to align neural network expressivity directly with the local curvature and structure of data, obtaining flexible, robust approximation with minimal manual tuning and without iterative, nonlinear optimization of weights (Manzhos et al., 2023).
