Quadratic Neural Networks
- Quadratic Neural Networks are models that compute node outputs with quadratic polynomials rather than affine-linear functions, naturally capturing second-order interactions.
- They enable exact representation of geometric features like ellipses, improving approximation efficiency in inverse problems such as tomographic reconstruction.
- Their structure supports more tractable convergence analysis and optimization compared to traditional deep networks, enhancing performance in structured signal recovery.
Quadratic neural networks form a class of models in which the decision functions—i.e., the pre-activation computations performed at each node—are quadratic polynomials of the input, rather than the traditional affine-linear functions. This modification enables each neuron to process second-order interactions natively, which markedly increases expressive power relative to standard shallow or deep neural networks. In the context of inverse problems, quadratic neural networks offer both improved approximation efficiency for structured signals (such as images composed of ellipses) and more tractable convergence analysis of numerical optimization schemes used for parameter identification.
1. Quadratic Neural Network Structure and Distinction
A quadratic neural network (QNN) generalizes the standard shallow neural architecture by using second-order polynomials as its node-wise "decision functions." For input $x \in \mathbb{R}^d$, a single-layer QNN with $N$ neurons can be written as
$$\Phi(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(x^\top A_j x + w_j^\top x + b_j\right),$$
where:
- $\alpha_j \in \mathbb{R}$: output weights,
- $w_j \in \mathbb{R}^d$: linear coefficients,
- $A_j \in \mathbb{R}^{d \times d}$: symmetric matrix encoding the quadratic term,
- $b_j \in \mathbb{R}$: bias,
- $\sigma$: an activation function, which may be, for example, ReLU, sigmoid, or tanh.
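A minimal NumPy sketch of this single-layer forward pass (the function and variable names are illustrative, not taken from the source; the symmetrization step is simply one way to enforce symmetric $A_j$):

```python
import numpy as np

def gqnn_forward(x, alpha, A, W, b, sigma=np.tanh):
    """Single-layer general quadratic neural network (GQNN).

    x     : (d,)       input vector
    alpha : (N,)       output weights
    A     : (N, d, d)  symmetric matrices of the quadratic terms
    W     : (N, d)     linear coefficients
    b     : (N,)       biases
    sigma : activation function (ReLU, sigmoid, tanh, ...)
    """
    quad = np.einsum('i,nij,j->n', x, A, x)   # x^T A_j x for every neuron j
    lin = W @ x                               # w_j^T x for every neuron j
    return alpha @ sigma(quad + lin + b)      # sum_j alpha_j * sigma(...)

# Toy usage: 3 neurons acting on inputs in R^2
rng = np.random.default_rng(0)
d, N = 2, 3
A = rng.normal(size=(N, d, d))
A = 0.5 * (A + np.transpose(A, (0, 2, 1)))    # symmetrize each A_j
print(gqnn_forward(rng.normal(size=d), rng.normal(size=N),
                   A, rng.normal(size=(N, d)), rng.normal(size=N)))
```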
Variants considered include:
- General QNNs (GQNNs): Fully trainable quadratic matrices $A_j$.
- Radial QNNs (RQNNs): $A_j = \xi_j I$; the quadratic part is radially symmetric (level sets of the pre-activation are spheres, or circles in 2D).
- Constrained QNNs (CQNNs): Some or all quadratic coefficients are fixed or structured.
- Sign-Based and Higher-Order QNNs: Nodes implementing, for example, sign-based decision functions or cubic polynomial interactions.
This differs from the Affine Linear Neural Network (ALNN) form
$$\Psi(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(w_j^\top x + b_j\right),$$
which can represent only affine (hyperplane-based) separations unless made deep or wide.
QNNs provide significantly increased local model complexity per neuron, as one quadratic neuron can fit ellipses, circles, or other conic sections, while an ALNN cannot do so exactly with a finite number of neurons.
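To make the ellipse claim concrete, here is a short worked equation (the symbols $c$, $M$, and $H$ are chosen here for exposition and are not notation from the source): the indicator of the ellipse $E = \{x : (x - c)^\top M (x - c) \le 1\}$, with $M$ symmetric positive definite, is exactly one quadratic neuron with a Heaviside-type activation $H$,
$$\chi_E(x) = H\!\bigl(1 - (x - c)^\top M (x - c)\bigr) = H\!\bigl(x^\top A x + w^\top x + b\bigr), \qquad A = -M, \quad w = 2 M c, \quad b = 1 - c^\top M c,$$
where $H(t) = 1$ for $t \ge 0$ and $0$ otherwise. By contrast, any finite combination of Heaviside units with affine arguments is piecewise constant on a polyhedral partition and therefore cannot reproduce the curved elliptical boundary exactly.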
2. Application to Inverse Problems
Inverse problems—formulated as $F(f) = y$, where $F$ is a possibly ill-posed linear or nonlinear operator—are addressed by parameterizing the unknown as a neural network, $f \approx \Phi(\cdot\,; \mathbf{p})$, where $\Phi$ is the (quadratic) neural network and $\mathbf{p}$ is the vector of all trainable parameters, including the quadratic coefficients.
The solution proceeds by minimizing a suitable loss function, typically expressing the discrepancy between $F(\Phi(\cdot\,; \mathbf{p}))$ and the data $y$. This minimization can be accomplished by:
- Gauss-Newton iteration:
$$\mathbf{p}^{k+1} = \mathbf{p}^{k} - N'(\mathbf{p}^{k})^{\dagger} \bigl(N(\mathbf{p}^{k}) - y\bigr),$$
with $N(\mathbf{p})$ the parameter-to-data map, potentially incorporating the forward operator of the inverse problem (i.e., $N(\mathbf{p}) = F(\Phi(\cdot\,; \mathbf{p}))$), and $N'(\mathbf{p}^{k})^{\dagger}$ a pseudo-inverse of its Jacobian.
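A schematic NumPy implementation of this update (the callables `residual` and `jacobian` are hypothetical placeholders standing for $N(\mathbf{p}) - y$ and $N'(\mathbf{p})$; no regularization or step-size control is included):

```python
import numpy as np

def gauss_newton(residual, jacobian, p0, max_iter=50, tol=1e-10):
    """Gauss-Newton iteration p^{k+1} = p^k - J(p^k)^+ (N(p^k) - y).

    residual : callable p -> N(p) - y, shape (m,)
    jacobian : callable p -> N'(p),    shape (m, n_params)
    p0       : initial parameter vector, shape (n_params,)
    """
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        r = residual(p)
        J = jacobian(p)
        step = np.linalg.pinv(J) @ r     # Moore-Penrose pseudo-inverse of the Jacobian
        p = p - step
        if np.linalg.norm(step) < tol:   # stop once the update stagnates
            break
    return p
```

In the RQNN setting, the Jacobian can be assembled from the explicit derivative formulas listed in Section 4 rather than by automatic differentiation or finite differences.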
The inclusion of quadratic terms in the ansatz enhances the network’s ability to approximate localized structures and geometric features commonly encountered in inverse problems, such as the boundaries of phantoms in tomography.
3. Performance and Expressivity Comparison
Empirical and theoretical results demonstrate that QNNs can vastly outperform standard architectures in representing certain function classes. A notable example is the Shepp-Logan phantom—a standard benchmark in tomographic reconstruction—which is the union of several ellipses:
- QNNs: The Shepp-Logan phantom can be represented exactly by a GQNN with as few as 10 neurons, since each ellipse is described by a single quadratic form.
- ALNNs or DNNs: These cannot represent the Shepp-Logan phantom exactly with any finite number of neurons, even if depth is increased, because finite combinations of affine-linear units cannot reproduce sharp, curved (elliptical) boundaries exactly.
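A small NumPy sketch of this construction, reusing the ellipse-to-neuron mapping noted in Section 1 (the ellipse geometry and intensities below are illustrative placeholders, not the actual Shepp-Logan parameter table):

```python
import numpy as np

def ellipse_neuron_params(center, M):
    """Map the ellipse {x : (x-c)^T M (x-c) <= 1} to quadratic-neuron parameters (A, w, b)."""
    c, M = np.asarray(center, float), np.asarray(M, float)
    return -M, 2.0 * M @ c, 1.0 - c @ M @ c

def phantom(x, ellipses, intensities):
    """GQNN with a Heaviside activation: sum_j alpha_j * H(x^T A_j x + w_j^T x + b_j)."""
    value = 0.0
    for (center, M), alpha in zip(ellipses, intensities):
        A, w, b = ellipse_neuron_params(center, M)
        value += alpha * float(x @ A @ x + w @ x + b >= 0.0)  # 1 inside the ellipse, 0 outside
    return value

# Two illustrative ellipses (placeholder geometry, not the Shepp-Logan table)
ellipses = [((0.0, 0.0), np.diag([1 / 0.7**2, 1 / 0.9**2])),
            ((0.2, 0.1), np.diag([1 / 0.2**2, 1 / 0.3**2]))]
intensities = [1.0, -0.5]   # overlapping ellipses add or subtract intensity
print(phantom(np.array([0.1, 0.1]), ellipses, intensities))    # point inside both: 0.5
```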
This result underscores the benefit of incorporating higher-order interactions already at the single-layer or single-neuron level when the target signals have known low-order structure.
4. Convergence Analysis and Optimization
The use of quadratic (or higher-order) ansatz functions in shallow networks not only enhances representation but also leads to more transparent convergence analysis for local optimization algorithms:
- In RQNNs: The derivative of the outputs with respect to the parameters has explicit, tractable forms. Writing the network output as $\Phi(x) = \sum_{j=1}^{N} \alpha_j \, \sigma(\rho_j(x))$,
$$\frac{\partial \Phi}{\partial \alpha_j} = \sigma(\rho_j(x)), \qquad \frac{\partial \Phi}{\partial \xi_j} = \alpha_j \, \sigma'(\rho_j(x)) \, \|x\|^2, \qquad \frac{\partial \Phi}{\partial w_j} = \alpha_j \, \sigma'(\rho_j(x)) \, x, \qquad \frac{\partial \Phi}{\partial b_j} = \alpha_j \, \sigma'(\rho_j(x)),$$
where $\rho_j(x) := \xi_j \|x\|^2 + w_j^\top x + b_j$.
- For the Gauss-Newton solution: Local quadratic convergence is guaranteed under mild and checkable conditions, such as:
- The derivatives of the ansatz functions do not degenerate (i.e., the parameterization forms a local manifold),
- For RQNN: The second-order moment functions must be linearly independent.
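These conditions can be checked numerically from the derivative formulas above; the sketch below (with arbitrarily chosen sample points and parameters) assembles the analytic RQNN Jacobian and verifies that it has full column rank:

```python
import numpy as np

def rqnn_jacobian(X, alpha, xi, W, b, sigma, dsigma):
    """Jacobian of Phi(x) = sum_j alpha_j * sigma(xi_j ||x||^2 + w_j^T x + b_j)
    with respect to (alpha_j, xi_j, w_j, b_j), evaluated at the rows of X."""
    m, d = X.shape
    sq = np.sum(X**2, axis=1)                    # ||x_i||^2
    rho = np.outer(sq, xi) + X @ W.T + b         # (m, N) pre-activations
    s, ds = sigma(rho), dsigma(rho)
    cols = []
    for j in range(alpha.size):
        cols.append(s[:, j])                            # d/d alpha_j
        cols.append(alpha[j] * ds[:, j] * sq)           # d/d xi_j
        for k in range(d):
            cols.append(alpha[j] * ds[:, j] * X[:, k])  # d/d w_{j,k}
        cols.append(alpha[j] * ds[:, j])                # d/d b_j
    return np.column_stack(cols)

rng = np.random.default_rng(1)
m, d, N = 200, 2, 4
X = rng.uniform(-1, 1, size=(m, d))
J = rqnn_jacobian(X, rng.normal(size=N), rng.normal(size=N),
                  rng.normal(size=(N, d)), rng.normal(size=N),
                  np.tanh, lambda t: 1.0 - np.tanh(t)**2)
print("columns:", J.shape[1], "rank:", np.linalg.matrix_rank(J))
```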
For deep ALNNs (standard deep networks), derivatives involve deeply nested chain rules, activation function compositions, and combinatorial parameter interactions, making verification of the necessary convergence conditions significantly more complex.
5. Mathematical Formulations and Approximation Results
The paper provides formal statements and rates for the expressive capability of QNNs:
- General QNN (GQNN): Each neuron computes $\sigma\!\left(x^\top A_j x + w_j^\top x + b_j\right)$ with a fully trainable symmetric matrix $A_j$. The parameter vector $\mathbf{p}$ comprises all $\alpha_j$, $A_j$, $w_j$, and $b_j$ across units.
- RQNN: The quadratic matrix is a scaled identity, $A_j = \xi_j I$, so each neuron computes $\sigma\!\left(\xi_j \|x\|^2 + w_j^\top x + b_j\right)$.
- Approximation rates: For target functions in suitable smoothness classes and networks with $n$ neurons, the approximation error decays at rates matching the optimal rates established for networks with unstructured basis functions.
- Universal Approximation: Provided the decision functions are injective and the activation is continuous and discriminatory, QNNs can approximate any continuous function on compact domains.
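As a purely illustrative numerical experiment (random inner parameters with least-squares output weights, which is only a crude proxy for the optimally chosen parameters the statements above concern), one can watch the approximation error of a one-dimensional radial-quadratic expansion shrink as the number of neurons grows:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(-1.0, 1.0, 400)
target = np.sin(np.pi * t) * np.exp(-t**2)      # smooth 1D target to approximate

def rqnn_features(t, n):
    """Random radial-quadratic features sigma(xi*t^2 + w*t + b) in one dimension."""
    xi, w, b = (rng.normal(size=n) for _ in range(3))
    return np.tanh(np.outer(t**2, xi) + np.outer(t, w) + b)

for n in (5, 20, 80):
    Phi = rqnn_features(t, n)                               # (400, n) feature matrix
    alpha, *_ = np.linalg.lstsq(Phi, target, rcond=None)    # fit output weights only
    err = np.max(np.abs(Phi @ alpha - target))
    print(f"n = {n:3d} neurons, sup-norm error = {err:.3e}")
```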
6. Implications and Future Perspectives
Quadratic neural networks provide significant advantages for inverse problems characterized by solution structures with analytic, geometric, or localized features that can be succinctly encoded with quadratic forms. For these classes of signals, QNNs can achieve sparse and accurate representations and facilitate faster, more robust convergence of iterative training methods with more transparent parameter identifiability and optimization landscapes.
Future directions highlighted include:
- Extension of convergence and approximation analyses from radial/quadratic to general quadratic/cubic architectures.
- Systematic integration of QNNs into modern large-scale inverse problems, leveraging their analytical tractability and expressivity for practical problems in medical imaging, geophysics, and beyond.
- Further study of the optimization landscape and of global minimization in high-dimensional or overparameterized QNNs.
- Comparative exploration of QNN-based approaches versus deep architectures in real-world ill-posed inverse settings.
Summary Table: Standard vs. Quadratic Neural Networks for Inverse Problems
Aspect | Standard NN (ALNN/DNN) | Quadratic Neural Network (QNN) |
---|---|---|
Neuron function | Affine-linear ($\sigma(w^\top x + b)$) | Quadratic ($\sigma(x^\top A x + w^\top x + b)$) |
Function classes natively captured | Global hyperplanes; localized features require depth/width | Localized/nonlinear/geometric (e.g., ellipses) with few units |
Convergence analysis | Complex for deep networks (nested compositions) | Transparent, analyzable in shallow QNNs |
Exact representation of phantoms (e.g., Shepp-Logan) | Not possible with finite units | Possible with few neurons (e.g., 10 for Shepp-Logan) |
Suitability for inverse problems | Less efficient for geometric/analytic structure | Highly efficient for geometric/analytic structure |
Quadratic neural networks thus enhance both the theoretical and practical toolkit for inverse problem solution, offering a powerful alternative or complement to standard neural architectures where the problem domain admits or favors explicit low-degree polynomial representation.