Closed-Form Quadratic Projection Layer

Updated 13 May 2026

The paper introduces a closed-form projection operator that reformulates quadratic constraints into a secular equation solved via Lagrange multipliers.
Key methodologies include explicit algebraic formulas, efficient root-finding techniques, and automatic differentiation for accurate gradients.
Practical applications span beamforming, MLP distillation, and convex relaxations, achieving near-optimal performance with significantly reduced runtime.

A Closed-Form Quadratic Projection Layer is a neural network layer, module, or algorithmic operator that computes the orthogonal projection of an arbitrary input point onto a set or constraint manifold characterized by quadratic (or quadratic-plus-linear) equations, inequalities, or convex sets, where the projector admits either an explicit algebraic formula or a numerically efficient algorithm with analytic expressions for output and gradient. These layers are deployed in contexts where enforcing nontrivial quadratic constraints or mapping MLPs to interpretable polynomials is essential, such as signal processing, control, convex relaxation, or deep learning with structured priors.

1. Mathematical Formulation of Quadratic Projection

The classic instantiation of a quadratic projection layer involves computing

$\min_x \; \frac{1}{2}\|x - x^0\|_2^2 \quad \text{s.t.}\quad \frac{1}{2}x^T A x + b^T x + c = 0$

or a related set of homogeneous/inhomogeneous constraints (e.g., onto quadratic hypersurfaces, parabolic manifolds, or the solution set of a convex QCQP) (Hoorebeeck et al., 2022, Aragón-Artacho et al., 27 Dec 2025, Beylunioglu et al., 27 Oct 2025, Wang et al., 2024).

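The generic solution uses a Lagrange multiplier. For a quadratic equality constraint (with $A$ symmetric), stationarity of the Lagrangian $\mathcal{L}(x, \lambda) = \frac{1}{2}\|x - x^0\|_2^2 + \lambda\left(\frac{1}{2}x^T A x + b^T x + c\right)$ gives $(I + \lambda A)x = x^0 - \lambda b$, so wherever $I + \lambda A$ is invertible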

$x(\lambda) = (I + \lambda A)^{-1}(x^0 - \lambda b)$

with the scalar multiplier $\lambda^*$ determined by enforcing the constraint, leading to a secular equation

$\phi(\lambda) = \frac{1}{2} x(\lambda)^T A x(\lambda) + b^T x(\lambda) + c = 0$

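that has a unique relevant root. Differentiating the first-order condition gives $\phi'(\lambda) = -(Ax(\lambda) + b)^T (I + \lambda A)^{-1} (Ax(\lambda) + b)$, so $\phi$ is strictly decreasing on the interval where $I + \lambda A$ is positive definite (outside degenerate cases), and $\lambda^*$ is the root in that interval. When the constraint is a convex quadratic inequality, a feasible $x^0$ is its own projection; otherwise the solution is given either in closed form (in special cases) or by a monotone root-finding procedure with analytic derivatives, which suits differentiable programming frameworks (Hoorebeeck et al., 2022).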
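As a worked check of the secular equation, take the unit sphere written in the form above with the illustrative choice $A = 2I$, $b = 0$, $c = -1$, and any $x^0 \neq 0$:

$x(\lambda) = (I + 2\lambda I)^{-1} x^0 = \frac{x^0}{1 + 2\lambda}$

$\phi(\lambda) = \frac{1}{2} \cdot 2\,\frac{\|x^0\|_2^2}{(1 + 2\lambda)^2} - 1 = \frac{\|x^0\|_2^2}{(1 + 2\lambda)^2} - 1$

$\phi(\lambda^*) = 0,\; 1 + 2\lambda^* > 0 \;\Rightarrow\; 1 + 2\lambda^* = \|x^0\|_2, \quad x^* = x(\lambda^*) = \frac{x^0}{\|x^0\|_2}$

The root lies where $I + \lambda A$ is positive definite ($\lambda > -\tfrac{1}{2}$), and it recovers the familiar radial projection onto the sphere.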

2. Explicit Formulas for Key Classes of Sets

Several canonical quadratic sets admit tractable projectors:

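Parabola or Quadratic Curve: Projecting a point $(p, q)$ onto $y = a x^2 + b x + c$ reduces to minimizing the quartic $Q(x) = (x - p)^2 + (a x^2 + b x + c - q)^2$, i.e., finding the real roots of the cubic $Q'(x)$ (the critical points of $Q$) and keeping the one with the smallest $Q$ (Aragón-Artacho et al., 27 Dec 2025).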
Quadratic Hypersurfaces (Central Quadric): The reduction of the KKT system to a scalar nonlinear equation in $\lambda$ yields a root in a computable interval; differentiability and closed-form Jacobians are provided (Hoorebeeck et al., 2022).
Capped Rotated Second-Order Cone: Projection decomposes into seven cases, including solutions of cubic and quartic equations for set intersection and cone-boundary cases, lending itself to efficient bisection-based routines (Goldberg et al., 2023).
Polyhedral Sets (Intersection of Halfspaces): For convex polytopes, the projection reduces to a QP whose solution can be written in closed form for one or two constraints, or via an active-set enumeration procedure for general $m$ (Rutkowski, 2016).

Set type	Key equation(s)	Solver type
Parabola ($y = a x^2 + b x + c$)	Roots of the cubic $Q'(x) = 0$	Cardano's formula
Quadric ($\frac{1}{2}x^T A x + b^T x + c = 0$)	Scalar secular equation $\phi(\lambda) = 0$	Newton or bisection root-finding
Capped cone	Cubic/quartic polynomials	Closed form + bisection
Polyhedron ($Gx \le h$)	Active-set KKT system, small matrix inversion	Linear algebra
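The parabola case can be sketched in a few lines of plain Python. For a point $(p, q)$ and $r = c - q$, the squared distance $Q(x) = (x - p)^2 + (a x^2 + b x + r)^2$ has $Q'(x)/2 = 2a^2 x^3 + 3ab\,x^2 + (b^2 + 2ar + 1)x + (br - p)$; the sketch solves this cubic with Cardano's formula (trigonometric form when there are three real roots) and keeps the real root with the smallest $Q$. It assumes $a \neq 0$, and the function names are illustrative; production code would typically add safeguards such as a Newton polishing step near repeated roots.

```python
import math
from typing import List, Tuple


def real_cubic_roots(c3: float, c2: float, c1: float, c0: float) -> List[float]:
    """Real roots of c3*x^3 + c2*x^2 + c1*x + c0 = 0 (c3 != 0), via Cardano's formula."""
    e, f, g = c2 / c3, c1 / c3, c0 / c3
    # Depressed cubic t^3 + P t + R = 0 under the substitution x = t - e/3.
    P = f - e * e / 3.0
    R = 2.0 * e ** 3 / 27.0 - e * f / 3.0 + g
    shift = -e / 3.0
    disc = (R / 2.0) ** 2 + (P / 3.0) ** 3
    if disc > 0.0 or P == 0.0:
        # One real root: Cardano's formula with real cube roots.
        s = math.sqrt(max(disc, 0.0))
        u, v = -R / 2.0 + s, -R / 2.0 - s
        cbrt_u = math.copysign(abs(u) ** (1.0 / 3.0), u)
        cbrt_v = math.copysign(abs(v) ** (1.0 / 3.0), v)
        return [cbrt_u + cbrt_v + shift]
    # Three real roots (here P < 0): trigonometric form, argument clamped for safety.
    m = 2.0 * math.sqrt(-P / 3.0)
    arg = (3.0 * R / (2.0 * P)) * math.sqrt(-3.0 / P)
    theta = math.acos(max(-1.0, min(1.0, arg))) / 3.0
    return [m * math.cos(theta - 2.0 * math.pi * k / 3.0) + shift for k in range(3)]


def project_onto_parabola(p: float, q: float, a: float, b: float,
                          c: float) -> Tuple[float, float]:
    """Closest point to (p, q) on the parabola y = a*x^2 + b*x + c, assuming a != 0."""
    r = c - q

    def sq_dist(x: float) -> float:
        # Q(x): squared distance from (p, q) to (x, a x^2 + b x + c).
        return (x - p) ** 2 + (a * x * x + b * x + r) ** 2

    # Critical points of Q are the real roots of Q'(x)/2.
    roots = real_cubic_roots(2.0 * a * a, 3.0 * a * b,
                             b * b + 2.0 * a * r + 1.0, b * r - p)
    x_best = min(roots, key=sq_dist)
    return x_best, a * x_best * x_best + b * x_best + c
```

For example, projecting $(0, -1)$ onto $y = x^2$ yields the vertex $(0, 0)$, since the cubic $2x^3 + 3x$ has the single real root $x = 0$.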

3. Differentiable Layer Structure and Backpropagation

Quadratic projection layers are constructed to be end-to-end differentiable. For a polyhedral set $\{x : Gx \le h\}$ with a fixed active set $\mathcal{I}$ (the constraints that hold with equality at the solution, with $G_{\mathcal{I}}$ of full row rank), the mapping $x^0 \mapsto P(x^0)$ is affine, and its Jacobian is the orthogonal projector onto the null space of $G_{\mathcal{I}}$:

$\frac{\partial P(x^0)}{\partial x^0} = I - G_{\mathcal{I}}^T \left(G_{\mathcal{I}} G_{\mathcal{I}}^T\right)^{-1} G_{\mathcal{I}}$

For general quadratic hypersurface constraints, the Jacobian follows analogously from first- and second-derivative information via implicit differentiation (Hoorebeeck et al., 2022, Rutkowski, 2016). When the active set changes with the input, the mapping is piecewise affine (polyhedral case) or piecewise smooth (quadratic case), and valid Clarke subgradients remain available during backpropagation.
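For the quadratic equality case, one way to obtain the Jacobian is to differentiate the first-order condition $(I + \lambda A)x = x^0 - \lambda b$ together with the constraint. Writing $M = I + \lambda^* A$ and $n = A x^* + b$ for the constraint gradient at the solution, and assuming $M$ is invertible and $n \neq 0$, a sketch of the derivation (not necessarily the exact form used in the cited works) is:

$M\,dx + n\,d\lambda = dx^0, \qquad n^T dx = 0$

$d\lambda = \frac{n^T M^{-1} dx^0}{n^T M^{-1} n}, \qquad \frac{\partial x^*}{\partial x^0} = M^{-1} - \frac{M^{-1} n\, n^T M^{-1}}{n^T M^{-1} n}$

For the unit sphere ($A = 2I$, $b = 0$, $c = -1$) at $x^0 = (3, 0, 0)$, this gives $\lambda^* = 1$, $M = 3I$, $n = (2, 0, 0)$ and $\partial x^* / \partial x^0 = \mathrm{diag}(0, \tfrac{1}{3}, \tfrac{1}{3})$, matching the Jacobian $(I - x^* x^{*T}) / \|x^0\|_2$ of radial normalization.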

For layered network integration:

Save sufficient algebraic state (e.g., the active set and any matrix factorizations) for reuse during gradient computation.
Employ autodifferentiation through numerically stable primitive operations (matrix solve, polynomial root, vector arithmetic).
Handle nondifferentiable regions by selecting subgradients associated with the active branch (Rutkowski, 2016, Goldberg et al., 2023).
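As a minimal illustration of caching the active branch and backpropagating through it, the sketch below projects onto a single halfspace $\{x : g^T x \le h\}$, the simplest polyhedral case, where the fixed-active-set Jacobian reduces to $I - g g^T / \|g\|_2^2$. The function names are hypothetical, and plain Python lists stand in for a tensor library.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def project_halfspace(x0: Sequence[float], g: Sequence[float],
                      h: float) -> Tuple[Vector, Matrix]:
    """Project x0 onto {x : g.x <= h} and return (projection, Jacobian dP/dx0).

    The boolean `active` plays the role of the cached active set. On the boundary
    (violation exactly 0) the inactive branch is chosen; its Jacobian (the identity)
    is one valid element of the Clarke generalized Jacobian there.
    """
    n = len(x0)
    gg = sum(gi * gi for gi in g)
    violation = sum(gi * xi for gi, xi in zip(g, x0)) - h
    active = violation > 0.0
    if not active:
        x = list(x0)
        jac = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    else:
        # Move along -g just far enough to land on the hyperplane g.x = h.
        x = [xi - (violation / gg) * gi for xi, gi in zip(x0, g)]
        # Fixed-active-set Jacobian: projector onto the hyperplane's direction space.
        jac = [[(1.0 if i == j else 0.0) - g[i] * g[j] / gg for j in range(n)]
               for i in range(n)]
    return x, jac


def vjp(jac: Matrix, grad_out: Sequence[float]) -> Vector:
    """Backward pass: grad_in = jac^T @ grad_out (jac is symmetric here, kept general)."""
    return [sum(jac[i][j] * grad_out[i] for i in range(len(jac)))
            for j in range(len(jac[0]))]
```

For instance, `project_halfspace([2.0, 0.0], [1.0, 0.0], 1.0)` returns `([1.0, 0.0], [[0.0, 0.0], [0.0, 1.0]])`: the component along $g$ is clamped and receives no gradient.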

4. Applications and Empirical Results

Closed-form quadratic projection layers are utilized in:

Constrained Beamforming and QCQP: Efficient projections enforce the quadratic constraints of QCQPs arising in multi-user MISO beamforming (Wang et al., 2024). For example, beamformers are rescaled by a closed-form factor so that every quadratic constraint is satisfied, achieving performance close to semidefinite relaxation (SDR) at orders-of-magnitude lower runtime.
Neural Network Surrogates for QP: Analytic ReLU networks are constructed to represent the exact solution of QPs with linear constraints as a function of their parameters: the parameter space is partitioned into critical regions, and the optimizer is a piecewise affine, and thus ReLU-implementable, function (Beylunioglu et al., 27 Oct 2025).
MLP Distillation: Multilayer perceptrons are approximated in closed form by degree-2 polynomials, retaining … of the explained variance in practical networks (e.g., MNIST scale) and allowing interpretation through the weight structure (Belrose et al., 3 Feb 2025).
Projection onto Quadratic Surfaces: Exact Newton/bisection routines outperform general-purpose SDP solvers in both accuracy and runtime in power system and ML tasks (Hoorebeeck et al., 2022).
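A common closed-form way to restore feasibility for homogeneous quadratic constraints is rescaling. The sketch below assumes constraints of the form $w^H R_k w \ge \gamma_k$ with Hermitian positive semidefinite $R_k$ (as in multicast-style beamforming) and scales $w$ by $1 / \sqrt{\min_k w^H R_k w / \gamma_k}$, which makes every constraint hold with at least one tight. This constraint model is an illustrative assumption, not necessarily the exact scaling used by Wang et al. (2024), and the function names are hypothetical.

```python
import math
from typing import List, Sequence

ComplexMatrix = Sequence[Sequence[complex]]


def quad_form(R: ComplexMatrix, w: Sequence[complex]) -> float:
    """Return w^H R w, which is real for Hermitian R."""
    n = len(w)
    return sum(w[i].conjugate() * R[i][j] * w[j]
               for i in range(n) for j in range(n)).real


def rescale_to_feasible(w: Sequence[complex], Rs: Sequence[ComplexMatrix],
                        gammas: Sequence[float]) -> List[complex]:
    """Scale w so that w^H R_k w >= gamma_k for all k, with equality for the tightest k.

    Scaling w by s multiplies every w^H R_k w by s^2, so s = 1/sqrt(min_k ratio_k)
    is the smallest scaling of w that satisfies all constraints.
    """
    ratios = [quad_form(R, w) / g for R, g in zip(Rs, gammas)]
    t = min(ratios)
    if t <= 0.0:
        raise ValueError("w gives zero gain for some constraint; scaling cannot fix it")
    s = 1.0 / math.sqrt(t)
    return [s * wi for wi in w]
```

The cost is one quadratic form per constraint, with no iterative solver, which is what makes this kind of projection cheap compared with solving an SDR.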

5. Computational Complexity and Practical Implementation

The computational burden depends primarily on the structure of the constraint:

Projection onto polyhedra: worst-case exponential cost for full enumeration ($O(2^m)$ candidate active sets for $m$ constraints), mitigated in practice by efficient QP solvers (Rutkowski, 2016).
Quadratic surfaces: $O(n^3)$ per step for dense $n \times n$ matrix solves, while the local quadratic convergence of Newton's method keeps the iteration count small in practice (Hoorebeeck et al., 2022).
Parabola and capped cones: $O(1)$ per point for fixed-dimension cubic/quartic solves; a vectorized, numerically stable implementation is recommended (Aragón-Artacho et al., 27 Dec 2025, Goldberg et al., 2023).
QCQP in communications: cost linear in the number of antennas and users for batch beamforming (Wang et al., 2024).
Piecewise-affine QP solution networks: a fixed number of multiplications once the solution regions are precomputed, growing with the number of critical regions, variables, and constraints (Beylunioglu et al., 27 Oct 2025).
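Full active-set enumeration examines up to $2^m$ candidate active sets for $m$ constraints, which is easy to see in a direct implementation. The sketch below (hypothetical helper names, plain Python, intended only for small $n$ and $m$) tries candidate active sets $S$ in order of size, solves $G_S G_S^T \mu = G_S x^0 - h_S$, and returns the first candidate satisfying the KKT conditions $\mu \ge 0$ and $Gx \le h$; because the projection onto a convex set is unique, the first KKT point found is the projection.

```python
import itertools
from typing import List, Optional, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def solve_small(M: List[List[float]], rhs: List[float]) -> Optional[List[float]]:
    """Gaussian elimination with partial pivoting; returns None if M is (near-)singular."""
    k = len(rhs)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(aug[i][col]))
        if abs(aug[piv][col]) < 1e-12:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, k):
            fac = aug[i][col] / aug[col][col]
            for j in range(col, k + 1):
                aug[i][j] -= fac * aug[col][j]
    sol = [0.0] * k
    for i in reversed(range(k)):
        sol[i] = (aug[i][k] - sum(aug[i][j] * sol[j] for j in range(i + 1, k))) / aug[i][i]
    return sol


def project_polyhedron_enum(x0: Sequence[float], G: Sequence[Sequence[float]],
                            h: Sequence[float], tol: float = 1e-9) -> List[float]:
    """Project x0 onto {x : G x <= h} by enumerating candidate active sets."""
    n, m = len(x0), len(h)
    for size in range(min(n, m) + 1):
        for S in itertools.combinations(range(m), size):
            rows = [G[i] for i in S]
            gram = [[dot(gi, gj) for gj in rows] for gi in rows]
            mu = solve_small(gram, [dot(G[i], x0) - h[i] for i in S])
            if mu is None or any(mi < -tol for mi in mu):
                continue  # dependent rows, or a negative multiplier (dual infeasible)
            # Stationarity: x = x0 - G_S^T mu
            x = [x0[j] - sum(mi * G[i][j] for mi, i in zip(mu, S)) for j in range(n)]
            if all(dot(G[i], x) <= h[i] + tol for i in range(m)):
                return x  # primal and dual feasible with complementarity: KKT point
    raise ValueError("no KKT point found (empty polyhedron?)")
```

Projecting $(2, 3)$ onto the unit box $[0, 1]^2$ succeeds only at the two-constraint candidate $S = \{x_1 \le 1, x_2 \le 1\}$, returning the corner $(1, 1)$ after all smaller candidates are rejected.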

6. Variants, Extensions, and Theoretical Guarantees

Generalization to Banach spaces: Projection principles and closed-form expressions extend beyond Hilbert spaces under appropriate modifications (Rutkowski, 2016).
Intersections of quadratic and polyhedral sets: Alternating projections and operator splitting (e.g., Douglas-Rachford) leverage explicit layer structure for rapid convergence (Hoorebeeck et al., 2022).
Stability and uniqueness: Secular equations for quadratic constraints are strictly monotone on explicitly characterized intervals, guaranteeing unique roots and thus unique projectors (Hoorebeeck et al., 2022, Aragón-Artacho et al., 27 Dec 2025).
Theoretical optimality: Projected networks preserve feasibility and often achieve objective values close to semidefinite-relaxation (SDR) lower bounds, with convergence guarantees derived from the reduced parameter search space (Wang et al., 2024).
Almost-everywhere differentiability: Layers are differentiable except on measure-zero boundaries where the active set or solution branch changes; at those boundaries, subdifferential calculus or straight-through estimators apply.
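As a sketch of how explicit projectors combine into projections onto intersections, the example below uses Dykstra's algorithm, a variant of alternating projections that converges to the projection onto the intersection rather than to an arbitrary point in it (a swapped-in method; Douglas-Rachford is another option). It pairs a ball $\{x : \|x\|_2 \le r\}$, a convex quadratic inequality with a closed-form projector, with a halfspace. The sets and function names are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def proj_ball(x: Sequence[float], radius: float = 1.0) -> Vector:
    """Closed-form projection onto {x : ||x|| <= radius}: radial scaling if outside."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= radius:
        return list(x)
    return [radius * xi / norm for xi in x]


def proj_halfspace(x: Sequence[float], g: Sequence[float], h: float) -> Vector:
    """Closed-form projection onto {x : g.x <= h}."""
    viol = sum(gi * xi for gi, xi in zip(g, x)) - h
    if viol <= 0.0:
        return list(x)
    gg = sum(gi * gi for gi in g)
    return [xi - viol / gg * gi for xi, gi in zip(x, g)]


def dykstra(x0: Sequence[float], proj_a: Callable[[Vector], Vector],
            proj_b: Callable[[Vector], Vector], iters: int = 500) -> Vector:
    """Dykstra's algorithm: converges to the projection of x0 onto the intersection
    of two closed convex sets A and B, given their individual projectors."""
    x = list(x0)
    p = [0.0] * len(x)  # correction term carried for set A
    q = [0.0] * len(x)  # correction term carried for set B
    for _ in range(iters):
        y = proj_a([xi + pi for xi, pi in zip(x, p)])
        p = [xi + pi - yi for xi, pi, yi in zip(x, p, y)]
        x = proj_b([yi + qi for yi, qi in zip(y, q)])
        q = [yi + qi - xi for yi, qi, xi in zip(y, q, x)]
    return x
```

Projecting $(0, 2)$ onto the intersection of the unit disk and the halfspace $x_1 \ge 0.5$ converges to the corner $(0.5, \sqrt{3}/2)$, whereas plain alternating projections could stop at a different feasible point.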

7. Representative Algorithms and Implementation Details

A closed-form quadratic projection layer can be summarized by short routines tailored to each structure type:

Quadratic equality constraints: solve the secular equation $\phi(\lambda) = 0$ for $\lambda^*$ on the interval where $I + \lambda A$ is positive definite (by bisection or safeguarded Newton steps), then return $x(\lambda^*) = (I + \lambda^* A)^{-1}(x^0 - \lambda^* b)$.
Parabolic curves (Aragón-Artacho et al., 27 Dec 2025): projection reduces to finding the appropriate real root of a cubic (Cardano's formula), with robust branching logic and vectorization.
Capped rotated second-order cones (Goldberg et al., 2023): projection is assembled from explicit seven-case formulas using cubic/quartic root solves, with batch processing for high throughput.
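The quadratic-equality routine can be sketched in plain Python under two simplifying assumptions: $A$ is already diagonal (for a general symmetric $A$, one would first change variables with an eigendecomposition $A = U\,\mathrm{diag}(a)\,U^T$, which preserves distances), and the root of $\phi$ lies strictly inside the interval where $I + \lambda A$ is positive definite (the degenerate "hard case" is not handled). It brackets the root and then bisects, relying on $\phi$ being decreasing on that interval; Newton steps could replace bisection for faster local convergence. The function name is illustrative.

```python
import math
from typing import List, Sequence, Tuple


def project_onto_quadric_diag(x0: Sequence[float], a: Sequence[float],
                              b: Sequence[float], c: float, tol: float = 1e-13,
                              max_iter: int = 500) -> Tuple[List[float], float]:
    """Project x0 onto {x : 0.5 * sum(a_i x_i^2) + b.x + c = 0}, i.e. A = diag(a).

    Returns (x*, lambda*). Assumes the root of phi lies strictly inside the
    interval where 1 + lambda * a_i > 0 for all i (no hard case).
    """
    def x_of(lam: float) -> List[float]:
        # First-order condition (I + lam*A) x = x0 - lam*b, solved componentwise.
        return [(xi - lam * bi) / (1.0 + lam * ai) for xi, ai, bi in zip(x0, a, b)]

    def phi(lam: float) -> float:
        # Secular function: constraint value along the curve x(lam).
        x = x_of(lam)
        return (0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))
                + sum(bi * xi for bi, xi in zip(b, x)) + c)

    # Open interval where I + lam*A is positive definite; phi decreases on it.
    lo = -1.0 / max(a) if max(a) > 0 else -math.inf
    hi = -1.0 / min(a) if min(a) < 0 else math.inf
    if math.isfinite(lo) and math.isfinite(hi):
        start = 0.5 * (lo + hi)
    elif math.isfinite(lo):
        start = lo + 1.0
    elif math.isfinite(hi):
        start = hi - 1.0
    else:
        start = 0.0

    def walk(toward: float, want_positive: bool) -> float:
        # Move from `start` toward one end of the interval until phi has the wanted sign.
        lam, step = start, 1.0
        for _ in range(max_iter):
            val = phi(lam)
            if (val > 0.0) if want_positive else (val < 0.0):
                return lam
            if math.isfinite(toward):
                lam = 0.5 * (lam + toward)          # approach the pole without reaching it
            else:
                lam += math.copysign(step, toward)  # expand the bracket geometrically
                step *= 2.0
        raise ValueError("no sign change found: empty set or hard case")

    left = walk(lo, want_positive=True)     # phi(left) > 0
    right = walk(hi, want_positive=False)   # phi(right) < 0
    for _ in range(max_iter):
        mid = 0.5 * (left + right)
        if phi(mid) > 0.0:
            left = mid
        else:
            right = mid
        if right - left <= tol * max(1.0, abs(mid)):
            break
    lam = 0.5 * (left + right)
    return x_of(lam), lam
```

With $a = (2, 2, 2)$, $b = 0$, $c = -1$ (the unit sphere) and $x^0 = (3, 0, 0)$, the routine returns $\lambda^* \approx 1$ and $x^* \approx (1, 0, 0)$, in agreement with the radial projection.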

Empirical accuracy: on MNIST-class problems, quadratic approximants of MLPs typically achieve …; in beamforming, closed-form QCQP projection yields transmit power within 5–10% of the SDR lower bound at … to … lower compute cost (Belrose et al., 3 Feb 2025, Wang et al., 2024).