
Quadratic Surface Optimization Technique

Updated 14 January 2026
  • Quadratic surface optimization technique is a method that approximates unconstrained quadratic minimization by sampling a small set of indices to construct a reduced problem.
  • The approach uses constant-time randomized sampling to compute a rescaled estimate with an additive error of O(ε n²), independent of the full problem dimension.
  • Empirical results show competitive accuracy and runtime performance for applications like kernel divergence estimation, ridge regression, and large-scale machine learning.

A quadratic surface optimization technique refers to a class of algorithms that seek to compute (approximately or exactly) the minimum value, minimizer, or other geometric property of an objective function of the form

$$p(\mathbf{v}) = \mathbf{v}^T A \mathbf{v} + n \cdot \mathbf{v}^T \mathrm{diag}(\mathbf{d})\, \mathbf{v} + n \cdot \mathbf{b}^T \mathbf{v},$$

with $A \in \mathbb{R}^{n \times n}$, $\mathbf{d}, \mathbf{b} \in \mathbb{R}^n$, and $\mathbf{v} \in \mathbb{R}^n$. A canonical problem is to minimize $p(\mathbf{v})$ over $\mathbb{R}^n$ given only query access to $A, \mathbf{d}, \mathbf{b}$, possibly with substantial problem dimension $n$. Recent developments have produced both traditional and randomized techniques with complexity independent of $n$, opening new computational regimes for "quadratic surface optimization" (Hayashi et al., 2016).

1. Problem Formulation and Discretization

The central problem is the unconstrained quadratic minimization

$$z^* = \min_{\mathbf{v} \in \mathbb{R}^n} \mathbf{v}^T A \mathbf{v} + n \cdot \mathbf{v}^T \mathrm{diag}(\mathbf{d})\, \mathbf{v} + n \cdot \mathbf{b}^T \mathbf{v},$$

where $A$ is an arbitrary real matrix, $\mathbf{d}$ and $\mathbf{b}$ are vectors, and $n$ is the problem dimension. The matrix $A$ is typically either dense or expensive to access in its entirety, motivating algorithms whose cost does not scale with $n$ (Hayashi et al., 2016).

The assumption $A + n \cdot \mathrm{diag}(\mathbf{d}) \succ 0$ ensures boundedness and uniqueness of the minimum. In many applications, only the optimal value $z^*$ is required, not the optimizer $\mathbf{v}^*$.
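Under this assumption, the exact optimum can be computed directly in $O(n^3)$ time by solving the first-order conditions: writing $M = A + n \cdot \mathrm{diag}(\mathbf{d})$, the gradient vanishes where $(M + M^T)\mathbf{v} = -n\mathbf{b}$. A minimal NumPy sketch of this dense baseline (the function name is illustrative):

```python
import numpy as np

def exact_quadratic_min(A, d, b):
    """Direct O(n^3) baseline: minimize v^T A v + n v^T diag(d) v + n b^T v.

    Assumes the symmetric part of M = A + n*diag(d) is positive definite,
    so the minimum is finite and unique.
    """
    n = A.shape[0]
    M = A + n * np.diag(d)
    H = M + M.T                          # gradient of the objective is H v + n b
    v_star = np.linalg.solve(H, -n * b)  # stationarity: H v* = -n b
    z_star = v_star @ M @ v_star + n * (b @ v_star)
    return z_star, v_star
```

This is the reference against which constant-time estimators are measured; it touches every entry of $A$ and is infeasible at the scales the sampling method targets.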

2. Constant-Time Sampling-Based Algorithm

The key advance is a randomized, sampling-based algorithm whose computational and query complexity is $O(1)$ in $n$. The algorithm proceeds as follows (Hayashi et al., 2016):

  1. Sampling: For fixed parameters $\epsilon, \delta$ governing additive error and failure probability, compute $k = 2^{\Theta(1/\epsilon^2)} + \Theta(\log(1/\delta)\,\log\log(1/\delta))$.
  2. Randomly sample $k$ indices (with replacement) from $\{1, \ldots, n\}$ to obtain a subset $S$.
  3. Construct the $k \times k$ principal submatrix $A_S$ and subvectors $\mathbf{d}_S, \mathbf{b}_S$.
  4. Solve the reduced $k$-dimensional quadratic minimization

$$\tilde{z}^* = \min_{\mathbf{w} \in \mathbb{R}^k} \mathbf{w}^T A_S \mathbf{w} + k \cdot \mathbf{w}^T \mathrm{diag}(\mathbf{d}_S)\, \mathbf{w} + k \cdot \mathbf{b}_S^T \mathbf{w}.$$

     This step costs $O(k^3)$ via direct factorization.

  5. Output the rescaled estimate $z = (n^2 / k^2) \cdot \tilde{z}^*$.

All information about $A, \mathbf{d}, \mathbf{b}$ is acquired only through this small sample, so neither time nor space complexity grows with $n$.
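The steps above can be sketched in NumPy as follows; the function name and the choice of a direct dense solve for the reduced system are illustrative, not prescribed by the paper:

```python
import numpy as np

def sampled_quadratic_min(A, d, b, k, rng=None):
    """Sampling-based estimate of z* = min_v v^T A v + n v^T diag(d) v + n b^T v.

    Samples k indices uniformly with replacement, solves the reduced
    k-dimensional problem exactly, and rescales by (n/k)^2, following the
    scheme of Hayashi et al. (2016).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    S = rng.integers(0, n, size=k)       # step 2: sample with replacement
    A_S = A[np.ix_(S, S)]                # step 3: k x k principal submatrix
    d_S, b_S = d[S], b[S]

    # Step 4: reduced problem  w^T A_S w + k w^T diag(d_S) w + k b_S^T w.
    M = A_S + k * np.diag(d_S)
    H = M + M.T                          # gradient is H w + k b_S
    w = np.linalg.solve(H, -k * b_S)     # assumes H is nonsingular
    z_tilde = w @ M @ w + k * (b_S @ w)

    return (n / k) ** 2 * z_tilde        # step 5: rescale
```

Only the $k^2 + 2k$ sampled entries of $A, \mathbf{d}, \mathbf{b}$ are ever read, which is the source of the $O(1)$ query complexity in $n$.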

3. Theoretical Guarantees and Error Analysis

The main guarantee is that, with probability at least $1 - \delta$,

$$|z - z^*| \leq \epsilon L K^2 n^2,$$

where $L = \max\{|A_{ij}|, |d_i|, |b_i|\}$ is an entrywise bound and $K = \max\{\|\mathbf{v}^*\|_\infty, \|\tilde{\mathbf{w}}^*\|_\infty\}$ is a sup-norm bound on the optimizers. In typical regimes where $L, K = O(1)$, the additive error is $O(\epsilon n^2)$ (Hayashi et al., 2016).

The proof is based on:

  • A weak regularity lemma (matrix block-constant approximation in cut-norm),
  • Concentration of small random principal submatrices,
  • Cut-norm control of quadratic forms.

These reduce the high-dimensional quadratic to an accurate "block-constant" approximation via sampling, justifying the rescaling by $(n/k)^2$.

4. Practical Implementation Considerations

The algorithm requires:

  • Fast uniform random sampling and array storage for indices,
  • Efficient assembly of the sampled submatrix and subvectors,
  • Cholesky or pseudoinverse solution of the $k \times k$ system,
  • Choice of $k$ via prior knowledge or loose outer bounds on $L, K$,
  • Numerical regularization (e.g., adding a small ridge $\eta I$ to $A_S$ when necessary).

If only the optimal value is sought and a modest additive error $O(\epsilon n^2)$ is acceptable, this approach is appropriate even for extremely large $n$.
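The regularization point deserves care: a random principal submatrix can be singular even when the full problem is well posed. A minimal sketch of a reduced solve with a ridge fallback (the function name and the value of $\eta$ are illustrative choices, not from the cited paper):

```python
import numpy as np

def solve_reduced(A_S, d_S, b_S, eta=1e-8):
    """Solve the reduced k-dim quadratic, falling back to a small ridge
    eta*I when the sampled system is singular (eta is a heuristic knob)."""
    k = A_S.shape[0]
    M = A_S + k * np.diag(d_S)
    H = M + M.T                          # stationarity: H w = -k b_S
    try:
        w = np.linalg.solve(H, -k * b_S)
    except np.linalg.LinAlgError:        # singular sample: regularize and retry
        w = np.linalg.solve(H + eta * np.eye(k), -k * b_S)
    return w @ M @ w + k * (b_S @ w)
```

The ridge perturbs the reduced optimum by $O(\eta)$, which is negligible relative to the $O(\epsilon n^2)$ additive error budget.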

5. Empirical Validation

Experiments confirm both the $O(\epsilon n^2)$ accuracy and the $O(1)$ runtime:

  • Synthetic data (random $A, \mathbf{d}, \mathbf{b}$): relative errors remained $O(10^{-3})$ or better for $n$ from $500$ up to $5000$ and $k$ from $20$ to $160$.
  • Kernel-divergence applications (e.g., Pearson divergence estimation): error was below $10^{-3}$ for $k \geq 40$, with flat runtime (on the order of $0.003$ s for $k = 20$). This compared favorably to low-rank Nyström methods in both error and runtime scaling (Hayashi et al., 2016).

6. Application Scope and Limitations

This quadratic surface optimization technique is ideal when:

  • Only the minimum value (not the argmin) of a large dense quadratic form is needed,
  • The Hessian or data matrix $A$ is expensive to access or manipulate in full,
  • Dimensionality $n$ is massive and precludes even $O(n)$ sweeps,
  • A small, controlled $O(\epsilon n^2)$ additive error is acceptable.

Representative domains include:

  • Kernel-based statistical divergence estimation (Pearson, KL, $\chi^2$),
  • Value evaluation in unconstrained ridge regression,
  • Machine learning settings where quadratic forms arise over populations on the order of $10^5$ or larger (Hayashi et al., 2016).

Notably, while the method provides an estimator for $z^*$, it does not recover the optimizer $\mathbf{v}^*$ itself. For fine-resolved solutions or non-uniform error control, traditional methods may be preferred.
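To illustrate the ridge-regression application named above, the objective $\|Xw - y\|^2 + \lambda \|w\|^2$ (up to the constant $y^T y$) can be cast into the standard form $w^T A w + n\, w^T \mathrm{diag}(\mathbf{d})\, w + n\, \mathbf{b}^T w$ with $n$ the number of features. The mapping below is an illustrative construction, not taken from the cited paper:

```python
import numpy as np

def ridge_as_quadratic(X, y, lam):
    """Map ridge regression ||Xw - y||^2 + lam ||w||^2 (minus the constant
    y^T y) onto w^T A w + n w^T diag(d) w + n b^T w, n = number of features."""
    n = X.shape[1]
    A = X.T @ X
    d = (lam / n) * np.ones(n)        # n * diag(d) = lam * I
    b = (-2.0 / n) * (X.T @ y)        # n * b^T w = -2 y^T X w
    return A, d, b
```

Minimizing the resulting quadratic recovers the usual ridge solution $w^* = (X^T X + \lambda I)^{-1} X^T y$, so the constant-time estimator can approximate the (shifted) optimal ridge objective value without forming $X^T X$ in full.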

7. Relation to Broader Quadratic Optimization Methods

This constant-time sampling approach complements a spectrum of quadratic optimization strategies:

  • SOCP and SDP relaxations in uniform quadratic and QCQP frameworks (Wang et al., 2015),
  • Classical iterative, face-projection, and active-set methods for quadratic programs with linear constraints (Stromberg, 2023),
  • Variable-projection and partial minimization facilitating well-conditioned penalized formulations (Aravkin et al., 2016),
  • MIP relaxations for nonconvex quadratic problems via piecewise-linear approximation and mixed-integer encoding (Beach et al., 2020).

By leveraging random sampling and exploiting structural concentration phenomena, the quadratic surface optimization technique provides a unique route to fast approximate minimization in high dimensions, with rigorous, dimension-independent error and complexity guarantees.
