Unified QNSP Algorithm

Updated 21 February 2026

The Unified QNSP algorithm is a composite optimization method that integrates stochastic gradients, variance reduction, and memory-efficient quasi-Newton updates for rapid convergence.
It leverages mini-batch evaluations and proximal mapping steps to efficiently solve problems with smooth and non-smooth components, making it effective for image reconstruction and latent variable models.
The approach has a per-iteration cost of O(md), with m the quasi-Newton memory length and d the problem dimension. It attains linear convergence under strong convexity with variance reduction, and sublinear rates in general convex and nonconvex settings.

The Unified Quasi-Newton Stochastic Proximal (QNSP) algorithm denotes a class of optimization methods that integrate stochastic first-order information, variance reduction, and memory-efficient quasi-Newton curvature approximations within a proximal framework. QNSP addresses large-scale composite minimization problems of the form $\min_x F(x) + r(x)$, where $F$ is typically a smooth (possibly nonconvex) finite-sum data-fidelity term and $r$ imposes non-smooth penalties or constraints. QNSP frameworks have been central to recent advances in computational imaging, nonlinear inverse problems, latent variable model estimation, and large-scale regularized learning, combining rapid convergence with per-iteration efficiency (Hong et al., 2023, Yang et al., 2019, Song et al., 2024, Luo et al., 2016, Zhang et al., 2020).

1. Problem Structure and Optimization Setting

QNSP is designed for composite optimization problems of the form

$\min_{x\in\mathcal{C}}\; F(x) + r(x),$

where $F(x) = \frac{1}{L}\sum_{i=1}^L f_i(x)$ is smooth but possibly nonconvex, and $r(x)$ is a convex (often non-smooth) regularizer; examples include total variation, the $\ell_1$-norm, or the indicator function of a convex set $\mathcal{C}$ (e.g., nonnegativity) (Hong et al., 2023, Song et al., 2024, Luo et al., 2016). In latent variable model estimation, the formulation is further specialized to $\min_{\beta\in\mathcal{B}}\, h(\beta) + g(\beta)$, where $h$ combines the negative log-likelihood with a smooth penalty, and $g$ encodes additional non-smooth penalties and hard constraints (Zhang et al., 2020). The smooth component is typically assumed to have a Lipschitz-continuous gradient and, for the linear-rate results, to be strongly convex; the non-smooth part is assumed proximable, i.e., its proximal operator can be evaluated efficiently.
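As a concrete illustration of this setting, the sketch below instantiates $F$ as an average of least-squares terms and evaluates the proximal operators of two of the regularizers mentioned above. The least-squares choice of $f_i$ and all function names are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def smooth_part(x, A, b):
    """F(x) = (1/L) * sum_i 0.5 * (a_i^T x - b_i)^2, one f_i per row of A (assumed model)."""
    residual = A @ x - b
    return 0.5 * np.mean(residual ** 2)

def smooth_grad(x, A, b):
    """Gradient of F: (1/L) * A^T (A x - b)."""
    return A.T @ (A @ x - b) / A.shape[0]

def prox_l1(z, tau):
    """Proximal operator of tau * ||x||_1: componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_nonneg_l1(z, tau):
    """Proximal operator of tau * ||x||_1 + indicator(x >= 0): shift by tau, clip at zero."""
    return np.maximum(z - tau, 0.0)

z = np.array([-2.0, -0.3, 0.0, 0.5, 3.0])
shrunk = prox_l1(z, 1.0)            # entries with |z_i| <= 1 are set to zero
shrunk_nonneg = prox_nonneg_l1(z, 1.0)  # negative entries are also removed
```

Both operators cost $O(d)$, which is what "proximable" means in practice: the non-smooth part never dominates the iteration cost.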

2. Core Algorithmic Components

QNSP methods systematically combine the following elements:

Mini-batch Stochastic Gradient Evaluation: Partitioning of data indices to construct unbiased estimators of $\nabla F(x)$ at each iteration, rendering the per-iteration cost independent of dataset size (Hong et al., 2023, Yang et al., 2019).
Quasi-Newton Hessian Updates: Construction of low-rank, diagonal, or limited-memory BFGS/SR1-type positive-definite surrogates for the (local, block-wise) Hessian of $F$. Updates rely on successive difference pairs $(s_k, y_k)$, with curvature regularization and safeguard tests to ensure numerical stability and bounded condition numbers (Hong et al., 2023, Song et al., 2024, Luo et al., 2016, Zhang et al., 2020).
Variance Reduction: Integration with variance-reduced stochastic gradient estimators (e.g., SVRG, L-SVRG, SAGA, SEGA) to achieve geometric/linear convergence in strongly convex settings (Song et al., 2024, Luo et al., 2016).
Proximal Mapping Step: The principal update is cast as a "proximal quasi-Newton" mapping

$x_{k+1} = \operatorname*{prox}_{\eta_k, r}^{H_k}(x_k - \eta_k H_k^{-1} v_k),$

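where $H_k$ denotes the positive-definite quasi-Newton approximation of the Hessian, which also defines the metric of the proximal operator (Section 3 writes this matrix as $B_k$ and reserves $H_k$ for its inverse), and $v_k$ is the (possibly variance-reduced) stochastic gradient. When $r$ is composite (e.g., TV + indicator or $\ell_1$), efficient dual or coordinate algorithms are invoked (Hong et al., 2023, Song et al., 2024, Luo et al., 2016, Zhang et al., 2020).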

SSN Solvers and Dualization: For complex weighted-proximal subproblems, semismooth Newton (SSN) or dual smoothing approaches achieve $O(d)$ complexity per step via compact matrix representations (Song et al., 2024).
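To make the variance-reduction component concrete, here is a minimal sketch of an SVRG-style estimator, one of the schemes listed above. It assumes least-squares terms $f_i(x) = \frac{1}{2}(a_i^\top x - b_i)^2$ purely for illustration; the helper names `batch_grad` and `svrg_estimate` are hypothetical.

```python
import numpy as np

def batch_grad(x, A, b, idx):
    """Mini-batch gradient of F(x) = (1/L) sum_i 0.5 (a_i^T x - b_i)^2 over the rows in idx."""
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

def svrg_estimate(x, x_snap, full_grad_snap, A, b, idx):
    """SVRG estimator: v = grad_I(x) - grad_I(x_snap) + grad F(x_snap)."""
    return batch_grad(x, A, b, idx) - batch_grad(x_snap, A, b, idx) + full_grad_snap

rng = np.random.default_rng(0)
L, d = 200, 5
A = rng.standard_normal((L, d))
b = rng.standard_normal(L)

# Snapshot point and its full gradient, recomputed once per outer loop in SVRG.
x_snap = rng.standard_normal(d)
full_grad_snap = batch_grad(x_snap, A, b, np.arange(L))

# At the snapshot the correction cancels the sampling noise exactly.
idx = rng.choice(L, size=10, replace=False)
v_at_snap = svrg_estimate(x_snap, x_snap, full_grad_snap, A, b, idx)

# Averaging the estimator over a partition into equal batches recovers grad F(x).
x = rng.standard_normal(d)
batches = np.arange(L).reshape(-1, 10)
mean_v = np.mean(
    [svrg_estimate(x, x_snap, full_grad_snap, A, b, I) for I in batches], axis=0
)
```

The estimator stays unbiased while its variance shrinks as `x` and `x_snap` approach the solution, which is the mechanism behind the linear rates in the strongly convex case.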

3. Algorithmic Workflows and Representative Pseudocode

A typical QNSP framework proceeds as follows:

Gradient and Mini-batch Handling: At iteration $k$, sample or cycle through a batch $I_k$ to compute $g_k$.
Curvature Pair Accumulation: After each block of iterations, update $s_k, y_k$ for Hessian estimation, ensuring secant/curvature conditions (e.g., Powell or SR1 tests).
Quasi-Newton Matrix Update: Build $B_k$ (e.g., via L-BFGS, SR1, or diagonal tracking), store a tractable representation, and compute $H_k = B_k^{-1}$.
Weighted-proximal Subproblem Solution: Solve

$x_{k+1} = \arg\min_x \left\{ r(x) + \frac{1}{2}\|x - h_k\|^2_{H_k^{-1}} \right\}$

where $h_k$ is derived from the aggregated gradients and Hessian approximations. Efficient dual approaches (for TV regularization), SSN (for $\ell_1$ penalties or constraint indicators), and coordinate splitting are employed (Hong et al., 2023, Song et al., 2024, Zhang et al., 2020).

Parameter and Step-size Policies: The step size $\eta_k$, batch sizes, and memory/compression parameters are chosen to guarantee linear-rate convergence, or convergence to stationary points in nonconvex cases.
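The curvature-pair and matrix-update steps above can be sketched with the standard L-BFGS two-loop recursion, which applies $H_k$ to a vector without forming it. This is a minimal sketch of one of the listed options, not the specific update of any cited paper; the relative curvature test and the tolerance `eps` are assumptions standing in for the safeguard tests mentioned above.

```python
import numpy as np
from collections import deque

def update_memory(memory, s, y, eps=1e-8):
    """Keep (s, y) only if the curvature condition s^T y > eps * ||s|| * ||y|| holds."""
    if s @ y > eps * np.linalg.norm(s) * np.linalg.norm(y):
        memory.append((s, y))  # deque(maxlen=m) drops the oldest pair automatically
        return True
    return False

def two_loop(memory, g):
    """Return H g, with H the L-BFGS inverse-Hessian approximation built from memory."""
    q = np.array(g, dtype=float)
    coeffs = []
    for s, y in reversed(memory):            # newest pair first
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q -= alpha * y
        coeffs.append((alpha, rho))
    if memory:                                # initial scaling gamma = s^T y / y^T y
        s, y = memory[-1]
        q *= (s @ y) / (y @ y)
    for (s, y), (alpha, rho) in zip(memory, reversed(coeffs)):  # oldest pair first
        beta = rho * (y @ q)
        q += (alpha - beta) * s
    return q

# Toy quadratic with SPD Hessian Q, so y = Q s always has positive curvature.
rng = np.random.default_rng(0)
d = 8
Q = rng.standard_normal((d, d))
Q = Q @ Q.T + np.eye(d)
memory = deque(maxlen=5)
for _ in range(7):
    s = rng.standard_normal(d)
    update_memory(memory, s, Q @ s)
g = rng.standard_normal(d)
direction = two_loop(memory, g)
```

With memory length $m$ the recursion costs $O(md)$ time and storage. Skipping pairs that fail the curvature test keeps the implied matrix positive definite.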

A high-level implementation is illustrated in the algorithms of (Hong et al., 2023, Yang et al., 2019, Song et al., 2024, Luo et al., 2016).
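For intuition about the weighted-proximal subproblem in the workflow above, consider the simplest case: a diagonal metric $H_k^{-1} = \mathrm{diag}(b)$ and $r(x) = \lambda\|x\|_1$. The problem then separates by coordinate and has a closed form, sketched below. The diagonal-metric assumption and the function names are illustrative; non-diagonal metrics or TV penalties need the dual or SSN solvers discussed above.

```python
import numpy as np

def weighted_prox_l1(h, b_diag, lam):
    """argmin_x lam * ||x||_1 + 0.5 * sum_i b_i * (x_i - h_i)^2, for all b_i > 0.

    Each coordinate is soft-thresholded at its own level lam / b_i.
    """
    thresh = lam / b_diag
    return np.sign(h) * np.maximum(np.abs(h) - thresh, 0.0)

def subproblem_value(x, h, b_diag, lam):
    """Objective of the weighted-proximal subproblem for r = lam * ||.||_1."""
    return lam * np.sum(np.abs(x)) + 0.5 * np.sum(b_diag * (x - h) ** 2)

rng = np.random.default_rng(1)
h = rng.standard_normal(6)
b_diag = rng.uniform(0.5, 4.0, size=6)   # positive diagonal curvature estimates
x_star = weighted_prox_l1(h, b_diag, lam=0.3)
best = subproblem_value(x_star, h, b_diag, 0.3)
```

Coordinates with large curvature $b_i$ are thresholded less aggressively, which is how the metric changes the effect of the regularizer compared with an unweighted proximal step.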

4. Theoretical Results: Convergence and Complexity

Under convexity and Lipschitz smoothness, QNSP admits $O(1/k)$ convergence in function value; under strong convexity and appropriate variance reduction, global geometric (linear) convergence is established (Song et al., 2024, Luo et al., 2016). Notably,

The expected squared proximal residual vanishes, $E[\|F^I(x_k)\|^2] \to 0$ as $k\to\infty$, under mild conditions (Yang et al., 2019).
With variance-reduced gradients and bounded-memory quasi-Newton matrices, one obtains $O(\log(1/\epsilon))$ complexity in stochastic gradient calls to reach $\epsilon$-accuracy in the strongly convex case (Luo et al., 2016).
In nonconvex regimes, accumulation points of the sequence are stationary, and rates $O(1/\sqrt{k})$ apply for the norm of the gradient (Hong et al., 2023).

Storage cost is typically $O(md)$ (with $m$ the memory length, $d$ problem dimension), and per-iteration time is $O(md)$ utilizing compact Woodbury or diagonal forms. In empirical Bayes settings with latent variables, almost sure convergence to stationary points is proved, with averaged methods achieving $O(n^{-1/2})$ error rates up to logarithmic factors (Zhang et al., 2020).
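The role of the Woodbury identity in these cost figures can be seen in a small sketch. It assumes a simplified diagonal-plus-low-rank matrix $B = \mathrm{diag}(b) + UU^\top$ with $U \in \mathbb{R}^{d\times m}$ as a stand-in for a compact quasi-Newton representation; the actual compact forms in the cited papers differ in detail.

```python
import numpy as np

def woodbury_solve(b_diag, U, v):
    """Solve (diag(b_diag) + U U^T) x = v without forming the d-by-d matrix.

    Woodbury: B^{-1} = D^{-1} - D^{-1} U (I + U^T D^{-1} U)^{-1} U^T D^{-1}.
    """
    Dinv_v = v / b_diag
    Dinv_U = U / b_diag[:, None]
    small = np.eye(U.shape[1]) + U.T @ Dinv_U       # m-by-m system
    return Dinv_v - Dinv_U @ np.linalg.solve(small, U.T @ Dinv_v)

rng = np.random.default_rng(0)
d, m = 500, 6
b_diag = rng.uniform(1.0, 3.0, size=d)
U = rng.standard_normal((d, m))
v = rng.standard_normal(d)

x_fast = woodbury_solve(b_diag, U, v)
# Dense reference solve, O(d^3); used here only to check the result.
x_ref = np.linalg.solve(np.diag(b_diag) + U @ U.T, v)
```

Forming the small $m \times m$ system costs $O(m^2 d)$ and can be cached while the matrix is unchanged; each subsequent application then needs only $O(md)$ work plus an $O(m^2)$ or $O(m^3)$ small solve, which is negligible when $m \ll d$.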

5. Practical Implementations and Empirical Performance

QNSP variants have been instantiated for a range of large-scale inverse problems and statistical inference tasks, including:

Nonlinear Image Reconstruction: On 3D inverse scattering with total-variation regularization, QNSP needs 2–3$\times$ fewer iterations and 2–3$\times$ less total time than ASPM to reach a prescribed SNR (Hong et al., 2023).
Latent Variable Model Estimation: QNSP robustly handles latent variable models with complex constraints and non-smooth penalties, outperforming stochastic-EM and first-order alternatives in mean-squared error, with favorable scaling in $N$ (number of samples) and model dimension (Zhang et al., 2020).
Large-scale Regularized Logistic Regression: QNSP methods integrating L-SVRG, SAGA, or SEGA with L-BFGS curvature outperform both classic proximal quasi-Newton and first-order VR approaches, delivering improved epoch complexity and subproblem solve times (e.g., SSN inner solvers outperform ISTA/FISTA by an order of magnitude) (Song et al., 2024, Luo et al., 2016).

6. Extensions, Modeling Variants, and Scope

QNSP provides a unified umbrella for single-loop and inner-outer loop stochastic proximal quasi-Newton methods, accommodating block-diagonal, diagonal, full-memory, or coordinatewise Hessian approximations; variance reduction via multiple recent schemes; hard constraints and nonsmooth regularization; and complex semialgebraic or TV-structured penalties (Song et al., 2024, Luo et al., 2016, Yang et al., 2019, Zhang et al., 2020). QNSP algorithms are tailored to situations where data volume is large and gradient computations dominate, but where rapid convergence is required due to expensive forward models or sampling steps.

The semismooth Newton framework, efficient dual problem handling for TV metrics, and empirical successes in high-dimensional ill-posed and sparse recovery regimes further illustrate its adaptability. This suggests QNSP is particularly well-suited to modern large-scale, composite-structured, and inverse-problem optimization scenarios, where second-order information must be exploited efficiently and scalably.