
Learning-Accelerated Dual Solvers

Updated 24 January 2026
  • Learning-accelerated dual solvers are algorithmic frameworks that integrate machine learning with traditional dual or primal–dual iterative methods to improve convergence speed and solution accuracy.
  • These solvers employ neural predictor–corrector loops, homotopic initialization, and adaptive ADMM techniques to reduce iterations and computational costs in various constrained optimization settings.
  • Empirical results demonstrate significant speedups—up to 30× or more over classical solvers—and robust performance across quadratic programming, conic optimization, and nonlinear problems.

Learning-accelerated dual solvers are algorithmic frameworks that integrate machine learning components into traditional dual or primal–dual iterative optimization methods, with the central objective of improving speed and/or accuracy for solving constrained or structured problems. These solvers have been developed for a variety of settings, including quadratic programming, conic optimization, and general nonlinear constrained formulations. Approaches include directly learning to produce dual (or primal–dual) certificates via neural networks, initializing iterative methods with data-dependent predictors, learning adaptive strategy updates for solver parameters, and embedding learned steps into the inner loops of classical dual-based optimization schemes. The following sections provide a detailed account of the main classes of learning-accelerated dual solvers and their properties, as documented in the recent literature.

1. Problem Formulations and Duality Frameworks

Learning-accelerated dual solvers fundamentally address parametric constrained optimization problems of the form

$$\min_{x\in\mathbb R^{n_x}} f(x;p) \quad \text{s.t.} \quad h(x;p) = 0, \quad g(x;p) \le 0$$

with parameters $p\in\mathbb R^{n_p}$. Here, dual variables $\nu\in\mathbb R^{n_h}$ (for equality constraints) and $\lambda\in\mathbb R^{n_g}$ (for inequality constraints) enter through the Lagrangian

$$\mathcal{L}(x, \nu, \lambda; p) = f(x;p) + \nu^\top h(x;p) + \lambda^\top g(x;p).$$

Characterizing optimality, the Karush–Kuhn–Tucker (KKT) conditions—stationarity, feasibility, and complementarity—encode necessary (and, for convex problems, sufficient) optimality conditions. Dual-based solvers may act directly in the dual space (as in dual coordinate ascent or conic duality) or maintain coupled primal–dual iterates (e.g., ADMM, primal–dual Newton methods, Lagrangian relaxations), with the goal of efficiently driving the residuals of the KKT system to zero (Lüken et al., 2024, Tanneau et al., 2024).
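Written out for the formulation above, the KKT system that these methods drive to zero is the standard one:

$$\nabla_x \mathcal{L}(x,\nu,\lambda;p) = 0, \qquad h(x;p) = 0, \qquad g(x;p) \le 0, \qquad \lambda \ge 0, \qquad \lambda_i\, g_i(x;p) = 0 \;\; \forall i.$$

The stationarity, primal feasibility, dual feasibility, and complementarity residuals of this system are exactly the quantities that the learning-accelerated schemes below monitor and minimize.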

2. Neural-Primal–Dual Solvers and Predictor–Corrector Architectures

A prominent class of learning-accelerated dual solvers deploys neural predictor–corrector loops to produce highly accurate primal–dual certificates for a family of parametrized problems. In such two-stage architectures, a feedforward neural network $\Pi_{\mathrm{pred}}$ is trained to directly map problem parameters $p$ to an initial guess $(\hat x, \hat \nu, \hat \lambda)$ for the primal–dual variables. This is followed by a refinement network $\Pi_{\mathrm{solver}}$, typically structured as another neural module, that iteratively produces corrective steps to minimize a KKT-based residual metric $T(z;p)=\tfrac12\|F(z;p)\|_2^2$, where $F(z;p)$ encodes the smoothed KKT system. Both networks are trained via self-supervised objectives that penalize only the norm of the residuals, requiring no ground-truth optimal solutions (Lüken et al., 2024).
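The two-stage structure can be sketched numerically. The toy below is an illustrative assumption, not the LISCO architecture: the "predictor" is mimicked by perturbing the true solution, and the "solver" by plain Gauss-Newton-style descent on $T(z)$, applied to an equality-constrained QP whose KKT residual $F(z) = Kz - k$ is linear. It shows how a good predictor cuts the work left for the corrector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Equality-constrained QP: min 0.5 x'Qx + q'x  s.t.  Ax = b.
n, m = 6, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)
q, A, b = rng.standard_normal(n), rng.standard_normal((m, n)), rng.standard_normal(m)

# The KKT residual is linear here: F([x; nu]) = K z - k.
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
k = np.concatenate([-q, b])

def corrector(z, steps, alpha):
    """Stand-in for the learned solver module: descent on T(z) = 0.5*||Kz - k||^2."""
    for _ in range(steps):
        z = z - alpha * K.T @ (K @ z - k)
    return z

alpha = 0.9 / np.linalg.norm(K, 2) ** 2     # safe step size from the spectral norm

z_star = np.linalg.solve(K, k)              # exact primal-dual solution
cold = corrector(np.zeros(n + m), 200, alpha)
# A trained predictor would map p to a point near z_star; we mimic one here.
warm = corrector(z_star + 0.01 * rng.standard_normal(n + m), 200, alpha)

res = lambda z: np.linalg.norm(K @ z - k)
print(res(cold), res(warm))                 # warm start finishes far closer
```

In the actual method the corrector steps are themselves produced by a network trained on $T(z;p)$; the point here is only the division of labor between predictor and refinement loop.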

The learning-based solver achieves orders-of-magnitude improvements: on test cases with $n_x=100$, $n_h=n_g=50$, the combined predictor–solver pipeline ("LISCO") reaches KKT residuals below $10^{-8}$ in a median of 16–19 iterations (versus 55–60 without the predictor), with wall-clock times down to 10 ms per instance on CPUs and 1 ms for 1000-instance batches on GPUs. Compared to classical solvers (e.g., IPOPT, OSQP), this represents up to a $30\times$ speedup at comparable solution quality.

No global convergence proof is available; in practice, the learned refinement steps closely mimic Newton-type corrections. Extension to ill-conditioned or large-scale sparse problems and broader constraint classes remains an open direction (Lüken et al., 2024).

3. Learning-Accelerated Dual Solvers in Conic Optimization

Dual Lagrangian Learning (DLL) focuses on generating dual-feasible certificates and valid Lagrangian bounds for linear and nonlinear conic programs using parameterized machine learning models (Tanneau et al., 2024). For a conic program

$$\min_{x} c_\xi^\top x \quad \text{s.t.} \quad A_\xi x \succeq_K b_\xi,\; l_\xi \le x \le u_\xi$$

the DLL model $M_\theta:\xi\mapsto \bar y$ produces a raw dual candidate, which is mapped via the analytic projection $\Pi_{K^*}$ to a dual-feasible point. This is then processed via closed-form "dual completion" rules—e.g., $z^+ = [c_\xi - A_\xi^\top y]^+$, $z^- = [c_\xi - A_\xi^\top y]^-$—to produce a fully feasible dual certificate and associated lower bound. Training is fully self-supervised, maximizing the Lagrangian bound computed from the predicted duals.
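For the LP case the projection and completion steps are simple enough to write down. The sketch below is illustrative (the function name, the toy instance, and the restriction to the nonnegative-orthant cone are assumptions): it projects a raw prediction onto the dual cone, completes the bound duals in closed form, and reads off a lower bound that is valid for any raw input.

```python
import numpy as np

def dual_bound(y_raw, c, A, b, l, u):
    """DLL-style dual completion for an LP  min c'x  s.t.  Ax >= b, l <= x <= u.

    Projects the raw model output onto the dual cone (here K* = R_+^m),
    completes the bound duals in closed form, and returns a valid
    Lagrangian lower bound on the optimal value.
    """
    y = np.maximum(y_raw, 0.0)            # analytic projection onto R_+^m
    v = c - A.T @ y                       # reduced costs
    z_pos, z_neg = np.maximum(v, 0.0), np.maximum(-v, 0.0)
    return b @ y + l @ z_pos - u @ z_neg  # lower bound for any y >= 0

# Toy instance: min x1 + 2*x2  s.t.  x1 + x2 >= 1,  0 <= x <= 1  (optimum = 1).
c = np.array([1.0, 2.0]); A = np.array([[1.0, 1.0]]); b = np.array([1.0])
l, u = np.zeros(2), np.ones(2)

for y_raw in ([0.0], [1.0], [-0.5], [2.0]):
    lb = dual_bound(np.array(y_raw), c, A, b, l, u)
    assert lb <= 1.0 + 1e-9               # bound validity, by construction
print(dual_bound(np.array([1.0]), c, A, b, l, u))  # y = 1 recovers the optimum
```

Because validity holds for every projected candidate, the learned model can only improve bound tightness, never break correctness, which is the key structural property of DLL.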

Benchmarks demonstrate that DLL achieves mean optimality gaps below $0.5\%$, with wall-clock inference times at the millisecond scale, yielding up to $1000\times$ speedups over commercial conic solvers on the test classes considered. The framework guarantees valid dual bounds by construction, is fully self-supervised, and generalizes to diverse conic forms (LP, QP, SOCP, SDP) (Tanneau et al., 2024).

4. Learning-Accelerated Dual Iterative Schemes: Homotopy and Coordinate Descent

In traditional dual-coordinate iterative frameworks, initialization strategies strongly influence transient convergence rates. Homotopic dual learning (Daneshmand et al., 2017) constructs a sequence of dual subproblems with a decreasing regularization schedule $\mu_0 > \mu_1 > \cdots > \mu_K = \mu$; solutions to easier (heavily regularized) instances are used as warm starts for successively harder ones. For dual ridge regression, this homotopy fills in otherwise hard-to-optimize kernel components and collapses nullspace suboptimality, yielding $10$–$100\times$ faster initial decrease in dual suboptimality for coordinate-descent or gradient-descent methods.

The theory is underpinned by the concept of $\tau$-boundedness—limiting the covariance of the response with low-variance eigenfeatures. Theoretical bounds guarantee accelerated transient decay of suboptimality proportional to $\tau$ and eigenspectrum properties. Empirical results on large-scale LIBSVM datasets validate the predicted speedups, with the test-error plateau reached $2$–$10\times$ sooner than with standard methods (Daneshmand et al., 2017).

5. Mini-Batch Stochastic Dual Coordinate Ascent with Acceleration

Mini-batch and accelerated variants of Stochastic Dual Coordinate Ascent (SDCA) realize learning-accelerated dual optimization by combining block-coordinate dual updates with Nesterov-type momentum. Accelerated SDCA (ASDCA) maintains primal–dual coupling and employs extrapolated points for loss gradient computation, interpolating between vanilla SDCA (for batch size $m=1$) and full-batch Accelerated Gradient Descent (AGD, $m=n$). The resulting method achieves a linear convergence rate

$$\Bigl(1 - \frac{\theta m}{n}\Bigr)^t$$

where $\theta$ is an adaptive step-size parameter and $m$ is the mini-batch size. By tuning $m$ and $\theta$, ASDCA optimally balances per-iteration cost against iteration count, yielding superior performance in distributed and parallel computing regimes (Shalev-Shwartz et al., 2013).
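A minimal mini-batch SDCA loop for ridge regression illustrates the dual-coordinate mechanics. This sketch is an assumption for illustration: it omits the Nesterov extrapolation of ASDCA and uses a crude $1/m$ damping of the simultaneous updates in place of the adaptive $\theta$; all names and constants are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ridge regression: min_w (1/n) sum_i 0.5*(w'x_i - y_i)^2 + (reg/2)*||w||^2.
n, d, reg, m = 200, 10, 0.1, 8
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

alpha = np.zeros(n)                       # dual variables, one per example
w = X.T @ alpha / (reg * n)               # primal-dual link: w = X'alpha/(reg*n)
sq_norms = (X ** 2).sum(axis=1)

def duality_gap(w, alpha):
    primal = 0.5 * ((X @ w - y) ** 2).mean() + 0.5 * reg * w @ w
    dual = (alpha * y - 0.5 * alpha ** 2).mean() - 0.5 * reg * w @ w
    return primal - dual                  # nonnegative, zero at the optimum

gap0 = duality_gap(w, alpha)
for _ in range(300):
    batch = rng.choice(n, size=m, replace=False)
    # Closed-form coordinate steps for the squared loss, damped by 1/m so the
    # simultaneous mini-batch update stays safe.
    delta = (y[batch] - X[batch] @ w - alpha[batch]) / (1 + sq_norms[batch] / (reg * n))
    alpha[batch] += delta / m
    w += X[batch].T @ (delta / m) / (reg * n)

print(gap0, duality_gap(w, alpha))        # gap shrinks toward zero
```

ASDCA replaces the conservative damping with an adaptive $\theta$ and evaluates gradients at extrapolated points, which is what yields the $(1 - \theta m/n)^t$ rate quoted above.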

6. Context-Aware and Inexact Learning-Accelerated Dual ADMM Solvers

Recent work integrates neural modules into the Alternating Direction Method of Multipliers (ADMM) for convex quadratic programs to accelerate convergence in dual-based splitting algorithms. Multiple approaches are salient:

  • Neural-accelerated inexact ADMM (Gao et al., 14 May 2025): a lightweight LSTM replaces exact subproblem solutions, producing inexact iterates $\tilde x^{k+1}$ trained to satisfy quantified inexactness criteria for Lagrangian decrease and residual control. The training loss is the running average of the summed primal–dual residuals; convergence is guaranteed by inexact ADMM theory, provided the learned module satisfies the residual and energy-decrease bounds. The method delivers $7\times$, $28\times$, and $22\times$ speedups over Gurobi, SCS, and OSQP on benchmark QPs, without violating primal–dual feasibility thresholds.
  • Context-aware adaptive ADMM (CA-ADMM) (Jung et al., 2022): this approach learns to adapt the ADMM step-size parameter $\rho$ using a spatial–temporal neural policy (heterogeneous graph attention + GRU) that extracts structural and temporal context from the QP and the ADMM iteration trajectory. The learned policy chooses $\rho$ to minimize total ADMM iterations, yielding $2$–$8\times$ fewer iterations than reference implementations and generalizing to problems up to $n=1000$ and across a range of QP/LP classes. The underlying ADMM update is theoretically unchanged, preserving exactness and global convergence while $\rho$ is modulated in a data-driven manner.
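The role of $\rho$ adaptation is easy to see on a toy splitting. The sketch below is an assumption for illustration: it uses the classical residual-balancing heuristic of Boyd et al. as a hand-crafted stand-in for the learned CA-ADMM policy, on a nonnegativity-constrained QP, and leaves the ADMM updates themselves untouched.

```python
import numpy as np

rng = np.random.default_rng(2)

# QP  min 0.5 x'Px + q'x  s.t.  x >= 0, split as f(x) + I_{z>=0}(z) with x = z.
n = 30
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)
q = rng.standard_normal(n)

def admm(rho, iters, adapt=False):
    x = z = u = np.zeros(n)               # u is the scaled dual variable y/rho
    for _ in range(iters):
        x = np.linalg.solve(P + rho * np.eye(n), rho * (z - u) - q)
        z_old, z = z, np.maximum(x + u, 0.0)
        u = u + x - z
        r = np.linalg.norm(x - z)         # primal residual
        s = rho * np.linalg.norm(z - z_old)  # dual residual
        if adapt:
            # Residual balancing: a classical stand-in for the learned policy.
            if r > 10 * s:
                rho, u = rho * 2, u / 2   # rescale u since it is y/rho
            elif s > 10 * r:
                rho, u = rho / 2, u * 2
    return max(r, s)

# Residuals after 50 iterations: badly chosen fixed rho vs adaptive rho.
print(admm(1000.0, 50), admm(1000.0, 50, adapt=True))
```

A learned policy generalizes this idea: instead of a fixed threshold rule, it maps structural and trajectory features to the next $\rho$, but as here the convergence theory of the underlying ADMM iteration is untouched.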

7. Limitations and Emerging Directions

Notable limitations across learning-accelerated dual solvers include the absence of global convergence guarantees for most neural refinement schemes (Lyapunov-based and hybrid proof techniques remain open research), restriction to moderate-size, well-conditioned problems, and limited support for non-smooth or integer constraints (Lüken et al., 2024, Tanneau et al., 2024). Conic-based frameworks may incur cubic costs for large-scale SDPs, despite algorithmic advances. For coordinate-based schemes, rigorous transient acceleration is tied to distributional properties ($\tau$-boundedness) that are not always present in application data.

Scalable exploitation of structure (e.g., sparsity, block decomposability), improved network architectures for predictor and solver modules, and hybrid integration with classical methods (e.g., DLL with interior-point or ADMM) are active areas of development. Extending these frameworks to handle ill-conditioned, nonconvex, or combinatorial structures remains a significant challenge.


References:

  • (Lüken et al., 2024) Self-Supervised Learning of Iterative Solvers for Constrained Optimization
  • (Tanneau et al., 2024) Dual Lagrangian Learning for Conic Optimization
  • (Daneshmand et al., 2017) Accelerated Dual Learning by Homotopic Initialization
  • (Shalev-Shwartz et al., 2013) Accelerated Mini-Batch Stochastic Dual Coordinate Ascent
  • (Gao et al., 14 May 2025) A Learning-Based Inexact ADMM for Solving Quadratic Programs
  • (Jung et al., 2022) Learning context-aware adaptive solvers to accelerate quadratic programming
