Quasi-Newton Acceleration Techniques
- Quasi-Newton acceleration methods are iterative strategies that approximate second-order derivative information using secant updates without forming Hessians explicitly.
- They employ update schemes like Broyden, BFGS, and multi-secant least-squares to dynamically improve convergence rates in nonlinear equation solving and optimization.
- These techniques are applied in unconstrained optimization, partitioned PDE solvers, and stochastic settings, significantly reducing iteration counts and stabilizing coupled simulations.
Quasi-Newton acceleration refers to a class of methods and update strategies that enhance the convergence properties of iterative solvers for nonlinear equations or optimization problems by dynamically approximating second-order derivative information, but without the explicit and costly computation of Hessians or Jacobians. The core mechanism involves leveraging curvature information inferred from the sequence of iterates, often through the use of secant or multi-secant relations, and applying this data to update approximate inverse Hessians, Jacobians, or coupling operators in order to speed up or stabilize convergence. Quasi-Newton acceleration is prominent in areas ranging from unconstrained optimization, convex composite minimization, and stochastic optimization to large-scale coupled PDE systems such as fluid-structure interaction.
1. Fundamental Principles of Quasi-Newton Acceleration
The paradigm of quasi-Newton acceleration is rooted in updating an approximate inverse of the Jacobian (or Hessian in optimization), using low-rank modifications that enforce secant equations derived from the iterates. For a root-finding problem $F(x) = 0$, the Newton step at $x_k$ is $x_{k+1} = x_k - J(x_k)^{-1} F(x_k)$; however, $J(x_k)$ is typically expensive or impossible to form and invert. In quasi-Newton methods, an approximation $B_k \approx J(x_k)^{-1}$ is maintained and updated via formulas such as Broyden, BFGS, SR1, or more general multi-secant/matrix least-squares constructions.
The generic single-secant update (a Broyden-type update, written for the inverse approximation) is
$$B_{k+1} = B_k + \frac{(\Delta x_k - B_k \Delta f_k)\,\Delta f_k^\top}{\Delta f_k^\top \Delta f_k}, \qquad \Delta x_k = x_{k+1} - x_k, \quad \Delta f_k = F(x_{k+1}) - F(x_k),$$
which enforces the secant equation $B_{k+1} \Delta f_k = \Delta x_k$. For optimization, $\Delta f_k$ becomes the gradient difference $\nabla f(x_{k+1}) - \nabla f(x_k)$.
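As a concrete illustration, the secant update can be wrapped into a complete Broyden iteration. The following NumPy sketch is an illustrative toy (the system, starting point, and tolerances are our own, not from the cited papers):

```python
import numpy as np

def broyden_root(F, x0, tol=1e-10, max_iter=100):
    """Broyden's method with a rank-one update of the approximate
    inverse Jacobian B, enforcing the secant equation B @ df = dx."""
    x = np.asarray(x0, dtype=float)
    f = F(x)
    B = np.eye(len(x))                  # crude initial guess for J(x)^{-1}
    for _ in range(max_iter):
        if np.linalg.norm(f) < tol:
            break
        dx = -B @ f                     # quasi-Newton step: no Jacobian solve
        x_new = x + dx
        f_new = F(x_new)
        df = f_new - f
        denom = df @ df
        if denom > 1e-32:               # rank-one secant update of B
            B += np.outer(dx - B @ df, df) / denom
        x, f = x_new, f_new
    return x

# toy system: intersection of the unit circle with the line x = y
F = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
root = broyden_root(F, [1.0, 0.5])
```

Note that each iteration costs one function evaluation and a few matrix-vector products; no derivative of `F` is ever formed.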
Crucially, the quasi-Newton strategy advances both local and global convergence versus pure first-order methods by leveraging curvature data from the sequence of iterates, while avoiding the explicit formation or inversion of the differential operator.
2. Multi-Secant Strategies and Interface Quasi-Newton (IQN)
Multi-secant extensions generalize the single-secant constraint to impose $B_k V_k = W_k$, where $W_k = [\Delta x_{k-m}, \dots, \Delta x_{k-1}]$ and $V_k = [\Delta f_{k-m}, \dots, \Delta f_{k-1}]$ collect recent displacement and gradient (or residual) difference vectors. A standard choice, especially within coupled multiphysics or fixed-point contexts, is to assemble the least-norm solution for $B_k$,
$$\min_{B} \|B\|_F \quad \text{s.t.} \quad B V_k = W_k,$$
which has the closed-form solution $B_k = W_k (V_k^\top V_k)^{-1} V_k^\top$ (with possible symmetrization and PSD correction as in (Lee et al., 9 Apr 2025)). Multi-secant strategies such as IQN-ILS (interface quasi-Newton with least-squares) or IMVLS (implicit multi-vector least squares) are central in partitioned and black-box applications (notably fluid-structure interaction), as they only require access to the system as a mapping and not to internal matrices (Spenke et al., 2022, Scieur, 2019).
Implementation Considerations:
- The number of secant pairs (memory $m$) is typically small ($m \approx 5$–10).
- To ensure stability of the least-squares system, secant pairs leading to ill-conditioning are rejected or dropped (e.g., oldest pairs are removed).
- Efficient updating and application of the resulting $B_k$ is critical: the small least-squares systems are solved via QR factorization, and explicit storage of $B_k$ is often avoided (Spenke et al., 2022, Lee et al., 9 Apr 2025).
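The least-norm operator $B_k = W_k (V_k^\top V_k)^{-1} V_k^\top$ and the matrix-free way of applying it can be verified in a few lines of NumPy (synthetic $V_k$, $W_k$ here, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 4                      # dimension n, memory m << n
V = rng.standard_normal((n, m))   # columns: residual/gradient differences
W = rng.standard_normal((n, m))   # columns: displacement differences

# Least-norm multi-secant operator: B = W (V^T V)^{-1} V^T,
# the minimum-Frobenius-norm matrix satisfying B V = W.
B = W @ np.linalg.solve(V.T @ V, V.T)
assert np.allclose(B @ V, W)      # all m secant equations hold

# In practice B is never formed explicitly: applying it to a residual r
# costs one small m x m solve plus two tall-skinny products.
r = rng.standard_normal(n)
Br = W @ np.linalg.solve(V.T @ V, V.T @ r)
assert np.allclose(B @ r, Br)
```

This is why the memory $m$ can stay small: the only dense objects are the $n \times m$ history matrices and an $m \times m$ Gram matrix.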
3. Quasi-Newton Acceleration in Partitioned PDE and Coupled Solvers
Quasi-Newton acceleration is used in fixed-point iterations arising from partitioned solvers—e.g., fluid-structure interaction (FSI), domain decomposition, or other multiphysics problems—where the system naturally exposes only a "black box" interface-to-interface mapping $\mathcal{H}$. In these settings the residual $r^k = \mathcal{H}(x^k) - x^k$ is available, and the update is
$$x^{k+1} = x^k + B_k\, r^k,$$
with $B_k$ built online from secant data on $(x, r)$.
Within FSI, the Robin–Neumann–quasi-Newton (RN-QN) framework merges a robust Robin boundary penalty (numerical permeability) preconditioning with interface quasi-Newton acceleration (Spenke et al., 2022). The per-timestep cycle:
- Exchanges fluid-load and structure-displacement interface data,
- Enforces a Robin boundary (combining tractions and velocity difference) on the fluid,
- Updates interface loads via a quasi-Newton multi-secant matrix constructed from previous iterations using the IQN-ILS or IMVLS recipe.
This construction massively reduces both the added-mass induced instability (stiff two-way coupling) and the tuning sensitivity to the Robin penalty parameter. Performance is virtually independent of the Robin parameter in regimes where plain Robin–Neumann degrades or diverges (Spenke et al., 2022).
Algorithmic Skeleton for RN-QN (Spenke et al., 2022)
```
initialize h^1 (from previous time step)
for k = 1, 2, ... until ||r^k|| < tol:
    # 1. Structure solve: h^k --> d^k
    # 2. Form Robin BC: T_f n_f = h^k + α^RN (∂d^k/∂t - u_f)
    # 3. Fluid solve: --> tilde h^k
    # 4. Residual: r^k = tilde h^k - h^k
    # 5. Append secant pairs to V_k, W_k
    # 6. Least squares: α^k = argmin ||V_k α + r^k||
    # 7. Update: h^{k+1} = h^k + W_k α^k
end
```
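A minimal self-contained version of such a loop for a generic black-box fixed-point map is sketched below. This is a hedged sketch of the IQN-ILS recipe with no filtering or restart logic, applied to a synthetic linear map rather than a real FSI solver:

```python
import numpy as np

def iqn_ils(H, x0, tol=1e-10, max_iter=100):
    """IQN-ILS acceleration of the fixed-point iteration x = H(x),
    using the residual r = H(x) - x and a least-squares secant model."""
    x = np.asarray(x0, dtype=float)
    xt = H(x)                 # tilde x = H(x)
    r = xt - x
    Vc, Wc = [], []           # columns: differences of residuals / of H outputs
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        if Vc:
            V = np.column_stack(Vc)
            W = np.column_stack(Wc)
            # alpha = argmin ||V alpha + r||, then correct the plain update
            alpha, *_ = np.linalg.lstsq(V, -r, rcond=None)
            x_new = xt + W @ alpha
        else:
            x_new = xt        # first step: plain fixed-point update
        xt_new = H(x_new)
        r_new = xt_new - x_new
        Vc.append(r_new - r)  # append the new secant pair
        Wc.append(xt_new - xt)
        x, xt, r = x_new, xt_new, r_new
    return x

# toy "black box": H(x) = A x + b with spectral radius ~0.99,
# where the plain fixed-point iteration crawls
rng = np.random.default_rng(1)
n = 30
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(0.1, 0.99, n)) @ Q.T
b = rng.standard_normal(n)
H = lambda x: A @ x + b
x_star = iqn_ils(H, np.zeros(n))
```

On this linear toy the accelerated loop converges in roughly $n$ iterations, whereas the plain iteration would need thousands; on nonlinear coupled problems the history is truncated and filtered as described above.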
4. Proximal and Splitting Frameworks: Quasi-Newton for Composite Optimization
In convex composite minimization ($\min_x f(x) + g(x)$, with $f$ smooth and $g$ nonsmooth but proximable) and operator-splitting frameworks, quasi-Newton acceleration can be applied to forward–backward iterations by using a problem-adapted metric:
$$x^{k+1} = \operatorname{prox}_g^{B_k}\!\big(x^k - B_k^{-1} \nabla f(x^k)\big),$$
where $B_k$ is a symmetric positive definite approximation to $\nabla^2 f(x^k)$ updated using rank-one (SR1), diagonal-plus-low-rank, or limited-memory formulas (Becker et al., 2018, Becker et al., 2012). The critical observation is that if $g$ is block-separable, the proximal operator in the new metric can be evaluated efficiently via duality and one or a few small root-finding subproblems, keeping the per-iteration cost close to that of a standard proximal gradient step (Becker et al., 2018). Strong global convergence and local superlinear rates are obtained when the update matrices remain uniformly bounded and the inner (scalar) root-finding achieves superlinear convergence.
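The metric-proximal idea can be illustrated in its simplest form, a diagonal metric, on the lasso problem $\min_x \tfrac12\|Ax-b\|_2^2 + \lambda\|x\|_1$: with $D$ diagonal, the prox in the metric reduces to componentwise soft-thresholding. The sketch below uses a diagonally dominant majorizer of $A^\top A$ as the metric, a much simpler device than the SR1/dual constructions of Becker et al., chosen so the iteration is provably monotone:

```python
import numpy as np

def prox_l1_diag(y, lam, d):
    """Prox of lam*||x||_1 in the metric <x, D x>, D = diag(d):
    separable soft-thresholding with componentwise thresholds lam / d."""
    return np.sign(y) * np.maximum(np.abs(y) - lam / d, 0.0)

def diag_metric_fb(A, b, lam, n_iter=2000):
    """Forward-backward splitting for the lasso with a fixed diagonal metric D,
    chosen as a diagonally dominant majorizer of A^T A (Gershgorin argument:
    D >= A^T A, so the iteration decreases the objective monotonically)."""
    G = A.T @ A
    d = np.abs(G).sum(axis=1)          # majorizing diagonal curvature estimate
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = prox_l1_diag(x - grad / d, lam, d)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
b = rng.standard_normal(40)
x = diag_metric_fb(A, b, lam=0.5)
```

Because the metric adapts a separate curvature estimate per coordinate, badly scaled columns of $A$ no longer force a single global step size, which is the same effect the low-rank metrics exploit more aggressively.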
Empirical findings show that quasi-Newton splitting algorithms may require up to 10× fewer iterations than accelerated first-order baselines (ISTA/FISTA) and can achieve lower wall-clock time despite a marginally higher cost per iteration (Becker et al., 2018, Becker et al., 2012).
5. Quasi-Newton Acceleration in Stochastic and High-Dimensional Regimes
Stochastic quasi-Newton (SQN) methods extend quasi-Newton acceleration to the stochastic optimization regime (e.g., neural network training) by combining variance-reduced or momentum-driven first-order steps with limited-memory curvature approximations. Key methodologies—such as SpiderSQN (Zhang et al., 2020) and momentum variants of SQN (Indrapriyadarsini et al., 2019, Indrapriyadarsini et al., 2021)—demonstrate that when variability in curvature pair estimation is controlled, quasi-Newton updates can be effectively employed without losing stability.
Notably:
- SPIDER-based gradient estimation intertwined with L-BFGS achieves an SFO complexity of $\mathcal{O}(n + n^{1/2}\epsilon^{-2})$ for reaching an $\epsilon$-accurate first-order stationary point, which is optimal for nonconvex finite-sum problems (Zhang et al., 2020).
- Variance-reduced multi-secant and limited-memory updates ensure positive definiteness via damped secant choices or regularization.
This family of methods demonstrates strong empirical improvements in both nonconvex machine learning tasks and large-scale optimization, specifically in rapid convergence and robustness to noise.
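A skeleton of such a limited-memory SQN loop is sketched below. This is a generic hedged sketch, not the SpiderSQN algorithm itself; the plain gradient oracle is exactly where a SPIDER or variance-reduced estimator would be substituted. It shows the two ingredients named above: the L-BFGS two-loop recursion and curvature-pair rejection to preserve positive definiteness.

```python
import numpy as np
from collections import deque

def two_loop(grad, S, Y):
    """Standard L-BFGS two-loop recursion: apply the implicit inverse-Hessian
    approximation built from secant pairs (s, y) to a gradient vector."""
    q = grad.copy()
    alphas = []
    for s, y in reversed(list(zip(S, Y))):   # newest pair first
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        alphas.append((rho, a))
    if S:
        s, y = S[-1], Y[-1]
        q *= (s @ y) / (y @ y)               # initial scaling H0 = gamma * I
    for (rho, a), (s, y) in zip(reversed(alphas), zip(S, Y)):
        b = rho * (y @ q)
        q += (a - b) * s
    return q

def sqn(grad_fn, x0, n_iter=200, lr=1.0, m=10):
    """Minimal limited-memory quasi-Newton loop: curvature pairs come from
    consecutive iterates and are skipped when y @ s <= 0 (rejection damping)."""
    x = np.asarray(x0, dtype=float)
    S, Y = deque(maxlen=m), deque(maxlen=m)
    g = grad_fn(x)
    for _ in range(n_iter):
        d = two_loop(g, list(S), list(Y))
        x_new = x - lr * d
        g_new = grad_fn(x_new)
        s, y = x_new - x, g_new - g
        if y @ s > 1e-10:                    # keep the approximation SPD
            S.append(s); Y.append(y)
        x, g = x_new, g_new
    return x

# toy strongly convex quadratic with an exact gradient oracle;
# a stochastic/variance-reduced estimator would replace grad_fn
Hd = np.linspace(0.1, 1.9, 10)
x_min = sqn(lambda z: Hd * (z - 1.0), np.zeros(10))
```

The rejection test `y @ s > 0` is the simplest of the damping strategies mentioned above; Powell damping or explicit regularization are common alternatives in noisy settings.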
6. Theoretical Guarantees, Convergence and Conditioning
Under standard assumptions (Lipschitz continuity, strong convexity, and boundedness of update matrices), quasi-Newton acceleration results in:
- Local superlinear (often nearly quadratic) convergence to optima, provided that the Hessian approximation error diminishes appropriately (e.g., at least linearly in the error norm) (Mikkelsen et al., 2023, Jin et al., 2022).
- Global or accelerated rates in appropriately regularized convex settings or with sufficiently accurate Hessian approximations (Agafonov et al., 27 Aug 2025, Scieur, 2023, Ghanbari et al., 2016).
- Robustness to rounding and step-approximation errors: quadratically fast convergence is retained until stagnation at the level of the floating-point precision $u$, provided the relative error in the computed step is of order $u$ (Mikkelsen et al., 2023).
In partitioned multiphysics FSI, numerical studies establish that interface quasi-Newton acceleration reduces convergence iteration counts by orders of magnitude, stabilizes previously parameter-sensitive schemes, and achieves convergence rates almost flat in penalty parameter variation (Spenke et al., 2022).
7. Practical Applications and Guidelines
Quasi-Newton acceleration is applicable wherever sequential iterates admit secant information, and especially where explicit Hessian or Jacobian computation is infeasible or costly. Applications include:
- Large-scale statistical learning, sparse recovery, and variational problems via proximal splitting (Becker et al., 2018, Becker et al., 2012, Ghanbari et al., 2016).
- Partitioned domain decomposition and multi-physics simulation (CFD-FSI, fluid–porous, thermo-mechanics), through interface acceleration and black-box coupling (Spenke et al., 2022).
- Stochastic and online optimization for deep learning, where limited-memory, regularized, and momentum-enhanced SQN are particularly effective (Zhang et al., 2020, Indrapriyadarsini et al., 2019, Indrapriyadarsini et al., 2021).
- Inexact and noisy environments, where penalty-based updates or adaptive regularization provide robustness (Berglund et al., 4 Mar 2024).
Common implementation guidelines include:
- Limit the number of secant pairs to a manageable memory $m$ (typically 5–10) for stability and computational tractability.
- Regularize or project to ensure positive-definiteness, especially in stochastic, nonconvex, or multi-secant settings (Lee et al., 9 Apr 2025, Berglund et al., 4 Mar 2024).
- Employ adaptive parameterization to reduce parameter sensitivity (e.g., penalty parameters or step lengths) (Spenke et al., 2022).
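The ill-conditioning safeguard above (dropping near-dependent secant pairs) can be sketched with a plain QR factorization. Production implementations typically filter one column at a time and may use pivoting, so treat the tolerance and the one-shot filtering here as illustrative:

```python
import numpy as np

def filter_secant_pairs(V, W, eps=1e-8):
    """Drop secant pairs whose residual-difference columns are nearly linearly
    dependent: after a QR factorization of V, a tiny diagonal entry of R means
    that column adds almost no new information and would make the
    least-squares system ill-conditioned."""
    _, R = np.linalg.qr(V)
    diag = np.abs(np.diag(R))
    keep = diag > eps * diag.max()
    return V[:, keep], W[:, keep]

rng = np.random.default_rng(2)
n = 50
V = rng.standard_normal((n, 4))
# append a near-duplicate of the first column (a degenerate secant pair)
V = np.column_stack([V, V[:, 0] + 1e-12 * rng.standard_normal(n)])
W = rng.standard_normal((n, 5))
Vf, Wf = filter_secant_pairs(V, W)   # the redundant fifth pair is dropped
```

The same QR factorization is then reused to solve the small least-squares system, so the filtering is essentially free.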
Summary Table: Key Variants and Contexts
| Context | Secant Strategy | Robustification | Typical Gains |
|---|---|---|---|
| Partitioned FSI/PDE | Multi-secant IQN | IMVLS, Robin penalization | Flat param. sensitivity, ~4× fewer iterations |
| Composite minimization | Rank-1/r updates | Semi-smooth dual for prox | 5–10× fewer iters vs. ISTA/FISTA |
| Stochastic opt / NN train | Limited-memory LBFGS, MoQ | Dampening, regularization | Up to 5× speedup over L-BFGS/SGD |
| Action-constrained solvers | Subspace matching | Least change, preconditioning | 2–5× wall-time reduction |
This illustrates that quasi-Newton acceleration constitutes a highly general and adaptable family of schemes, capable of delivering substantial speedups, improved stability, and broad robustness across nonlinear and high-dimensional scientific and engineering computation.