Preconditioned Second-Order Convex Splitting
- The paper introduces preconditioned second-order convex splitting algorithms that combine IMEX updates with dynamic DC splitting to enhance convergence in nonconvex optimization.
- It employs advanced techniques such as matrix preconditioning, extrapolation, and Armijo line search to significantly reduce iteration counts and computational cost.
- Empirical results demonstrate improved efficiency in applications like sparse regression, variational image segmentation, and consensus optimization.
Preconditioned second-order convex splitting algorithms constitute a class of advanced methods for nonconvex and large-scale convex optimization, distinguished by their use of higher-order time discretization, difference-of-convex (DC) splitting with dynamically varying convex components, matrix-free preconditioning, and, in some variants, extrapolation and line search acceleration. These frameworks combine implicit–explicit (IMEX) second-order updates (notably BDF2 and Adams–Bashforth schemes) with classical preconditioners, delivering improved convergence, robustness, and practical efficiency, particularly on problems with difficult regularization or nonsmooth structures (Shen et al., 12 Nov 2024, Shen et al., 16 Dec 2025). They are rigorously analyzed and empirically shown to outperform first-order and classical DC methods on a variety of machine learning and PDE-constrained problems.
1. Problem Framework and Algorithmic Structure
The canonical problem is the composite minimization

$$\min_{x \in \mathcal{H}} \; E(x) := F(x) + G(x),$$

where $\mathcal{H}$ is a finite-dimensional Hilbert space, $F:\mathcal{H}\to(-\infty,+\infty]$ is closed, convex (possibly nonsmooth), and $G$ is differentiable (possibly nonconvex) with $L$-Lipschitz continuous gradient. Many applications, such as sparse regression and variational image segmentation, instantiate this structure, with DC splitting (either as a fixed or dynamically varying decomposition) exposing critical algorithmic leverage (Shen et al., 12 Nov 2024, Shen et al., 16 Dec 2025).
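For concreteness, a representative instance (chosen here for illustration; the notation is not taken from the cited papers) is SCAD-regularized least squares, where the nonconvex SCAD penalty admits a standard DC decomposition into an $\ell_1$ term minus a smooth convex function:

$$E(x) \;=\; \underbrace{\tfrac{1}{2}\|Ax - b\|_2^2}_{\text{smooth}} \;+\; \underbrace{\lambda \|x\|_1}_{\text{convex, nonsmooth}} \;-\; \underbrace{\textstyle\sum_i h_{\lambda,a}(x_i)}_{\text{convex, } C^1},$$

with $h_{\lambda,a}$ the convex function for which $\lambda|t| - h_{\lambda,a}(t)$ equals the SCAD penalty; varying-DC schemes are free to re-balance such a split across iterations.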
Algorithmic updates employ a second-order BDF2 time-discretization for the implicit (convex) components and a second-order Adams–Bashforth explicit treatment for the nonconvex or nonlinear gradient contributions. The typical iteration for the (varying-)DC splitting framework is:
- Auxiliary energy construction: form a second-order auxiliary energy of the type $E^n(x) = F(x) + \langle b_1 \nabla G(x^n) + b_2 \nabla G(x^{n-1}),\, x\rangle + \tfrac{1}{2\tau}\|x - (a_1 x^n + a_2 x^{n-1})\|^2$, where the coefficients $a_i, b_i$ are specified by the discretization choice (BDF2 weights for the implicit history terms, Adams–Bashforth weights for the explicit gradients).
- Convex (sub)problem solution: for a preconditioner $M \succeq 0$, compute, either exactly or approximately, $x^{n+1} \approx \arg\min_x \, E^n(x) + \tfrac{1}{2}\|x - \bar{x}^n\|_M^2$, where $\bar{x}^n$ optionally includes extrapolation (Shen et al., 16 Dec 2025).
- Descent enhancement (optional): Armijo line search or acceleration via extrapolation parameters.
This second-order IMEX convex splitting, combined with dynamic (varying) convexification, provides improved stability compared to first-order or fixed DC methods and allows for efficient large-scale iterations (Shen et al., 12 Nov 2024, Shen et al., 16 Dec 2025).
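A minimal sketch of one such update, assuming a quadratic implicit part $F(x) = \tfrac12 x^\top A x - b^\top x$ and a smooth part $G$ supplied through its gradient; the function and parameter names are ours, and the BDF2/AB2 weights and Jacobi-preconditioned inner sweeps are illustrative choices, not the exact scheme or solver of the cited papers:

```python
import numpy as np

def imex_bdf2_ab2_step(x_n, x_nm1, A, b, grad_G, tau, n_inner=5):
    """One second-order IMEX convex-splitting step (illustrative sketch).

    Implicit (BDF2) treatment of the quadratic convex part F(x) = 0.5 x'Ax - b'x,
    explicit (Adams-Bashforth) treatment of the smooth part G via its gradient.
    The linear system is solved approximately with a few Jacobi-preconditioned
    Richardson sweeps, mimicking a fixed number of matrix-free inner iterations.
    """
    # AB2 extrapolation of the explicit gradient: 2*grad_G(x^n) - grad_G(x^{n-1})
    g_expl = 2.0 * grad_G(x_n) - grad_G(x_nm1)

    # BDF2 discretization leads to (3/(2*tau)) x^{n+1} + A x^{n+1} = rhs,
    # where rhs collects the BDF2 history terms and the explicit gradient.
    alpha = 3.0 / (2.0 * tau)
    rhs = b - g_expl + (4.0 * x_n - x_nm1) / (2.0 * tau)

    # Jacobi preconditioner: diagonal of (alpha*I + A).
    d = alpha + np.diag(A)

    # A few preconditioned Richardson sweeps as an inexact inner solver.
    x = x_n.copy()
    for _ in range(n_inner):
        residual = rhs - (alpha * x + A @ x)
        x += residual / d
    return x
```

When $F$ is nonsmooth, the inner loop would instead evaluate a (preconditioned) proximal step, and in practice the inner solve runs to a prescribed tolerance rather than a fixed sweep count.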
2. Preconditioning Strategies
Matrix preconditioning is central to practical efficiency and scalability. In the setting where the implicit part is quadratic, $F(x) = \tfrac12\langle Ax, x\rangle - \langle b, x\rangle$ with $A$ symmetric positive semidefinite, the linear system in each subproblem is of the form $T x^{n+1} = r^n$ with $T = \alpha I + A$, where $\alpha > 0$ is derived from the discretization and the proximal weight and $r^n$ collects the history and explicit-gradient terms. Efficient stationary iterative preconditioners include:
- Jacobi: $M = D$, the diagonal part of $T$.
- Richardson: $M = \tfrac{1}{\omega} I$, with $0 < \omega < 2/\lambda_{\max}(T)$.
- Symmetric Gauss–Seidel (SGS): $M = (D + L)\, D^{-1} (D + L^\top)$ for the splitting $T = L + D + L^\top$, with $D$ diagonal and $L$ strictly lower triangular.
Preconditioning controls the condition number of the subproblem, reducing the number of required linear (or inner) iterations per outer update, and often enables matrix-free (sparse, iterative) solutions (Shen et al., 12 Nov 2024, Shen et al., 16 Dec 2025).
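As a concrete illustration, here is a dense-matrix sketch of applying the SGS preconditioner listed above (sparse triangular storage would be used in practice; the helper name sgs_preconditioner is ours):

```python
import numpy as np
from scipy.linalg import solve_triangular

def sgs_preconditioner(T):
    """Return r -> M^{-1} r for the symmetric Gauss-Seidel preconditioner
    M = (D + L) D^{-1} (D + L^T), where T = L + D + L^T."""
    D = np.diag(np.diag(T))      # diagonal part of T
    L = np.tril(T, k=-1)         # strictly lower triangular part of T
    DL, DU = D + L, D + L.T      # lower / upper triangular factors

    def apply(r):
        y = solve_triangular(DL, r, lower=True)       # (D + L) y = r
        z = np.diag(D) * y                            # z = D y
        return solve_triangular(DU, z, lower=False)   # (D + L^T) w = z

    return apply
```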
For large-scale operator-splitting contexts, randomized Nyström preconditioners have been shown effective, as in the GeNIOS framework (Diamandis et al., 2023), which further justifies and extends preconditioning principles to second-order convex splitting in high dimensions.
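In the same spirit, a sketch of a generic randomized Nyström preconditioner for regularized systems $(A + \mu I)x = b$ follows the standard construction rather than GeNIOS's actual code; the sketch size k and the stabilizing shift are illustrative choices:

```python
import numpy as np
from scipy.linalg import solve_triangular

def nystrom_preconditioner(A_matvec, n, mu, k=50, seed=0):
    """Build r -> P^{-1} r from a rank-k randomized Nystrom approximation
    A_hat = U diag(lam) U^T of a PSD operator A, used to precondition
    (A + mu*I). Only k matrix-vector products with A are required."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, k))
    Y = np.column_stack([A_matvec(Omega[:, j]) for j in range(k)])  # Y = A @ Omega
    nu = np.finfo(Y.dtype).eps * np.linalg.norm(Y)                  # small stabilizing shift
    Y_nu = Y + nu * Omega
    C = np.linalg.cholesky(Omega.T @ Y_nu)                          # lower triangular factor
    B = solve_triangular(C, Y_nu.T, lower=True).T                   # B = Y_nu C^{-T}
    U, S, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(S**2 - nu, 0.0)                                # approximate eigenvalues of A

    def apply(r):
        Ur = U.T @ r
        # P^{-1} = (lam_k + mu) U (diag(lam) + mu I)^{-1} U^T + (I - U U^T)
        return (lam[-1] + mu) * (U @ (Ur / (lam + mu))) + (r - U @ Ur)

    return apply
```

Such an operator can be passed as the preconditioner in a matrix-free CG inner solve (see the sketch in Section 5).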
3. Acceleration Techniques: Extrapolation and Line Search
Modern variants incorporate momentum-type extrapolation and adaptive line search to enhance convergence rates. Extrapolation, similar to FISTA updates, takes the form

$$\bar{x}^n = x^n + \beta_n (x^n - x^{n-1}),$$

with the extrapolation weight $\beta_n \in [0, 1)$ selected statically or adaptively. Some frameworks generalize this to gradient extrapolation with an additional scaling parameter (Shen et al., 16 Dec 2025).

Armijo-type backtracking is applied to descent directions $d^n$ (typically $d^n = x^{n+1} - x^n$) to ensure sufficient decrease of the (auxiliary) energy: the step size $\eta$ is shrunk until an Armijo condition such as $E(x^{n+1} + \eta d^n) \le E(x^{n+1}) - c\,\eta\,\|d^n\|^2$ holds. With such safeguarded steps, accelerated convergence (often linear in local regimes) is observed (Shen et al., 12 Nov 2024).
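A compact sketch of both devices (the parameter names and the exact sufficient-decrease test are illustrative; the cited papers use their own safeguards):

```python
import numpy as np

def extrapolate(x_n, x_nm1, beta):
    """FISTA-style extrapolation point: x^n + beta*(x^n - x^{n-1})."""
    return x_n + beta * (x_n - x_nm1)

def armijo_backtracking(E, x, d, eta0=1.0, c=1e-4, shrink=0.5, max_trials=20):
    """Backtracking line search along a descent direction d.

    Shrinks the step until the sufficient-decrease condition
    E(x + eta*d) <= E(x) - c * eta * ||d||^2 holds (a BDCA-style
    Armijo test; the exact condition in the cited papers may differ).
    """
    E_x = E(x)
    dd = float(d @ d)
    eta = eta0
    for _ in range(max_trials):
        if E(x + eta * d) <= E_x - c * eta * dd:
            return eta
        eta *= shrink
    return 0.0  # fall back to no extra step if no decrease was found
```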
4. Convergence and Rates: Theoretical Guarantees
Global convergence analysis leverages the Kurdyka-Łojasiewicz (KL) property, which holds broadly for semi-algebraic and real-analytic energies (Shen et al., 12 Nov 2024, Shen et al., 16 Dec 2025). Key theoretical assertions include:
- Bounded energy descent and square-summability of the successive differences, $\sum_n \|x^{n+1} - x^n\|^2 < \infty$
- All limit points of the iterate sequence are critical points of $E$
- Full sequence convergence to a critical point under the KL property
- Local rates depend on the KL exponent $\theta \in [0, 1)$ of $E$ at the limit point:
  - $\theta = 0$: finite termination
  - $\theta \in (0, 1/2]$: local Q-linear convergence
  - $\theta \in (1/2, 1)$: sublinear convergence rate
Selection of the time step $\tau$ (e.g., bounded in terms of the Lipschitz constant $L$) and bounded step sizes for the line search are necessary for these guarantees (Shen et al., 12 Nov 2024, Shen et al., 16 Dec 2025).
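In generic form (constants and exact conditions differ across the cited papers), the argument combines a sufficient-decrease estimate with a relative-error bound on the subdifferential,

$$E(x^{n+1}) + c_1\|x^{n+1} - x^n\|^2 \le E(x^n), \qquad \mathrm{dist}\big(0, \partial E(x^{n+1})\big) \le c_2\big(\|x^{n+1} - x^n\| + \|x^n - x^{n-1}\|\big),$$

and the KL inequality $\varphi'\big(E(x^n) - E(x^\ast)\big)\,\mathrm{dist}\big(0, \partial E(x^n)\big) \ge 1$ for a desingularizing function $\varphi(s) = c\, s^{1-\theta}$; together these yield finite length, $\sum_n \|x^{n+1} - x^n\| < \infty$, and the exponent-dependent rates above.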
5. Computational Complexity and Implementation
Per-iteration cost is moderate and explicitly controlled:
- Each outer iteration evaluates the gradients $\nabla G(x^n)$, $\nabla G(x^{n-1})$ (the latter reusable from the previous step), then forms and solves a (preconditioned) linear system
- For sparse problems, the cost is $O(n)$ to $O(\mathrm{nnz}(A))$ per preconditioned step
- Backtracking requires a small (problem-dependent) number of extra function evaluations
- In operator-splitting contexts with randomized preconditioners, preconditioner formation costs a modest number of matrix–vector products with $A$ plus $O(nk^2)$ work for sketch size $k$; the per-iteration cost is dominated by CG iterations, each requiring one matrix–vector product and an $O(nk)$ preconditioner application (Diamandis et al., 2023)
Tabulated summary:
| Step | Cost per iteration | Typical approach |
|---|---|---|
| Gradient evaluations | One $\nabla G$ evaluation (problem-dependent; e.g., $O(\mathrm{nnz}(A))$ for quadratics) | Analytic or auto-diff |
| Preconditioner application | $O(n)$ to $O(\mathrm{nnz}(A))$ | Jacobi, SGS, Nyström (randomized) |
| Linear/proximal solve | A fixed, small number of inner iterations | Iterative, matrix-free |
| Line search evaluations | A few extra function evaluations | Armijo rule |
Preconditioning allows for a fixed, small number of inner iterations, rendering each outer step computationally equivalent to a single (but well-conditioned) linear system solution (Shen et al., 12 Nov 2024, Diamandis et al., 2023).
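For the inner solve itself, a matrix-free preconditioned CG can be assembled directly from the operators above (a SciPy-based sketch; T_matvec and M_inv are placeholders for the system and preconditioner applications):

```python
from scipy.sparse.linalg import LinearOperator, cg

def inner_solve(T_matvec, M_inv, rhs, x0=None, maxiter=50):
    """Approximately solve T x = rhs with preconditioned CG, touching
    T and M^{-1} only through matrix-vector products (matrix-free)."""
    n = rhs.shape[0]
    T = LinearOperator((n, n), matvec=T_matvec)   # x -> (alpha*I + A) x
    M = LinearOperator((n, n), matvec=M_inv)      # r -> M^{-1} r (Jacobi/SGS/Nystrom)
    x, info = cg(T, rhs, x0=x0, maxiter=maxiter, M=M)
    return x  # info == 0 signals convergence within maxiter
```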
6. Empirical Performance and Applications
Extensive numerical studies confirm the efficiency and solution quality of preconditioned second-order convex splitting algorithms:
- Sparse regression with nonconvex regularizers (e.g., SCAD): Second-order preconditioned algorithms with line search or extrapolation attained 2–5× faster convergence and 30–60% fewer iterations than DCA, BDCA, and first-order DC methods (Shen et al., 12 Nov 2024, Shen et al., 16 Dec 2025).
- Graph-based semi-supervised segmentation: On large sparse graphs, preconditioned and line-search variants reached the target DICE score in roughly half the CPU time, at similar iteration counts, compared to non-preconditioned or first-order variants (Shen et al., 12 Nov 2024).
- Large-scale consensus optimization: Randomized preconditioners delivered up to 50× speedups for dense convex problems, demonstrating scalability (Diamandis et al., 2023).
A plausible implication is that as problem sizes and ill-conditioning increase, the gain from preconditioning and second-order IMEX schemes becomes essential for practical tractability.
7. Relation to Other Second-Order and Operator Splitting Methods
This class of algorithms generalizes and connects to multiple established frameworks:
- Regularized semi-smooth Newton methods with projection steps: These likewise seek second-order acceleration for composite convex programs, but the preconditioned convex splitting approach targets broader nonconvexity via varying-DC decompositions and higher-order implicit–explicit discretization (Xiao et al., 2016).
- Interior-proximal primal-dual algorithms: These exploit barrier-based preconditioning on the dual variable, achieving linear convergence under strong monotonicity for problems involving the second-order cone (Valkonen, 2017).
- Operator splitting and inexact ADMM: Second-order subproblem approximations with randomized preconditioning (e.g., GeNIOS) reflect a similar philosophy in large-scale convex consensus optimization (Diamandis et al., 2023).
This methodological convergence suggests that preconditioned second-order convex splitting sits at the intersection of convex splitting, DC programming, primal-dual methods, and randomized preconditioning, providing a unified toolkit for modern large-scale and nonconvex variational problems.
References:
- (Shen et al., 12 Nov 2024): "A preconditioned second-order convex splitting algorithm with a difference of varying convex functions and line search"
- (Shen et al., 16 Dec 2025): "A preconditioned second-order convex splitting algorithm with extrapolation"
- (Diamandis et al., 2023): "GeNIOS: an (almost) second-order operator-splitting solver for large-scale convex optimization"
- (Xiao et al., 2016): "A Regularized Semi-Smooth Newton Method With Projection Steps for Composite Convex Programs"
- (Valkonen, 2017): "Interior-proximal primal-dual methods"