Splitting Proximal Point Algorithms
- Splitting proximal point algorithms (SPPAs) are iterative methods that decompose composite optimization problems by applying prox-resolvent updates to individual components.
- They encompass classical methods like Douglas–Rachford and Forward–Backward Splitting, and extend to stochastic and nonconvex scenarios with structure-preserving variable reduction.
- SPPAs guarantee weak convergence under monotonicity and convexity, while augmented strategies using Tikhonov regularization or Halpern iteration drive strong convergence for large-scale and distributed applications.
Splitting proximal point algorithms (SPPAs) are a class of iterative methods designed to solve composite operator-inclusion or minimization problems by decomposing the objective into constituent parts and applying prox-resolvent or proximal-like updates to each part separately. These algorithms have become fundamental in convex and nonconvex optimization, monotone inclusions, equilibrium analysis, and beyond. Modern SPPAs generalize Rockafellar's proximal point algorithm through operator splitting, including classical Douglas–Rachford splitting, forward–backward splitting, Davis–Yin (three-operator) splitting, and their stochastic and nonconvex extensions. The resulting methods can be viewed as degenerate or preconditioned PPAs in suitable product spaces, sometimes admitting structure-preserving variable reductions and possessing a diverse array of convergence properties.
1. Historical Foundations of Splitting and Proximal Point Algorithms
The classical proximal point algorithm (PPA) of Rockafellar iteratively applies the resolvent of a monotone operator, $J_{\lambda A} = (\mathrm{Id} + \lambda A)^{-1}$, to identify zeros of maximally monotone operators. The ability to interpret the PPA as a fixed-point iteration based on resolvents or proximity operators gave rise to a geometric understanding of nonexpansive mappings and their fixed points. When the underlying problem involves a sum of operators or functions, direct application of the resolvent may be intractable.
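As a concrete illustration (not taken from the cited works), the following minimal sketch runs the classical PPA on $A = \partial f$ for $f(x) = \|x\|_1$, whose resolvent is the soft-thresholding operator; the step size $\lambda$ and test point are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, lam):
    # prox of lam*||.||_1, i.e., the resolvent J_{lam A} for A = subdifferential of the l1 norm
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def proximal_point(x0, lam=0.5, iters=50):
    # Classical PPA: x_{k+1} = J_{lam A}(x_k) = prox_{lam f}(x_k)
    x = x0
    for _ in range(iters):
        x = soft_threshold(x, lam)
    return x

print(proximal_point(np.array([3.0, -1.2, 0.4])))  # converges to the minimizer 0
```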
Splitting algorithms address this by handling the sum $A + B$ (or more general compositions) through alternating or sequential application of the resolvents $J_{\lambda A}$ and $J_{\lambda B}$, possibly in appropriate metrics or product spaces. The Douglas–Rachford splitting (DRS) is a canonical example: for the inclusion $0 \in A(x) + B(x)$, the DR operator is $T_{\mathrm{DR}} = \mathrm{Id} - J_{\lambda A} + J_{\lambda B}\circ(2J_{\lambda A} - \mathrm{Id})$, with primal update $x^k = J_{\lambda A}(z^k)$. In high dimensions or non-Hilbert geometries, strong convergence is not guaranteed; numerous variants address this by augmenting the metric or regularization (e.g., Tikhonov regularization (Bot et al., 2016)), or considering non-Hilbertian geodesic spaces (e.g., CAT(0) spaces (Espínola et al., 2016)).
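A minimal DRS sketch under illustrative assumptions: a random least-squares term $f(x) = \tfrac12\|Ax-b\|^2$ plus an $\ell_1$ term $g(x) = \tau\|x\|_1$ (the data $A$, $b$ and parameters $\lambda$, $\tau$ are placeholders, not from the cited works). The $z$-update follows the $T_{\mathrm{DR}}$ formula above.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = rng.standard_normal(20)
lam, tau = 1.0, 0.1  # prox step size and l1 weight (illustrative)

def prox_f(x, lam):
    # prox of f(x) = 0.5*||Ax - b||^2: solve (I + lam*A^T A) y = x + lam*A^T b
    n = A.shape[1]
    return np.linalg.solve(np.eye(n) + lam * A.T @ A, x + lam * A.T @ b)

def prox_g(x, lam):
    # prox of g(x) = tau*||x||_1: soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - lam * tau, 0.0)

z = np.zeros(A.shape[1])
for _ in range(200):
    x = prox_f(z, lam)          # primal update x^k = J_{lam A}(z^k)
    y = prox_g(2 * x - z, lam)  # resolvent of the second term at the reflected point
    z = z + y - x               # z^{k+1} = z^k + J_{lam B}(2x^k - z^k) - x^k
print(np.linalg.norm(A @ x - b))
```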
2. Algorithmic Structure and Key Variants
Several core algorithmic frameworks fall under the splitting proximal point paradigm:
- Douglas–Rachford Splitting (DRS): A fixed-point iteration acting in either a primal or lifted dual space, enabling the splitting of the inclusion $0 \in A(x) + B(x)$ (Xue, 2023). This can be interpreted as a degenerate PPA in a product space with a singular (semidefinite) preconditioner (Bredies et al., 2021).
- Forward–Backward Splitting (FBS): Suitable for inclusions $0 \in \nabla f(x) + B(x)$ with $\nabla f$ Lipschitz continuous and $B$ maximally monotone. The iteration is $x^{k+1} = J_{\lambda B}\big(x^k - \lambda \nabla f(x^k)\big)$, a discretized gradient-proximal step (Becker et al., 2012); see the sketch following this list.
- Primal–Dual Schemes (Chambolle–Pock, Condat–Vũ, Davis–Yin): Operate on product spaces to decouple linear operators or constraints (e.g., with nonsmooth terms or linear mappings), and allow for additional splitting of multi-term objectives (Condat et al., 2019, Condat et al., 2020).
- Degenerate Preconditioned PPAs: Generalize PPA to variable-metric or degenerate metrics, reducing the problem’s dimension via kernel factorization and yielding new, sometimes more efficient, sequential/parallel splitting schemes (Bredies et al., 2021).
- Halpern and Tikhonov Enhanced Splittings: Introduce strong convergence toward minimal-norm solutions by embedding Halpern iteration or Tikhonov regularization into the update sequence, overcoming the weak-convergence limitations of the classical schemes (Zhang et al., 2024, Bot et al., 2016).
- Stochastic and Randomized Splittings: Incremental and supermartingale-based variants randomly select component updates per iteration, enabling almost sure convergence for sum-of-functions objectives (not necessarily convex) (Brito et al., 11 Jan 2026).
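Illustrating the FBS bullet above, here is a minimal proximal-gradient sketch for a LASSO-type problem, $f(x) = \tfrac12\|Ax-b\|^2$ smooth and $g(x) = \tau\|x\|_1$ nonsmooth; the data and parameters are illustrative assumptions, and the step size is $1/L$ with $L$ the Lipschitz constant of $\nabla f$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 100))
b = rng.standard_normal(30)
tau = 0.1                           # l1 weight (illustrative)
step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = ||A||^2 Lipschitz constant of grad f

def grad_f(x):
    # gradient of the smooth part f(x) = 0.5*||Ax - b||^2
    return A.T @ (A @ x - b)

def prox_g(x, s):
    # backward step: prox of s*tau*||.||_1 (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - s * tau, 0.0)

x = np.zeros(A.shape[1])
for _ in range(500):
    # x^{k+1} = J_{lam B}(x^k - lam * grad f(x^k))
    x = prox_g(x - step * grad_f(x), step)
```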
A key theme is that operator splitting enables separate handling of the constituent terms—including nonsmooth, nonconvex, or structured constraints—while leveraging the robust convergence of the PPA framework.
3. Theoretical Properties and Convergence
Classical SPPAs guarantee weak convergence under monotonicity and convexity. When the governing operator fails to be cyclically monotone, as is generically the case for DRS in dimensions $n \ge 2$, one loses the variational (proximal mapping) interpretation, and thus also access to acceleration via prox structure (Xue, 2023). The degenerate metric framework clarifies this limitation: the DR operator is the resolvent of a maximally monotone but generally non-cyclically-monotone operator and cannot be written as a proximity operator except in the trivial one-dimensional case.
Main convergence results depend on the properties of the constituent operators:
- Monotone Setting: If all operators are maximal monotone, the fixed-point sequence converges weakly to a solution (Bredies et al., 2021).
- Non-monotone/Semimonotone Setting: Oblique weak Minty or semimonotonicity conditions yield global Fejér monotonicity and ensure global convergence of relaxed/modified SPPAs, with best-iterate decay and local linear convergence for polyhedral cases (Evens et al., 2023).
- Nonconvex/Prox-Convex Extensions: For prox-convex (not necessarily convex) functions, deterministic and randomized SPPAs yield global convergence under suitable diminishing step size rules; almost sure convergence in the stochastic variant is established via supermartingale techniques (Brito et al., 11 Jan 2026).
- Degenerate Preconditioning: When the preconditioner is positive semidefinite with nontrivial kernel, the iteration can be reduced to an effective lower-dimensional subspace, and strong convergence is restored by supplementing the iteration with Tikhonov regularization or Halpern-type anchor terms (Zhang et al., 2024, Bot et al., 2016).
- Rates: Classical SPPAs with exact prox structure can be accelerated (e.g., FISTA in FBS) to $O(1/k^2)$, but in degenerate metric cases (as in DR), only nonergodic rates are available unless additional strong convexity or geometric regularity is present (Xue, 2023, Condat et al., 2020).
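To make the $O(1/k^2)$ acceleration concrete, here is a minimal FISTA sketch on the same illustrative LASSO setup as above (data and parameters are placeholder assumptions); the momentum sequence follows the standard $t_k$ recursion.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 100))
b = rng.standard_normal(30)
tau = 0.1
step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L

def prox_g(x, s):
    # prox of s*tau*||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - s * tau, 0.0)

x = x_prev = np.zeros(A.shape[1])
t = 1.0
for _ in range(500):
    # Nesterov extrapolation, then a forward-backward step from the extrapolated point
    t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    y = x + ((t - 1.0) / t_next) * (x - x_prev)
    x_prev, x = x, prox_g(y - step * (A.T @ (A @ y - b)), step)
    t = t_next
```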
4. Practical Implementations, Extensions, and Applications
SPPAs have been adapted to handle numerous large-scale and structured problems:
- Distributed and Parallel Splitting: Modern distributed algorithms split the variables or features across multiple nodes, enabling parallel evaluation of partial proximal operators—critical in ultra-high-dimensional or federated settings (Wu et al., 3 Apr 2025, Condat et al., 2020).
- Product Space and Block-Structured Problems: SPPAs in product spaces allow the decomposition and parallelization of multi-term objectives, e.g., sum of smooth and nonsmooth functions, or composite regularizers as in OSCAR (Condat et al., 2019, Zeng et al., 2013).
- Nonconvex and Fractional Programming: By decoupling subgradients, gradients, and conjugate proximals, full splitting PPAs handle structured nonconvex fractional programs, securing subsequential convergence for Kurdyka–Łojasiewicz functions using specialized merit functions and adaptive step size control (Boţ et al., 2023).
- Geometric and Nonlinear Spaces: SPPA variants have been established in non-Euclidean spaces with curvature bounds (CAT(0) spaces), providing convergence (in Lim's $\Delta$-sense) to minimizers under controlled step-size and geodesic conditions (Espínola et al., 2016).
- Stochastic or Inexact Proximals: Recent advances provide Monte Carlo/Hamilton–Jacobi-based approximations of proximal steps (HJ-Prox), enabling use of splitting even when the prox operator is not available in closed form, while still retaining overall method convergence (Di et al., 9 Sep 2025); see the sketch following this list.
- Point Source Localization and Measures: Extensions to Banach spaces and spaces of measures employ wave-particle metrics to define and compute proximal updates, for use in high-resolution inverse problems (Valkonen, 2022).
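As a sketch in the spirit of HJ-Prox, the following self-normalized Monte Carlo estimator uses the Hopf–Lax/Cole–Hopf representation of the Moreau envelope: sample $y \sim \mathcal{N}(x, t\delta I)$ and weight by $e^{-f(y)/\delta}$. The estimator, parameter names, and defaults are illustrative assumptions, not the exact implementation of (Di et al., 9 Sep 2025).

```python
import numpy as np

def hj_prox(f, x, t=0.5, delta=0.05, n_samples=10000, rng=None):
    """Monte Carlo approximation of prox_{t f}(x): sample y ~ N(x, t*delta*I),
    then form the softmax-weighted average with weights exp(-f(y)/delta)."""
    if rng is None:
        rng = np.random.default_rng(0)
    y = x + np.sqrt(t * delta) * rng.standard_normal((n_samples, x.size))
    logw = -np.apply_along_axis(f, 1, y) / delta
    w = np.exp(logw - logw.max())      # shift for numerical stability
    return (w[:, None] * y).sum(axis=0) / w.sum()

# Sanity check against the closed-form prox of f(x) = ||x||_1 (soft-thresholding):
x = np.array([1.0, -0.3])
print(hj_prox(lambda v: np.abs(v).sum(), x, t=0.5))
# soft-thresholding with threshold t = 0.5 gives approximately [0.5, 0.0]
```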
5. Degenerate Metric and Variable Reduction: Structural Insights
The degenerate preconditioned PPA framework (Bredies et al., 2021) reveals that many splitting algorithms are iterations of a resolvent with a positive semidefinite (but not strictly positive-definite) metric. This degeneracy induces a natural dimension reduction: updates depend only on a "core" variable, with redundant auxiliary directions eliminated via projection. For example, in DR, the lifted iterates in the product space descend to a one-variable iteration on the underlying space $\mathcal{H}$ via the structure induced by the degenerate preconditioner. Variable-reduced splittings, such as the sequential Forward–Douglas–Rachford (SeqFDR), exploit this fact for efficient blockwise updates with weaker requirements on operator invertibility (Bredies et al., 2021).
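In schematic notation (following the spirit of Bredies et al., 2021, not their exact formulation), the degenerate preconditioned PPA for solving $0 \in \mathcal{A}(z)$ with a positive semidefinite preconditioner $\mathcal{M}$ reads

$$0 \in \mathcal{A}(z^{k+1}) + \mathcal{M}(z^{k+1} - z^{k}), \qquad \text{equivalently} \qquad z^{k+1} \in (\mathcal{M} + \mathcal{A})^{-1}\mathcal{M}\, z^{k}.$$

When $\mathcal{M}$ factors as $\mathcal{M} = \mathcal{C}\mathcal{C}^{*}$, the right-hand side $\mathcal{M}z^{k} = \mathcal{C}(\mathcal{C}^{*}z^{k})$ depends on $z^{k}$ only through the reduced variable $w^{k} = \mathcal{C}^{*}z^{k}$, which is the variable reduction described above.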
This perspective unifies the analysis of classical and new splitting schemes (e.g., Chambolle–Pock, Forward–Douglas–Rachford, Peaceman–Rachford), enables flexible sequential or parallel block-splitting, and connects the emergence of non-proximality (failure of cyclic monotonicity) to the noninvertibility of the degenerate metric (Xue, 2023).
6. Limitations and Pathways to Strong Convergence
Classical SPPAs guarantee only weak convergence unless additional coercivity or regularity is assumed. Methods to upgrade weak to strong convergence include:
- Tikhonov Regularization: Introducing vanishingly small regularization steps forces convergence to minimal-norm solutions, even in infinite dimensions (Bot et al., 2016).
- Halpern-Type Anchors: Embedding a diminishing weighted average with an anchor point ensures strong convergence to a particular projection of the solution set (e.g., closest to the anchor in a metric sense) (Zhang et al., 2024).
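A minimal sketch of the Halpern-type anchoring just described, for a generic nonexpansive map $T$; the anchor weights $\beta_k = 1/(k+2)$ and the projection example are illustrative assumptions, not from the cited works.

```python
import numpy as np

def halpern(T, z0, iters=200):
    """Halpern-anchored fixed-point iteration for a nonexpansive map T:
    z_{k+1} = beta_k * z0 + (1 - beta_k) * T(z_k), with beta_k = 1/(k+2).
    Converges strongly to the fixed point of T closest to the anchor z0."""
    z = z0.copy()
    for k in range(iters):
        beta = 1.0 / (k + 2)
        z = beta * z0 + (1.0 - beta) * T(z)
    return z

# Example: T projects onto the x-axis, so Fix(T) is a whole line; the Halpern
# iterates select the fixed point nearest the anchor z0 = (2, 3), namely (2, 0).
P = np.array([[1.0, 0.0], [0.0, 0.0]])
print(halpern(lambda z: P @ z, np.array([2.0, 3.0])))
```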
The fundamental obstacle to acceleration in non-proximal/degenerate splitting is the absence of a global variational characterization (the fixed-point operator is not the proximity operator of any function $f$), which rules out $O(1/k^2)$ acceleration rates unless reconcilable via additional prox structure or strong convexity (Xue, 2023).
7. Illustrative Applications and Numerical Performance
Applications of SPPAs are found in compressed sensing, image restoration, regression with complex regularizers (e.g., OSCAR (Zeng et al., 2013)), Dantzig selectors (Wu et al., 3 Apr 2025), fractional programming (Boţ et al., 2023), equilibrium and game-theoretic models with nonconvex cost functions (Quoc et al., 2011), and geometric computation in non-Euclidean spaces (Espínola et al., 2016).
Key empirical findings include:
- Feature partitioning in high-dimensional Dantzig selectors enables partition-insensitive, highly parallel, linearly convergent distributed solvers (Wu et al., 3 Apr 2025).
- Weighted sorted norms (OSCAR) enable exact and fast approximate splitting for group-sparse signal recovery, with APO offering near-identical accuracy at a fraction of the GPO cost (Zeng et al., 2013).
- Stable linearly constrained problems benefit from proximal-projection methods, which ensure feasibility at every iterate and demonstrate competitive or superior speed and robustness compared to classical penalty or dual methods (Heaton, 2024).
In summary, splitting proximal point algorithms form a highly general, robust, and flexible methodology underpinning modern convex (and certain nonconvex) optimization, monotone inclusions, and structured equilibrium computation. Via the structural lens of metric degeneracy and variable reduction, these methods unify a large array of classical and contemporary techniques, clarify the intrinsic limitations to variational acceleration, and enable broad extension to stochastic, distributed, and geometrically nonlinear settings (Xue, 2023, Bredies et al., 2021, Condat et al., 2019, Zhang et al., 2024).