PDHG: Primal-Dual Hybrid Gradient

Updated 14 December 2025
  • PDHG is a first-order splitting method for convex-concave saddle point problems that alternates proximal updates in the primal and dual variables to ensure convergence under minimal assumptions.
  • It effectively combines explicit gradient steps with block-coordinate and adaptive proximal updates, enabling parallelization and efficient solutions in applications like imaging, LP, and PDE-constrained optimization.
  • Advanced variants such as stochastic, accelerated, and preconditioned PDHG extend its applicability to large-scale and hardware-accelerated problems, yielding improved convergence rates and practical performance.

The Primal-Dual Hybrid Gradient (PDHG) algorithm is a first-order splitting method for convex-concave saddle point and constrained convex optimization problems, particularly those with a bilinear coupling between the primal and dual variables. Originating with Chambolle and Pock (2011), PDHG has become a core building block in large-scale linear programming (LP), imaging, PDE-constrained optimization, inverse problems, and distributed control. PDHG alternates between (block-)proximal steps in the primal and dual variables, typically coupled by an explicit gradient/extrapolation on the linear term, and is notable for broad applicability, simple parallelization, and robust convergence properties under minimal assumptions.

1. Mathematical Formulation and Algorithm Structure

Let $\mathbb{X},\mathbb{Y}$ be real Hilbert spaces, $A:\mathbb{X}\to\mathbb{Y}$ linear, and $f:\mathbb{X}\to(-\infty,+\infty]$, $g:\mathbb{Y}\to(-\infty,+\infty]$ proper, closed, convex. The canonical problem class is the convex-concave saddle point

$$\min_{x\in\mathbb{X}} \ \max_{y\in\mathbb{Y}} \ \mathcal{L}(x,y) = f(x) + \langle Ax, y\rangle - g^*(y)$$

which covers many constrained convex programs and their Fenchel duals. The classical PDHG iteration, for fixed primal and dual stepsizes $\tau,\sigma>0$ and an over-relaxation parameter $\theta\in[0,1]$, is

$$\begin{aligned} y^{k+1} &= \operatorname{prox}_{\sigma g^*}\bigl(y^k + \sigma A \bar{x}^k\bigr) \\ x^{k+1} &= \operatorname{prox}_{\tau f}\bigl(x^k - \tau A^\top y^{k+1}\bigr) \\ \bar{x}^{k+1} &= x^{k+1} + \theta(x^{k+1} - x^k), \end{aligned}$$

where the proximal operators act as regularized minimizations over $f$ and $g^*$. Under standard assumptions (e.g. $\tau\sigma \|A\|^2 < 1$), the sequence $(x^k, y^k)$ converges to a saddle point, and the ergodic (averaged) primal-dual gap decays as $O(1/k)$, with linear convergence in strongly-convex/strongly-concave cases (Malitsky, 2017, Lu et al., 2023, Ma et al., 2023).
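To make the iteration concrete, here is a minimal NumPy sketch of the updates above, applied to an illustrative problem $\min_x \|Ax-b\|_1 + \tfrac{\lambda}{2}\|x\|_2^2$; the test problem, variable names, and step-size choice are illustrative, not taken from the cited papers.

```python
import numpy as np

def pdhg(prox_tau_f, prox_sigma_gstar, A, x0, y0, tau, sigma, theta=1.0, iters=2000):
    """Classical PDHG: dual prox step, primal prox step, primal extrapolation."""
    x, y, x_bar = x0.copy(), y0.copy(), x0.copy()
    for _ in range(iters):
        y = prox_sigma_gstar(y + sigma * (A @ x_bar))      # dual update
        x_new = prox_tau_f(x - tau * (A.T @ y))            # primal update
        x_bar = x_new + theta * (x_new - x)                # extrapolation
        x = x_new
    return x, y

# Illustrative instance: f(x) = (lam/2)||x||^2,  g(z) = ||z - b||_1,
# so g*(y) = <b, y> + indicator{||y||_inf <= 1} and both prox maps are closed-form.
rng = np.random.default_rng(0)
m, n, lam = 60, 30, 0.1
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
tau = sigma = 0.99 / np.linalg.norm(A, 2)                  # ensures tau*sigma*||A||^2 < 1

prox_tau_f = lambda v: v / (1.0 + tau * lam)               # prox of (lam/2)||.||^2
prox_sigma_gstar = lambda v: np.clip(v - sigma * b, -1.0, 1.0)

x, y = pdhg(prox_tau_f, prox_sigma_gstar, A, np.zeros(n), np.zeros(m), tau, sigma)
print("objective:", np.abs(A @ x - b).sum() + 0.5 * lam * x @ x)
```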

2. Connections, Variants, and Extensions

Reductions and Relationships

  • For linearly constrained convex minimization ($\min_x g(x)$ subject to $Ax = b$), PDHG admits an exact reduction to a primal-only scheme, generating the same sequence as a two-variable recursion on $(x^k, s^k)$, with theoretical guarantees even when duality fails or the problem is infeasible. The reduction hinges on the dual sequence being an explicit function of averaged primal iterates (Malitsky, 2017); a worked form of the constrained case appears after this list.
  • PDHG generalizes and subsumes linearized/indefinite-proximal ADMM, the Douglas–Rachford splitting, and the augmented Lagrangian method with appropriate choices of proximal metrics and step balancing (Yu et al., 8 Jun 2025, Ma et al., 2023).
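To connect the constrained form to the canonical saddle point of Section 1 (writing the objective as $f$ to match that notation), take $g$ to be the indicator of $\{b\}$, so that $g^*(y) = \langle b, y\rangle$ and the dual prox step becomes an explicit multiplier update:

$$\min_{x} f(x) \ \text{s.t.}\ Ax=b \quad\Longleftrightarrow\quad \min_{x}\max_{y}\ f(x) + \langle Ax, y\rangle - \langle b, y\rangle, \qquad y^{k+1} = y^k + \sigma\bigl(A\bar{x}^k - b\bigr).$$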

Accelerations

  • Strongly convex problems admit $O(1/k^2)$ or even linear rates. Variants leveraging Nesterov-style momentum, adaptive extrapolation, or Bregman-proximal steps achieve accelerated convergence. Notably, the iteration-varying scheme with schedule $\tau_{k+1}\sigma_k = s^2$, $\theta_k = \tau_{k+1}/\tau_k \in (0,1)$ realizes exact $O(1/k^2)$ Lyapunov-based rates (Zeng et al., 26 Jul 2024); a simple varying-step schedule is sketched after this list. Nonlinear accelerated variants also exploit problem geometry with Bregman divergences (Darbon et al., 2021).
  • Parameter-free implementations combine inexact line search over stepsizes and over-relaxation, yielding robust convergence and linear decay in residuals without prior knowledge of operator norms or strong-convexity constants (McManus et al., 21 Mar 2025).
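As one concrete example of an iteration-varying schedule, here is a sketch of the classical strong-convexity acceleration rule (as in Chambolle and Pock's accelerated variant, not the specific schedule of Zeng et al.), assuming $f$ is $\gamma$-strongly convex; the initial step sizes are assumptions left to the caller.

```python
def accelerated_schedule(gamma, tau0, sigma0, iters):
    """Yield (tau_k, sigma_k, theta_k) with theta_k = 1/sqrt(1 + 2*gamma*tau_k),
    tau_{k+1} = theta_k * tau_k, sigma_{k+1} = sigma_k / theta_k.
    The product tau_k * sigma_k stays constant (so the step-size condition is
    preserved) while tau_k -> 0, which is what drives the O(1/k^2) ergodic rate."""
    tau, sigma = tau0, sigma0
    for _ in range(iters):
        theta = 1.0 / (1.0 + 2.0 * gamma * tau) ** 0.5
        yield tau, sigma, theta   # theta_k is used in the extrapolation x_bar = x_new + theta*(x_new - x)
        tau, sigma = theta * tau, sigma / theta
```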

Stochastic and Block-Coordinate Schemes

  • Block-coordinate and stochastic PDHG (SPDHG) allow per-iteration cost to scale with block size and data sample count, which is crucial for big-data and inexact or online settings. Arbitrary-block or random selection in the dual step yields provably $O(1/t)$ ergodic rates in expectation, with $O(1/t^2)$ or linear rates under (block) strong convexity (Qiao et al., 2018, Chambolle et al., 2017); a schematic update is sketched after this list.
  • High-probability iteration complexity bounds for SPDHG have been established, supporting empirical robustness to gradient noise and data sampling (Qiao et al., 2018).
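Below is a schematic NumPy sketch of a dual-block stochastic variant in the spirit of SPDHG (Chambolle et al., 2017); the exact extrapolation weights, samplings, and step-size conditions of the cited analyses differ in details, so treat this as illustrative only.

```python
import numpy as np

def spdhg_sketch(prox_tau_f, prox_gstar_blocks, A_blocks, x0, tau, sigmas, probs,
                 iters=1000, seed=0):
    """Sketch of stochastic PDHG: each iteration performs a full primal prox step,
    then updates ONE randomly sampled dual block i, maintaining z = sum_i A_i^T y_i
    together with an extrapolated copy z_bar."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    ys = [np.zeros(Ai.shape[0]) for Ai in A_blocks]
    z = np.zeros(x0.shape)                                  # = sum_i A_i^T y_i (all y_i start at 0)
    z_bar = z.copy()
    for _ in range(iters):
        x = prox_tau_f(x - tau * z_bar)                     # primal prox step
        i = rng.choice(len(A_blocks), p=probs)              # sample dual block i with prob p_i
        y_new = prox_gstar_blocks[i](ys[i] + sigmas[i] * (A_blocks[i] @ x))
        dz = A_blocks[i].T @ (y_new - ys[i])                # change in A^T y from block i
        ys[i] = y_new
        z = z + dz
        z_bar = z + dz / probs[i]                           # extrapolate with weight 1/p_i
    return x, ys
```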

Preconditioning and Adaptivity

  • Preconditioned PDHG (PrePDHG) replaces scalar stepsizes with self-adjoint, positive definite matrices $(M_1, M_2)$, often improving robustness to ill-conditioning. The tight convergence region is characterized by $\|M_2^{-1/2} K (M_1+\tfrac{1}{2}\Sigma_f)^{-1/2}\|^2 < 4/3$, which cannot be further relaxed (Ma et al., 2023).
  • Adaptive step-size strategies (residual balancing, backtracking, and online learning-inspired diagonal scaling) enable black-box parameter selection and track changes in problem structure or operator norm, yielding notable speed-ups over fixed-parameter schemes (Goldstein et al., 2013, Lu et al., 21 Jun 2025); a residual-balancing heuristic is sketched below. Frequency-controlled and normalized online preconditioner updates further improve wall-clock efficiency in practice.
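The following is a simplified residual-balancing heuristic in the spirit of Goldstein et al. (2013); the residual formulas correspond to a primal-first ordering of the updates, and the constants and update directions are illustrative rather than the exact rules of the cited paper.

```python
import numpy as np

def balance_stepsizes(tau, sigma, x_prev, x, y_prev, y, A, alpha=0.5, ratio=2.0):
    """If the primal and dual residual norms are out of balance, shift the
    step sizes toward the lagging side while keeping tau*sigma fixed."""
    p = (x_prev - x) / tau - A.T @ (y_prev - y)    # primal residual (one common definition)
    d = (y_prev - y) / sigma - A @ (x_prev - x)    # dual residual (one common definition)
    norm_p, norm_d = np.linalg.norm(p), np.linalg.norm(d)
    if norm_p > ratio * norm_d:                    # primal residual too large:
        tau, sigma = tau * (1 - alpha), sigma / (1 - alpha)   # smaller tau, larger sigma
    elif norm_d > ratio * norm_p:                  # dual residual too large:
        tau, sigma = tau / (1 - alpha), sigma * (1 - alpha)   # larger tau, smaller sigma
    return tau, sigma   # in adaptive schemes of this type, alpha is also shrunk over time
```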

3. Geometric and Complexity Analysis in Linear Programming

The adoption of PDHG as a general-purpose LP solver (e.g., in PDLP and cuPDLP) motivates a refined geometric understanding:

  • PDHG iterates on LPs progress in two stages: an identification stage in which the active basis is located (with sublinear complexity dependent on a non-degeneracy metric $\delta$), and a local linear convergence stage governed by a sharpness constant of the reduced KKT cone (Lu et al., 2023). Degeneracy itself does not slow convergence, but near-degeneracy lengthens the initial phase; a bare-bones LP iteration is sketched after this list.
  • An explicit geometric description for standard-form LPs shows that PDHG trajectories, within a fixed active set, trace an exact spiral in the (primal, dual) space, with contraction toward an affine center plus a constant "marching" direction that steadily closes the duality gap (Liu et al., 23 Sep 2024). Boundary-collision (basis change) events correspond to the spiral crossing facets of the feasible set, and restart the process in a new subspace.
  • Crossover strategies enhancing PDHG with a "corner push" yield primal solutions closer to vertices of the optimal face, substantially reducing the cost of sequential crossover steps required for basic solutions at the price of more PDHG iterations (Rothberg, 17 Nov 2025).
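For concreteness, here is a compact sketch of PDHG specialized to a standard-form LP $\min\{c^\top x : Ax = b,\ x \ge 0\}$, in the spirit of (but far simpler than) PDLP-style solvers, which add restarts, preconditioning, adaptive step sizes, and infeasibility detection; names and step sizes are illustrative.

```python
import numpy as np

def pdhg_lp(c, A, b, iters=20000):
    """PDHG on the LP saddle point  min_{x >= 0} max_y  c^T x - y^T (A x - b).
    The primal prox is a projected gradient step onto the nonnegative orthant;
    the dual step uses the extrapolated primal point 2*x_new - x."""
    m, n = A.shape
    x, y = np.zeros(n), np.zeros(m)
    eta = 0.9 / np.linalg.norm(A, 2)           # tau = sigma = eta, so tau*sigma*||A||^2 < 1
    for _ in range(iters):
        x_new = np.maximum(x - eta * (c - A.T @ y), 0.0)   # primal: project onto x >= 0
        y = y - eta * (A @ (2.0 * x_new - x) - b)          # dual: ascent with extrapolation
        x = x_new
    return x, y

# Tiny example:  min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0   (optimum at x = (1, 0)).
c, A, b = np.array([1.0, 2.0]), np.array([[1.0, 1.0]]), np.array([1.0])
x, y = pdhg_lp(c, A, b)
```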

4. Algorithmic Balance, Preconditioning, and Structure-Exploitation

  • The Triple-Bregman Balanced Primal-Dual Algorithm (TBDA) generalizes PDHG and similar schemes by incorporating distinct Bregman distances in each subproblem and introducing a double-dual update. This balancing enables larger step sizes and can leverage structure in applications where the primal is more costly than the dual (Yu et al., 8 Jun 2025).
  • Preconditioned and diagonalized PDHGs, including the classical Pock–Chambolle scaling and balanced augmented Lagrangian methods, achieve faster convergence and empirical speed-ups in ill-conditioned, sparse, or distributed environments. For block-sparse or graph-structured consensus problems, PDHG can be rewritten to require only one communication per iteration, halving the communication cost compared to classical versions (Malitsky, 2017).
  • In-memory computing platforms, such as RRAM crossbar arrays, support distributed implementation of PDHG. With single-shot programming of a symmetric block matrix, all matrix-vector updates can be performed in the analog domain, minimizing write cycles and achieving roughly 200–600× energy and 10–100× latency reductions versus GPU implementations on large-scale LPs (Vo et al., 25 Sep 2025).

5. Practical Applications and Large-Scale Use Cases

PDEs, Control, and Modeling

  • PDHG has been applied to fully implicit/semi-implicit time-stepping for stiff reaction-diffusion systems, handling large-scale discretizations efficiently with G-prox preconditioning, FFT/DCT-invertible matrix blocks, and linearized updates in each variable (Liu et al., 2023). In all tested equations (Allen–Cahn, Cahn–Hilliard, Schnakenberg, and predator–prey models), residuals exhibit geometric decay to machine precision per time-step, outperforming Newton-SOR and similar solvers.
  • In Model Predictive Control (MPC) under continuous path constraints, parallelized PDHG solves large block-SDPs with hundreds of small Gram-matrix constraints, mapping each to GPU threads. This enables guaranteed constraint satisfaction and roughly 700–1,300× speed-up per step compared to general-purpose SDP solvers (Li et al., 2023).
  • For implicit time discretizations of Hamilton-Jacobi PDEs, saddle-point and operator splitting formulations lead to fast, explicit, and scalable PDHG variants, effective for both smooth and non-smooth Hamiltonians with time and spatial dependence (Meng et al., 2023).

Machine Learning and Adversarial Training

  • In adversarial neural methods for PDEs, PDHG with tailored preconditioning and natural-gradient steps for neural network parameters, together with matrix-free Krylov (MINRES) solvers for parameter metrics, yields more stable, robust convergence than Adam/L-BFGS or physics-informed networks, especially in high-dimensional problems up to $d=50$ (Liu et al., 9 Nov 2024).
  • Nonlinear and Bregman-proximal PDHG variants can exploit entropy and mirror-geometry for $\ell_1$-constrained logistic regression and entropy-regularized matrix games, obtaining best-possible $O(1/k^2)$ or linear convergence rates while reducing SVD and norm-computation costs compared to classical schemes (Darbon et al., 2021); a toy entropic iteration is sketched after this list.
  • In large-scale stochastic or online optimization, PDHG combines the computational benefits of stochastic gradients (per-iteration sample access) with structured regularization, providing high-probability guarantees of $O(1/\sqrt{t})$ or $O(1/t)$ rates (Qiao et al., 2018, Chambolle et al., 2017).
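As an illustration of the Bregman-proximal idea, here is a small sketch of an entropic PDHG-type iteration for a matrix game $\min_{x\in\Delta_n}\max_{y\in\Delta_m} y^\top Ax$; with KL divergences replacing the Euclidean prox terms, each prox step becomes a normalized multiplicative update. Step sizes and extrapolation are illustrative, not the tuned choices of the cited work.

```python
import numpy as np

def entropic_pdhg_game(A, iters=5000):
    """Entropy-prox PDHG sketch for min_{x in simplex} max_{y in simplex} y^T A x.
    The KL prox of a linear term over the simplex is a multiplicative update."""
    m, n = A.shape
    tau = sigma = 0.9 / np.abs(A).max()        # illustrative step size for the entropic geometry
    x = np.full(n, 1.0 / n)
    y = np.full(m, 1.0 / m)
    x_bar = x.copy()
    for _ in range(iters):
        y = y * np.exp(sigma * (A @ x_bar))
        y = y / y.sum()                                    # dual entropic (KL) prox: ascent step
        x_new = x * np.exp(-tau * (A.T @ y))
        x_new = x_new / x_new.sum()                        # primal entropic (KL) prox: descent step
        x_bar = 2.0 * x_new - x                            # extrapolation, used only inside A @ x_bar
        x = x_new
    return x, y
```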

6. Convergence Theory, Progress Metrics, and Adaptive Schemes

  • Monotonic progress in PDHG can be rigorously measured by the “infimal sub-differential size” (IDS), a generalization of the gradient norm to nonsmooth saddle-point problems. IDS decays monotonically at an $O(1/k)$ sublinear rate under general assumptions, and at a linear rate under metric sub-regularity (satisfied by LPs, regularized quadratics, and TV-denoising) (Lu et al., 2022).
  • Ergodic rates of $O(1/k)$ for primal-dual gap and feasibility are the norm under convexity; $O(1/k^2)$ (Nesterov-optimal) for (block) strongly convex settings; exact linear convergence under metric sub-regularity.
  • Step-size and parameter adaptivity are critical for PDHG's practical efficiency. Residual-balancing schemes, backtracking, and doubly nested line-search strategies can make the algorithm hyperparameter-free, yielding fast and reliable convergence across diverse data and problem structures (Goldstein et al., 2013, McManus et al., 21 Mar 2025).

7. Impact, Implementation Considerations, and Current Research Directions

  • PDHG is now fundamental in large-scale LP, where solvers like PDLP and cuPDLP leverage its communication patterns, convergence properties, and compatibility with hardware acceleration (GPU, RRAM) to surpass traditional barrier and simplex methods on modern workloads (Vo et al., 25 Sep 2025, Lu et al., 21 Jun 2025).
  • For highly-degenerate or nearly-degenerate LPs, current research focuses on refined geometric rates (two-phase analysis) and effective initialization/crossover strategies to reach vertex/basic solutions efficiently (Lu et al., 2023, Liu et al., 23 Sep 2024, Rothberg, 17 Nov 2025).
  • Active research directions include multi-Bregman and structure-exploiting splitting (e.g., TBDA), online adaptive scaling and regularization, hybrid deterministic/stochastic step selection, and preconditioned/fine-grained asynchronous implementations in both software (CUDA, Julia, Python) and hardware (crossbar arrays).

In summary, through extensive algorithmic investigations, geometric analysis, and hardware-conscious design, PDHG has established itself as a foundational method for large-scale convex-concave optimization, with clear pathways for extension to new application domains and architectures.
