Proximal Subgradient Algorithm
- Proximal Subgradient Algorithm is an iterative method that blends subgradient steps with proximal or projection operations to minimize nonsmooth and nonconvex composite functions.
- It adapts to a variety of problem settings including deterministic, stochastic, and inexact schemes while carefully controlling step sizes and error tolerances.
- Convergence guarantees range from sublinear rates in convex cases to stationarity under weakly convex and nonconvex conditions, with extensions for DC and fractional programming.
A proximal subgradient algorithm is an iterative first-order method for minimizing nonsmooth, possibly nonconvex or non-Lipschitz objective functions, in which each iteration combines a subgradient step (for nonsmoothness) with a proximal or projected step (for handling constraint sets or nonsmooth penalty terms). This class encompasses a range of algorithms, from classical subgradient and projected subgradient schemes to modern stochastic and composite splitting methods adapted to weakly convex, difference-of-convex (DC), or fractional settings.
1. Problem Classes and Foundational Principles
Proximal subgradient algorithms target broad problem families, most notably nonsmooth and weakly convex composite objectives and expectation-valued or stochastic optimization. The prototypical problem is
$$\min_{x \in X} \; \varphi(x) := f(x) + g(x),$$
where $f$ is potentially nonsmooth, nonconvex, or only weakly convex, $g$ may be nonsmooth or serve as the indicator of a closed convex set (encoding constraints), and $X$ is a closed convex set, possibly the full space. Variants consider min-max, multiobjective, and fractional programs, DC-structured objectives, and infinite-dimensional Hilbert spaces (Davis et al., 2017, Zhu et al., 2023, Dai et al., 2024, Cruz, 2014, Wei et al., 2013, Tuyen et al., 31 Dec 2025, Qi et al., 22 Oct 2025, Bednarczuk et al., 2024).
A key property is weak convexity, which generalizes convexity for nonsmooth/nonconvex objectives. A function $f$ is $\rho$-weakly convex if $f + \frac{\rho}{2}\|\cdot\|^2$ is convex (Davis et al., 2017, Zhu et al., 2023). This property enables defining and analyzing the proximal map and the Moreau envelope even in the absence of convexity or smoothness.
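The Moreau envelope can be probed numerically. A minimal sketch (assuming a dense 1-D grid search is an acceptable stand-in for the inner minimization) compares the envelope of $|x|$ against its known closed form, the Huber function:

```python
import numpy as np

def moreau_envelope(f, x, lam, grid):
    # phi_lam(x) = min_y f(y) + (y - x)^2 / (2*lam), approximated on a dense 1-D grid
    return float(np.min(f(grid) + (grid - x) ** 2 / (2.0 * lam)))

def huber(x, lam):
    # Known closed form of the Moreau envelope of |.|
    return x * x / (2.0 * lam) if abs(x) <= lam else abs(x) - lam / 2.0

grid = np.linspace(-5.0, 5.0, 200_001)  # grid spacing 5e-5
lam = 0.5
for x in (-2.0, 0.3, 1.7):
    assert abs(moreau_envelope(np.abs, x, lam, grid) - huber(x, lam)) < 1e-4
```

The same envelope construction underlies the stationarity measures used later: a small $\|\nabla \varphi_\lambda\|$ certifies near-stationarity of the original nonsmooth objective.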
Proximal subgradient methods appeal in these regimes because:
- Subgradients are available for nonsmooth functions where gradients are not.
- The proximal step regularizes the iteration, stabilizes the update by exploiting strong or weak convexity of the local model, and can handle complex constraints and composite structures.
- Proximal and projected steps can often be computed efficiently for a wide class of convex or constraint sets.
2. Core Algorithmic Schemes
Several canonical forms arise, adapted for specific problem structures.
2.1 Classical Proximal Subgradient Splitting
For convex $f$ and $g$, the splitting iteration is
$$x^{k+1} = \operatorname{prox}_{\alpha_k g}\bigl(x^k - \alpha_k v^k\bigr), \qquad v^k \in \partial f(x^k),$$
with step size $\alpha_k > 0$ and $\operatorname{prox}_{\alpha_k g}$ the proximal operator of $g$. This includes projected subgradient (if $g$ is an indicator), and reduces to the proximal point iteration for $f \equiv 0$ (Cruz, 2014).
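A minimal sketch of this splitting, on a hypothetical toy instance with $f(x) = \|Ax-b\|_1$ handled by subgradients and $g(x) = \lambda\|x\|_1$ handled by its prox (soft-thresholding):

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_subgradient(A, b, lam, x0, iters=2000):
    """Sketch: minimize f(x) + g(x) with f(x) = ||Ax - b||_1 (subgradient step)
    and g(x) = lam * ||x||_1 (proximal step)."""
    obj = lambda z: np.abs(A @ z - b).sum() + lam * np.abs(z).sum()
    x, best = x0.copy(), x0.copy()
    for k in range(1, iters + 1):
        v = A.T @ np.sign(A @ x - b)                              # v in the subdifferential of f
        alpha = 1.0 / (np.sqrt(k) * max(np.linalg.norm(v), 1.0))  # diminishing, normalized step
        x = soft_threshold(x - alpha * v, alpha * lam)            # subgradient step, then prox
        if obj(x) < obj(best):
            best = x.copy()  # subgradient methods are nonmonotone: track the best iterate
    return best

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10); x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true
x = prox_subgradient(A, b, lam=0.1, x0=np.zeros(10))
```

The best-iterate bookkeeping mirrors the best-iterate guarantees cited below: only the running minimum of the objective is guaranteed to decrease at the sublinear rate.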
2.2 Inexact and Stochastic Variants
For expectation minimization with only stochastic subgradients available, the Proximally Guided Stochastic Subgradient method (PGSG) (Davis et al., 2017) implements an inexact proximal point iteration
$$x_{t+1} \approx \operatorname*{argmin}_{x \in X} \; \varphi(x) + \frac{\rho}{2}\|x - x_t\|^2,$$
where each proximal subproblem is solved approximately (inner loop) using a stochastic projected subgradient method. This yields an outer loop over proximal iterates and an inner loop of subgradient steps on the strongly convex local model.
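The double-loop structure can be sketched as follows (a simplified, assumed rendition of the PGSG idea: the inner loop is averaged stochastic subgradient descent on the $\rho$-strongly-convex proximal model, and the averaged iterate becomes the next proximal center):

```python
import numpy as np

def pgsg(sample_subgrad, x0, rho=1.0, outer=50, inner=200, rng=None):
    """Simplified proximally guided stochastic subgradient sketch (after the
    scheme of Davis et al., 2017; details here are illustrative assumptions)."""
    if rng is None:
        rng = np.random.default_rng(0)
    center = x0.copy()
    for _ in range(outer):
        x, avg = center.copy(), np.zeros_like(center)
        for k in range(1, inner + 1):
            g = sample_subgrad(x, rng) + rho * (x - center)  # subgrad of the proximal model
            x = x - (2.0 / (rho * (k + 1))) * g              # step size for strong convexity
            avg += 2.0 * k / (inner * (inner + 1)) * x       # weighted running average
        center = avg                                          # inexact prox step
    return center

# Toy stochastic objective: E|a^T x - b| over random rows (a, b)
A = np.random.default_rng(1).standard_normal((100, 5))
xs = np.array([1.0, -1.0, 0.0, 2.0, 0.5])
b = A @ xs
def sample_subgrad(x, rng):
    i = rng.integers(len(b))
    return A[i] * np.sign(A[i] @ x - b[i])

x_hat = pgsg(sample_subgrad, np.zeros(5))
```

The weighted average is the standard $O(1/k)$-step averaging for strongly convex stochastic subgradient descent; the outer loop then behaves like an inexact proximal point method.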
For stochastic saddle point/min-max problems, SAPS employs a proximal subgradient map in primal-dual coordinates, using unbiased, variance-bounded stochastic subgradients, and achieves convergence of the expected optimality gap (Dai et al., 2024).
2.3 Extrapolation and Acceleration
Incorporating extrapolation or Nesterov-type momentum yields inertial or accelerated versions. These variants provide accelerated rates for squared (prox-)gradient-mapping norms, e.g., $O(1/k^2)$ for ISTA and $O(1/k^3)$ for FISTA (Li et al., 2022). Extrapolation is also crucial for nonconvex, non-Lipschitz schemes where global smoothness cannot be assumed (Pham et al., 2022, Yang et al., 27 Nov 2025).
2.4 Composite and Multiobjective Extensions
For DC objectives, multiobjective, and fractional optimization, the proximal subgradient algorithm constructs surrogates based on linearization and weak convexity, with subgradient steps for the nonsmooth (possibly nonconvex) part, and an explicit proximal or projected step (Tuyen et al., 31 Dec 2025, Qi et al., 22 Oct 2025, Han et al., 15 Mar 2025, Boţ et al., 2020, Bednarczuk et al., 2024).
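A minimal sketch of the DC linearization idea, on an assumed toy instance $\min_x \tfrac12\|Ax-b\|^2 + \lambda(\|x\|_1 - \|x\|_2)$: the concave part $-\lambda\|x\|_2$ is replaced by its linearization at the current iterate (via a subgradient of $\|\cdot\|_2$), and the $\ell_1$ part is handled by its prox:

```python
import numpy as np

def dc_prox_subgradient(A, b, lam, iters=300):
    """Sketch: proximal subgradient iteration for the DC-regularized problem
    min 0.5*||Ax - b||^2 + lam*(||x||_1 - ||x||_2). Linearizing the concave
    part yields a majorizing surrogate, so the objective decreases monotonically."""
    L = np.linalg.norm(A, 2) ** 2
    alpha = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        nx = np.linalg.norm(x)
        u = x / nx if nx > 0 else np.zeros_like(x)        # u in the subdifferential of ||.||_2
        z = x - alpha * (A.T @ (A @ x - b) - lam * u)     # gradient + linearized concave part
        x = np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)  # prox of alpha*lam*||.||_1
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((25, 8))
b = A @ np.array([2.0, -1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0])
x = dc_prox_subgradient(A, b, lam=0.1)
```

Because the linearized surrogate majorizes the true objective, each step is a descent step, matching the monotonic-decay guarantees cited below for DC and fractional schemes.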
3. Convergence Guarantees and Complexity
Convergence theory for these algorithms bifurcates by problem structure and the exactness of subproblems solved.
3.1 Deterministic, Convex, and Weakly Convex Cases
Under convexity and bounded subgradients, subgradient splitting achieves sublinear rates for the objective gap: $O(1/\sqrt{k})$ best-iterate convergence for suitable step sizes, with exact convergence under Polyak-type rules (Cruz, 2014, Millán et al., 2018). For weakly convex (possibly nonconvex) settings and suitable step sizes,
$$\min_{k \le T} \ \|\nabla \varphi_\lambda(x_k)\|^2 = O\!\left(\tfrac{1}{\sqrt{T}}\right),$$
where $\varphi_\lambda$ is the Moreau envelope of the objective (Zhu et al., 2023). Under a uniform Kurdyka–Łojasiewicz property (KL exponent $1/2$), this rate improves to $O(1/T)$.
3.2 Stochastic and Inexact Schemes
For expectation minimization via PGSG, the expected squared stationarity gap satisfies
$$\mathbb{E}\Bigl[\min_{t \le T} \|\nabla \varphi_{1/\bar\rho}(x_t)\|^2\Bigr] = \widetilde{O}\!\left(\tfrac{1}{\sqrt{T}}\right),$$
with overall oracle complexity $\widetilde{O}(\varepsilon^{-4})$, matching smooth nonconvex SGD (Davis et al., 2017). For stochastic min-max, the SAPS algorithm yields both expectation and high-probability bounds on the minimax optimality gap, with $O(1/\sqrt{N})$ scaling in the number of iterations $N$ (Dai et al., 2024).
3.3 Nonconvex, DC, and Fractional Problems
In nonconvex or DC settings, only subsequential convergence to stationary points (for the limiting or Mordukhovich subdifferential) is guaranteed under mild assumptions (level boundedness, suitable step sizes), with full-sequence convergence under a KL property (Tuyen et al., 31 Dec 2025, Pham et al., 2022, Wei et al., 2013). For single-ratio fractional programs, monotonic decay of the objective and convergence to critical points are established, together with local convergence guarantees near isolated local minimizers (Qi et al., 22 Oct 2025, Han et al., 15 Mar 2025, Boţ et al., 2020, Yang et al., 15 Apr 2025).
4. Algorithmic Components and Step Size Selection
The effective behavior of proximal subgradient algorithms depends critically on the choices of:
- Subgradient oracle: deterministic, stochastic, or approximate.
- Step size $\alpha_k$: constant, diminishing, Polyak-based, or adapted by line search for non-Lipschitz scenarios (Cruz, 2014, Yang et al., 27 Nov 2025).
- Inexactness and error control: Absolute or relative error criteria for inexact prox-evaluations are supported, and error summability is pivotal for convergence in inexact schemes (Millán et al., 2018, Yang et al., 15 Apr 2025).
- Acceleration parameters: inertial or extrapolation constants, possibly updated via Nesterov/FISTA sequences, with safeguards (periodic resets, restarts) to prevent instability (Li et al., 2022, Pham et al., 2022).
- Variance reduction and robustification: outer-averaging, two-phase variants, or weighted iterates to mitigate stochastic variability in nonsmooth, nonconvex settings (Davis et al., 2017).
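As a concrete illustration of the Polyak rule from the list above (assuming the optimal value $f^\star$ is known, e.g. $f^\star = 0$ for a consistent system; the toy problem is hypothetical):

```python
import numpy as np

def polyak_subgradient(f, subgrad, x0, f_star, iters=500):
    # Subgradient method with the Polyak step alpha_k = (f(x_k) - f*) / ||v_k||^2,
    # which requires knowing (or estimating) the optimal value f*.
    x, best = x0.copy(), x0.copy()
    for _ in range(iters):
        v = subgrad(x)
        nv2 = float(v @ v)
        if nv2 == 0.0:
            break  # zero subgradient: x is stationary
        x = x - (f(x) - f_star) / nv2 * v
        if f(x) < f(best):
            best = x.copy()  # best-iterate tracking
    return best

# Toy problem: f(x) = ||Ax - b||_1 with a consistent system, so f* = 0
rng = np.random.default_rng(3)
A = rng.standard_normal((30, 10))
b = A @ np.concatenate(([1.0, -2.0, 0.5], np.zeros(7)))
f = lambda x: np.abs(A @ x - b).sum()
subgrad = lambda x: A.T @ np.sign(A @ x - b)
x_best = polyak_subgradient(f, subgrad, np.zeros(10), f_star=0.0)
```

The Polyak step guarantees that the distance to any minimizer is nonincreasing, which is why no tuning of a step-size schedule is needed when $f^\star$ is available.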
5. Applications and Model Coverage
Proximal subgradient algorithms are now foundational in a spectrum of settings:
- Nonsmooth, nonconvex learning: phase retrieval, robust statistics, covariance/trimmed estimation, nonconvex dictionary learning, and robust PCA (Davis et al., 2017).
- Sparse signal recovery and imaging: fractional and scale-invariant regularization, sparse regression, CT reconstruction, and generalized graph Fourier decomposition (Han et al., 15 Mar 2025, Qi et al., 22 Oct 2025, Yang et al., 15 Apr 2025, Boţ et al., 2020).
- Stochastic convex-concave minimax: learning with constraints and multi-player games, conic duality, Neyman–Pearson classification (Dai et al., 2024).
- Multiobjective DC optimization: joint minimization of several DC components subject to constraints, with applications in finance, engineering design, and computational physics (Tuyen et al., 31 Dec 2025).
- Deep model selection and large-scale learning: where exact (sub)gradient computation is intractable, and only projections, (sub)gradients, or stochastic oracles are available (Zhu et al., 2023).
6. Extensions, Generalizations, and Theoretical Impact
Several recent developments extend the classical proximal subgradient paradigm:
- Abstract convexity frameworks: Defining subgradients and proximal maps for broader classes of functionals, yielding corresponding algorithmic structures and convergence proofs (Bednarczuk et al., 2024).
- Variable metric and inexactness: Adaptive scaling matrices and variable metric prox-terms lead to more efficient methods for poorly conditioned problems, with careful error tolerance analysis (Yang et al., 15 Apr 2025).
- Line search and nonmonotone control: When global Lipschitz constants are unavailable, carefully crafted nonmonotone line search procedures allow parameter-free convergence guarantees for both monotone and accelerated variants (Yang et al., 27 Nov 2025).
- Fractional and DC programming: Specialized proximal subgradient schemes are tailored for single-ratio, structured, and composite fractional programs, enabling global or local convergence under minimal smoothness/data assumptions (Qi et al., 22 Oct 2025, Boţ et al., 2020, Han et al., 15 Mar 2025).
- Stochastic minimax and conic settings: SAPS and its LSAAL extension handle stochastic convex-concave problems even with unbounded or light-tailed stochastic subgradients (Dai et al., 2024).
By leveraging the Moreau envelope, error bounds, and KL theory, contemporary proximal subgradient algorithms retain provable convergence and complexity even in the presence of nonconvexity, non-Lipschitzness, and stochasticity—significantly broadening their practical range (Zhu et al., 2023, Davis et al., 2017, Dai et al., 2024).
7. Summary Table: Algorithmic Landscape
| Class / Property | Notable Algorithm(s) | Main Convergence Rate / Guarantee |
|---|---|---|
| Convex, deterministic | PSS, PSG, abstract PSG, NSDM | $O(1/\sqrt{k})$ objective gap; exact w/ Polyak |
| Weakly convex, nonconvex | PGSG, Prox-SubGrad, DC PSG | $O(1/\sqrt{T})$ for prox-stationarity; KL: $O(1/T)$ |
| Stochastic settings | PGSG, SAPS, Stochastic Prox-SubGrad | $O(1/\sqrt{N})$ (expected or high-probability) |
| Inexact / variable metric | PeSM (abs/rel), iVPGSA | Critical point convergence under error-summability |
| Accelerated/extrapolated | ISTA/FISTA (prox subgrad norm), e-PSG | $O(1/k^2)$, $O(1/k^3)$ (prox gradient norm) |
| Line-search/non-Lipschitz | nexPGA, FISTA w/line-search | Global convergence (KL) under local smoothness |
| Fractional/DC/multiobjective | PSA, FPSA, DC PSG, BDSA, multiobjective PSG | Subsequence/cluster-point stationarity, KL: full sequence |
All claims, formalisms, and models directly track the referenced research publications (Davis et al., 2017, Cruz, 2014, Zhu et al., 2023, Pham et al., 2022, Yang et al., 27 Nov 2025, Dai et al., 2024, Li et al., 2022, Boţ et al., 2020, Wei et al., 2013, Millán et al., 2018, Bednarczuk et al., 2024, Qi et al., 22 Oct 2025, Han et al., 15 Mar 2025, Tuyen et al., 31 Dec 2025, Yang et al., 15 Apr 2025).