Proximal Subgradient Algorithm
- Proximal Subgradient Algorithm is an iterative method that blends subgradient steps with proximal or projection operations to minimize nonsmooth and nonconvex composite functions.
- It adapts to a variety of problem settings including deterministic, stochastic, and inexact schemes while carefully controlling step sizes and error tolerances.
- Convergence guarantees range from sublinear rates in convex cases to stationarity under weakly convex and nonconvex conditions, with extensions for DC and fractional programming.
A proximal subgradient algorithm is an iterative first-order method for minimizing nonsmooth, possibly nonconvex or non-Lipschitz objective functions, in which each iteration combines a subgradient step (for nonsmoothness) with a proximal or projected step (for handling constraint sets or nonsmooth penalty terms). This class encompasses a range of algorithms, from classical subgradient and projected subgradient schemes to modern stochastic and composite splitting methods adapted to weakly convex, difference-of-convex (DC), or fractional settings.
1. Problem Classes and Foundational Principles
Proximal subgradient algorithms target broad problem families, most notably nonsmooth and weakly convex composite objectives and expectation-valued or stochastic optimization. The prototypical problem is
$$\min_{x \in X} \; \varphi(x) := f(x) + g(x),$$
where $f$ is potentially nonsmooth, nonconvex, or only weakly convex, $g$ may be nonsmooth or serve as the indicator of a closed convex set (encoding constraints), and $X$ is a closed convex set, possibly the full space. Variants consider min-max, multiobjective, and fractional programs, DC-structured objectives, and infinite-dimensional Hilbert spaces (Davis et al., 2017, Zhu et al., 2023, Dai et al., 2024, Cruz, 2014, Wei et al., 2013, Tuyen et al., 31 Dec 2025, Qi et al., 22 Oct 2025, Bednarczuk et al., 2024).
A key property is weak convexity, which generalizes convexity for nonsmooth/nonconvex objectives. A function $f$ is $\rho$-weakly convex if $f + \frac{\rho}{2}\|\cdot\|^2$ is convex (Davis et al., 2017, Zhu et al., 2023). This property enables defining and analyzing the proximal map and the Moreau envelope even in the absence of convexity or smoothness.
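The Moreau envelope can be probed numerically. A minimal sketch (assuming a dense 1-D grid search is an acceptable stand-in for the inner minimization) compares the envelope of $|x|$ against its known closed form, the Huber function:

```python
import numpy as np

def moreau_envelope(f, x, lam, grid):
    # phi_lam(x) = min_y f(y) + (y - x)^2 / (2*lam), approximated on a dense 1-D grid
    return float(np.min(f(grid) + (grid - x) ** 2 / (2.0 * lam)))

def huber(x, lam):
    # Known closed form of the Moreau envelope of |.|
    return x * x / (2.0 * lam) if abs(x) <= lam else abs(x) - lam / 2.0

grid = np.linspace(-5.0, 5.0, 200_001)  # grid spacing 5e-5
lam = 0.5
for x in (-2.0, 0.3, 1.7):
    assert abs(moreau_envelope(np.abs, x, lam, grid) - huber(x, lam)) < 1e-4
```

The same envelope construction underlies the stationarity measures used later: a small $\|\nabla \varphi_\lambda\|$ certifies near-stationarity of the original nonsmooth objective.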
Proximal subgradient methods appeal in these regimes because:
- Subgradients are available for nonsmooth functions where gradients are not.
- The proximal step regularizes the iteration, stabilizes the update by exploiting strong or weak convexity of the local model, and can handle complex constraints and composite structures.
- Proximal and projected steps can often be computed efficiently for a wide class of convex or constraint sets.
2. Core Algorithmic Schemes
Several canonical forms arise, adapted for specific problem structures.
2.1 Classical Proximal Subgradient Splitting
For convex $f$ and $g$, the splitting iteration is
$$x^{k+1} = \operatorname{prox}_{\alpha_k g}\bigl(x^k - \alpha_k v^k\bigr), \qquad v^k \in \partial f(x^k),$$
with step size $\alpha_k > 0$ and $\operatorname{prox}_{\alpha_k g}$ the proximal operator of $g$. This includes projected subgradient (if $g$ is an indicator), and reduces to the proximal point iteration for $f \equiv 0$ (Cruz, 2014).
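A minimal sketch of this splitting, on a hypothetical toy instance with $f(x) = \|Ax-b\|_1$ handled by subgradients and $g(x) = \lambda\|x\|_1$ handled by its prox (soft-thresholding):

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_subgradient(A, b, lam, x0, iters=2000):
    """Sketch: minimize f(x) + g(x) with f(x) = ||Ax - b||_1 (subgradient step)
    and g(x) = lam * ||x||_1 (proximal step)."""
    obj = lambda z: np.abs(A @ z - b).sum() + lam * np.abs(z).sum()
    x, best = x0.copy(), x0.copy()
    for k in range(1, iters + 1):
        v = A.T @ np.sign(A @ x - b)                              # v in the subdifferential of f
        alpha = 1.0 / (np.sqrt(k) * max(np.linalg.norm(v), 1.0))  # diminishing, normalized step
        x = soft_threshold(x - alpha * v, alpha * lam)            # subgradient step, then prox
        if obj(x) < obj(best):
            best = x.copy()  # subgradient methods are nonmonotone: track the best iterate
    return best

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10); x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true
x = prox_subgradient(A, b, lam=0.1, x0=np.zeros(10))
```

The best-iterate bookkeeping mirrors the best-iterate guarantees cited below: only the running minimum of the objective is guaranteed to decrease at the sublinear rate.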
2.2 Inexact and Stochastic Variants
For expectation minimization with only stochastic subgradients available, the Proximally Guided Stochastic Subgradient method (PGSG) (Davis et al., 2017) implements an inexact proximal point iteration
$$x_{t+1} \approx \operatorname*{argmin}_{x \in X} \; \varphi(x) + \frac{\rho}{2}\|x - x_t\|^2,$$
where each proximal subproblem is solved approximately (inner loop) using a stochastic projected subgradient method. This yields an outer loop over proximal iterates and an inner loop of subgradient steps on the strongly convex local model.
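The double-loop structure can be sketched as follows (a simplified, assumed rendition of the PGSG idea: the inner loop is averaged stochastic subgradient descent on the $\rho$-strongly-convex proximal model, and the averaged iterate becomes the next proximal center):

```python
import numpy as np

def pgsg(sample_subgrad, x0, rho=1.0, outer=50, inner=200, rng=None):
    """Simplified proximally guided stochastic subgradient sketch (after the
    scheme of Davis et al., 2017; details here are illustrative assumptions)."""
    if rng is None:
        rng = np.random.default_rng(0)
    center = x0.copy()
    for _ in range(outer):
        x, avg = center.copy(), np.zeros_like(center)
        for k in range(1, inner + 1):
            g = sample_subgrad(x, rng) + rho * (x - center)  # subgrad of the proximal model
            x = x - (2.0 / (rho * (k + 1))) * g              # step size for strong convexity
            avg += 2.0 * k / (inner * (inner + 1)) * x       # weighted running average
        center = avg                                          # inexact prox step
    return center

# Toy stochastic objective: E|a^T x - b| over random rows (a, b)
A = np.random.default_rng(1).standard_normal((100, 5))
xs = np.array([1.0, -1.0, 0.0, 2.0, 0.5])
b = A @ xs
def sample_subgrad(x, rng):
    i = rng.integers(len(b))
    return A[i] * np.sign(A[i] @ x - b[i])

x_hat = pgsg(sample_subgrad, np.zeros(5))
```

The weighted average is the standard $O(1/k)$-step averaging for strongly convex stochastic subgradient descent; the outer loop then behaves like an inexact proximal point method.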
For stochastic saddle point/min-max problems, SAPS employs a proximal subgradient map in primal-dual coordinates, using unbiased, variance-bounded stochastic subgradients, and achieves convergence of the expected optimality gap (Dai et al., 2024).
2.3 Extrapolation and Acceleration
Incorporating extrapolation or Nesterov-type momentum yields inertial or accelerated versions. These variants provide accelerated rates for squared (prox-)gradient-mapping norms, e.g., $O(1/k^2)$ for ISTA and $O(1/k^3)$ for FISTA (Li et al., 2022). Extrapolation is also crucial for nonconvex, non-Lipschitz schemes where global smoothness cannot be assumed (Pham et al., 2022, Yang et al., 27 Nov 2025).
2.4 Composite and Multiobjective Extensions
For DC objectives, multiobjective, and fractional optimization, the proximal subgradient algorithm constructs surrogates based on linearization and weak convexity, with subgradient steps for the nonsmooth (possibly nonconvex) part, and an explicit proximal or projected step (Tuyen et al., 31 Dec 2025, Qi et al., 22 Oct 2025, Han et al., 15 Mar 2025, Boţ et al., 2020, Bednarczuk et al., 2024).
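A minimal sketch of the DC linearization idea, on an assumed toy instance $\min_x \tfrac12\|Ax-b\|^2 + \lambda(\|x\|_1 - \|x\|_2)$: the concave part $-\lambda\|x\|_2$ is replaced by its linearization at the current iterate (via a subgradient of $\|\cdot\|_2$), and the $\ell_1$ part is handled by its prox:

```python
import numpy as np

def dc_prox_subgradient(A, b, lam, iters=300):
    """Sketch: proximal subgradient iteration for the DC-regularized problem
    min 0.5*||Ax - b||^2 + lam*(||x||_1 - ||x||_2). Linearizing the concave
    part yields a majorizing surrogate, so the objective decreases monotonically."""
    L = np.linalg.norm(A, 2) ** 2
    alpha = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        nx = np.linalg.norm(x)
        u = x / nx if nx > 0 else np.zeros_like(x)        # u in the subdifferential of ||.||_2
        z = x - alpha * (A.T @ (A @ x - b) - lam * u)     # gradient + linearized concave part
        x = np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)  # prox of alpha*lam*||.||_1
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((25, 8))
b = A @ np.array([2.0, -1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0])
x = dc_prox_subgradient(A, b, lam=0.1)
```

Because the linearized surrogate majorizes the true objective, each step is a descent step, matching the monotonic-decay guarantees cited below for DC and fractional schemes.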
3. Convergence Guarantees and Complexity
Convergence theory for these algorithms bifurcates by problem structure and the exactness of subproblems solved.
3.1 Deterministic, Convex, and Weakly Convex Cases
Under convexity and bounded subgradients, subgradient splitting achieves sublinear rates for the objective gap: $O(1/\sqrt{k})$ best-iterate convergence for suitable step sizes, with exact convergence under Polyak-type rules (Cruz, 2014, Millán et al., 2018). For weakly convex (possibly nonconvex) settings and suitable step sizes,
$$\min_{k \le T} \ \|\nabla \varphi_\lambda(x_k)\|^2 = O\!\left(\tfrac{1}{\sqrt{T}}\right),$$
where $\varphi_\lambda$ is the Moreau envelope of the objective (Zhu et al., 2023). Under a uniform Kurdyka–Łojasiewicz property (KL exponent $1/2$), this rate improves to $O(1/T)$.
3.2 Stochastic and Inexact Schemes
For expectation minimization via PGSG, the expected squared stationarity gap satisfies
$$\mathbb{E}\Bigl[\min_{t \le T} \|\nabla \varphi_{1/\bar\rho}(x_t)\|^2\Bigr] = \widetilde{O}\!\left(\tfrac{1}{\sqrt{T}}\right),$$
with overall oracle complexity $\widetilde{O}(\varepsilon^{-4})$, matching smooth nonconvex SGD (Davis et al., 2017). For stochastic min-max, the SAPS algorithm yields both expectation and high-probability bounds on the minimax optimality gap, with $O(1/\sqrt{N})$ scaling in the number of iterations $N$ (Dai et al., 2024).
3.3 Nonconvex, DC, and Fractional Problems
In nonconvex or DC settings, only subsequential convergence to stationary points (for the limiting or Mordukhovich subdifferential) is guaranteed under mild assumptions (level boundedness, suitable step sizes), with full-sequence convergence under a KL property (Tuyen et al., 31 Dec 2025, Pham et al., 2022, Wei et al., 2013). For single-ratio fractional programs, monotonic decay of the objective and convergence to critical points are established, together with local convergence guarantees near isolated local minimizers (Qi et al., 22 Oct 2025, Han et al., 15 Mar 2025, Boţ et al., 2020, Yang et al., 15 Apr 2025).
4. Algorithmic Components and Step Size Selection
The effective behavior of proximal subgradient algorithms depends critically on the choices of:
- Subgradient oracle: deterministic, stochastic, or approximate.
- Step size $\alpha_k$: constant, diminishing, Polyak-based, or adapted by line search for non-Lipschitz scenarios (Cruz, 2014, Yang et al., 27 Nov 2025).
- Inexactness and error control: Absolute or relative error criteria for inexact prox-evaluations are supported, and error summability is pivotal for convergence in inexact schemes (Millán et al., 2018, Yang et al., 15 Apr 2025).
- Acceleration parameters: inertial or extrapolation constants, possibly updated via Nesterov/FISTA sequences, with safeguards (periodic resets, restarts) to prevent instability (Li et al., 2022, Pham et al., 2022).
- Variance reduction and robustification: outer-averaging, two-phase variants, or weighted iterates to mitigate stochastic variability in nonsmooth, nonconvex settings (Davis et al., 2017).
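As a concrete illustration of the Polyak rule from the list above (assuming the optimal value $f^\star$ is known, e.g. $f^\star = 0$ for a consistent system; the toy problem is hypothetical):

```python
import numpy as np

def polyak_subgradient(f, subgrad, x0, f_star, iters=500):
    # Subgradient method with the Polyak step alpha_k = (f(x_k) - f*) / ||v_k||^2,
    # which requires knowing (or estimating) the optimal value f*.
    x, best = x0.copy(), x0.copy()
    for _ in range(iters):
        v = subgrad(x)
        nv2 = float(v @ v)
        if nv2 == 0.0:
            break  # zero subgradient: x is stationary
        x = x - (f(x) - f_star) / nv2 * v
        if f(x) < f(best):
            best = x.copy()  # best-iterate tracking
    return best

# Toy problem: f(x) = ||Ax - b||_1 with a consistent system, so f* = 0
rng = np.random.default_rng(3)
A = rng.standard_normal((30, 10))
b = A @ np.concatenate(([1.0, -2.0, 0.5], np.zeros(7)))
f = lambda x: np.abs(A @ x - b).sum()
subgrad = lambda x: A.T @ np.sign(A @ x - b)
x_best = polyak_subgradient(f, subgrad, np.zeros(10), f_star=0.0)
```

The Polyak step guarantees that the distance to any minimizer is nonincreasing, which is why no tuning of a step-size schedule is needed when $f^\star$ is available.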
5. Applications and Model Coverage
Proximal subgradient algorithms are now foundational in a spectrum of settings:
- Nonsmooth, nonconvex learning: phase retrieval, robust statistics, covariance/trimmed estimation, nonconvex dictionary learning, and robust PCA (Davis et al., 2017).
- Sparse signal recovery and imaging: fractional and scale-invariant regularization, sparse regression, CT reconstruction, and generalized graph Fourier decomposition (Han et al., 15 Mar 2025, Qi et al., 22 Oct 2025, Yang et al., 15 Apr 2025, Boţ et al., 2020).
- Stochastic convex-concave minimax: learning with constraints and multi-player games, conic duality, Neyman–Pearson classification (Dai et al., 2024).
- Multiobjective DC optimization: joint minimization of several DC components subject to constraints, with applications in finance, engineering design, and computational physics (Tuyen et al., 31 Dec 2025).
- Deep model selection and large-scale learning: where exact (sub)gradient computation is intractable, and only projections, (sub)gradients, or stochastic oracles are available (Zhu et al., 2023).
6. Extensions, Generalizations, and Theoretical Impact
Several recent developments extend the classical proximal subgradient paradigm:
- Abstract convexity frameworks: Defining subgradients and proximal maps for broader classes of functionals, yielding corresponding algorithmic structures and convergence proofs (Bednarczuk et al., 2024).
- Variable metric and inexactness: Adaptive scaling matrices and variable metric prox-terms lead to more efficient methods for poorly conditioned problems, with careful error tolerance analysis (Yang et al., 15 Apr 2025).
- Line search and nonmonotone control: When global Lipschitz constants are unavailable, carefully crafted nonmonotone line search procedures allow parameter-free convergence guarantees for both monotone and accelerated variants (Yang et al., 27 Nov 2025).
- Fractional and DC programming: Specialized proximal subgradient schemes are tailored for single-ratio, structured, and composite fractional programs, enabling global or local convergence under minimal smoothness/data assumptions (Qi et al., 22 Oct 2025, Boţ et al., 2020, Han et al., 15 Mar 2025).
- Stochastic minimax and conic settings: SAPS and its LSAAL extension handle stochastic convex-concave problems even with unbounded or light-tailed stochastic subgradients (Dai et al., 2024).
By leveraging the Moreau envelope, error bounds, and KL theory, contemporary proximal subgradient algorithms retain provable convergence and complexity even in the presence of nonconvexity, non-Lipschitzness, and stochasticity—significantly broadening their practical range (Zhu et al., 2023, Davis et al., 2017, Dai et al., 2024).
7. Summary Table: Algorithmic Landscape
| Class / Property | Notable Algorithm(s) | Main Convergence Rate / Guarantee |
|---|---|---|
| Convex, deterministic | PSS, PSG, abstract PSG, NSDM | $O(1/\sqrt{k})$ objective gap; exact w/ Polyak |
| Weakly convex, nonconvex | PGSG, Prox-SubGrad, DC PSG | $O(1/\sqrt{T})$ for prox-stationarity; KL: $O(1/T)$ |
| Stochastic settings | PGSG, SAPS, Stochastic Prox-SubGrad | $O(1/\sqrt{N})$ (expected or high-probability) |
| Inexact / variable metric | PeSM (abs/rel), iVPGSA | Critical point convergence under error-summability |
| Accelerated/extrapolated | ISTA/FISTA (prox subgrad norm), e-PSG | $O(1/k^2)$, $O(1/k^3)$ (prox gradient norm) |
| Line-search/non-Lipschitz | nexPGA, FISTA w/line-search | Global convergence (KL) under local smoothness |
| Fractional/DC/multiobjective | PSA, FPSA, DC PSG, BDSA, multiobjective PSG | Subsequence/cluster-point stationarity, KL: full sequence |
All claims, formalisms, and models directly track the referenced research publications (Davis et al., 2017, Cruz, 2014, Zhu et al., 2023, Pham et al., 2022, Yang et al., 27 Nov 2025, Dai et al., 2024, Li et al., 2022, Boţ et al., 2020, Wei et al., 2013, Millán et al., 2018, Bednarczuk et al., 2024, Qi et al., 22 Oct 2025, Han et al., 15 Mar 2025, Tuyen et al., 31 Dec 2025, Yang et al., 15 Apr 2025).