
Proximal-PL Condition

Updated 27 December 2025
  • The Proximal-PL condition is a regularity criterion for composite optimization that replaces the classical gradient-norm lower bound with a computable proximal-gradient residual.
  • It holds for a broad class of non-strongly convex and nonsmooth objectives and implies quadratic growth and error bounds even under weak convexity.
  • The condition underpins the linear convergence of proximal gradient methods, including stochastic and variance-reduced variants, in modern optimization applications.

The Proximal-PL (Proximal Polyak–Łojasiewicz) Condition is a regularity criterion for composite optimization problems that generalizes the classical Polyak–Łojasiewicz (PL) inequality to settings with nonsmooth convex terms. It characterizes a broad class of non-strongly convex (and possibly nonconvex) objectives for which proximal gradient methods and their stochastic or variance-reduced counterparts enjoy global linear convergence rates. The Proximal-PL condition replaces the classical gradient-norm lower bound with a computable proximal-gradient residual or a model-based functional decrement, and plays a central role in contemporary convergence theory for first-order algorithms used in large-scale nonsmooth and weakly convex optimization.

1. Formal Definition of the Proximal-PL Condition

Let $F:\mathbb{R}^d \to \mathbb{R}\cup\{+\infty\}$ be a composite function

$$F(x) = f(x) + h(x),$$

where $f \in C^1$ has an $L$-Lipschitz gradient and $h$ is closed, proper, and convex. For a step size $\gamma > 0$, define the proximal-gradient mapping (residual)

$$\mathcal{G}_\gamma(x) := \frac{1}{\gamma}\left(x - \operatorname{prox}_{\gamma h}\bigl(x - \gamma \nabla f(x)\bigr)\right).$$

The Proximal-PL condition with constant $\mu > 0$ is

$$\forall x \in \mathbb{R}^d,\quad \|\mathcal{G}_\gamma(x)\|^2 \ge 2\mu \bigl(F(x) - F^*\bigr),$$

where $F^* = \min_x F(x)$. This version is variously called the residual Polyak–Łojasiewicz (RPL) inequality or the gradient-mapping PL inequality (Kong et al., 18 Nov 2024, Li et al., 2018).
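Because the residual is a single proximal-gradient step, it can be evaluated directly in code. Below is a minimal NumPy sketch (illustrative, not taken from the cited papers) for the $\ell_1$-regularized least squares objective $F(x) = \tfrac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1$; the data, regularization weight $\lambda$, and step size $\gamma$ are arbitrary assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_grad_residual(x, A, b, lam, gamma):
    """G_gamma(x) = (x - prox_{gamma h}(x - gamma * grad f(x))) / gamma for the lasso objective."""
    grad_f = A.T @ (A @ x - b)                       # gradient of f(x) = 0.5 * ||Ax - b||^2
    x_plus = soft_threshold(x - gamma * grad_f, gamma * lam)
    return (x - x_plus) / gamma

rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
x, lam = rng.standard_normal(20), 0.1                # illustrative point and regularization weight
gamma = 1.0 / np.linalg.norm(A, 2) ** 2              # step size 1/L, with L = ||A||_2^2
print(np.linalg.norm(prox_grad_residual(x, A, b, lam, gamma)))
```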

Alternative formulations exist that use (i) the decrease of a proximal quadratic model (Kim et al., 2021), or (ii) the squared distance from zero to the subdifferential for weakly convex/nonsmooth cases (Liao et al., 2023),

$$\operatorname{dist}^2(0, \partial F(x)) \ge \mu \bigl(F(x) - F^*\bigr)$$

on a (possibly restricted) sublevel set.

2. Relationship to Classical PL and Other Regularity Conditions

The classical PL inequality for smooth $f$ is

$$\|\nabla f(x)\|^2 \ge 2\mu \bigl[f(x) - f^*\bigr].$$

For a general composite $F(x)=f(x)+h(x)$, the direct analogue,

$$\operatorname{dist}^2(0, \partial F(x)) \ge 2\mu \bigl[F(x) - F^*\bigr],$$

may be impractical due to the difficulty of computing general subgradients. The Proximal-PL condition replaces the gradient or subgradient norm with a proximal-gradient residual, which is directly computable from the proximal step.

In the smooth case ($h \equiv 0$ or $h$ constant), Proximal-PL is equivalent to the PL condition. Strong convexity implies Proximal-PL. For weakly convex $f$ or nonsmooth $h$, Proximal-PL is strictly weaker than strong convexity but still implies quadratic growth and error bounds (Karimi et al., 2016, Liao et al., 2023).
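The reduction in the smooth case is immediate: with $h \equiv 0$ the proximal map is the identity, so

$$\mathcal{G}_\gamma(x) = \frac{1}{\gamma}\bigl(x - (x - \gamma \nabla f(x))\bigr) = \nabla f(x),$$

and the Proximal-PL inequality coincides with the classical PL inequality above.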

Within the hierarchy of regularity conditions:

  • Strong convexity ⇒ Restricted Secant Inequality (RSI) ⇒ Subdifferential Error Bound (EB) ⇔ PL/Proximal-PL ⇒ Quadratic Growth (QG).
  • For convex $f$: RSI ⇔ EB ⇔ Proximal-PL ⇔ QG (Liao et al., 2023).

3. Proximal Gradient Methods Under the Proximal-PL Condition

The proximal gradient method (PGM) iterates

$$x_{k+1} = \operatorname{prox}_{\gamma h}\bigl(x_k - \gamma \nabla f(x_k)\bigr).$$

Under the Proximal-PL condition with $\mu > 0$, with $f$ $L$-smooth, $h$ convex with a computable proximal operator, and step size $\gamma \in (0, 2/L)$, PGM enjoys global linear convergence in function value.
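The following sketch runs this iteration on the $\ell_1$-regularized least squares example and inspects the decay of the objective gap; the synthetic data, the step size $\gamma = 1/L$, and the surrogate optimal value obtained from a longer run are illustrative assumptions rather than settings from the cited papers.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def F(x, A, b, lam):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def prox_gradient(x0, A, b, lam, gamma, iters):
    """Proximal gradient method: x_{k+1} = prox_{gamma h}(x_k - gamma * grad f(x_k))."""
    x, vals = x0.copy(), []
    for _ in range(iters):
        grad_f = A.T @ (A @ x - b)
        x = soft_threshold(x - gamma * grad_f, gamma * lam)
        vals.append(F(x, A, b, lam))
    return x, np.array(vals)

rng = np.random.default_rng(1)
A, b = rng.standard_normal((100, 30)), rng.standard_normal(100)
lam = 0.5
gamma = 1.0 / np.linalg.norm(A, 2) ** 2                     # step size 1/L

x0 = np.zeros(30)
_, vals = prox_gradient(x0, A, b, lam, gamma, 200)
F_star = prox_gradient(x0, A, b, lam, gamma, 5000)[1][-1]   # surrogate for the optimal value
gaps = vals - F_star
print(gaps[:5] / gaps[1:6])   # near-constant ratios of consecutive gaps indicate linear convergence
```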

Explicit rate for convex $f$ (Kong et al., 18 Nov 2024):

$$F(x_{k+1}) - F^* \le \tau(\gamma)\,\bigl[F(x_k) - F^*\bigr],$$

where $\tau(\gamma)$ is a contraction factor defined piecewise in $\gamma$:

  • $\gamma \in (0, 1/L]$: $\tau(\gamma)=(1-\gamma\mu)/(1+\gamma\mu)$
  • $\gamma \in (1/L, 3/(2L)]$: a rational expression in $\gamma$, $\mu$, $L$ as given in (Kong et al., 18 Nov 2024)
  • $\gamma \in (3/(2L), 2/L)$: another explicit rational form.

For possibly nonconvex $f$:

$$F(x_{k+1}) - F^* \le \frac{L + \mu(L\gamma-1)^2 - \mu}{L}\,\bigl[F(x_k) - F^*\bigr]$$

(Kong et al., 18 Nov 2024).

More generally, for any method whose per-iteration progress can be lower bounded by the residual or the model-based decrement, Proximal-PL gives a linear contraction with a rate parameter depending on $L$, $\mu$, and $\gamma$ (Karimi et al., 2016).

4. Stochastic and Variance-Reduced Proximal Algorithms

Variance-reduced methods such as ProxSVRG+, ProxSAGA, and their loopless variants inherit linear convergence rates under the Proximal-PL condition (Li et al., 2018, Traoré et al., 2023). ProxSVRG+, as proposed in (Li et al., 2018), employs a variance-reduced stochastic gradient estimate within a proximal subroutine and, under Proximal-PL, achieves global geometric decay of the objective gap,

$$\Phi(x^S) - \Phi^* \le \left(1-\frac{\mu}{C_1 L}\right)^{Sm} \bigl(\Phi(x^0) - \Phi^*\bigr) + \frac{C_2\,\sigma^2/n}{\mu},$$

for suitable constants $C_1, C_2$ and parameter settings, without requiring restarts. This "automatic switch" from a sublinear to a linear regime occurs as soon as the iterates enter a region where Proximal-PL holds. Analogous contraction results hold for ProxSAGA and other variance-reduced algorithms (Traoré et al., 2023).
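Below is a minimal sketch of a ProxSVRG-style loop on a finite-sum $\ell_1$-regularized least squares problem; the epoch length, minibatch size, and step size are illustrative choices and do not reproduce the exact parameter settings, constants $C_1, C_2$, or analysis of ProxSVRG+ in (Li et al., 2018).

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_svrg(A, b, lam, eta, epochs, m, batch, rng):
    """ProxSVRG-style loop for (1/n) * sum_i 0.5*(a_i^T x - b_i)^2 + lam*||x||_1."""
    n, d = A.shape
    x_tilde = np.zeros(d)
    for _ in range(epochs):
        full_grad = A.T @ (A @ x_tilde - b) / n          # full gradient at the snapshot
        x = x_tilde.copy()
        for _ in range(m):
            idx = rng.integers(0, n, size=batch)
            Ai, bi = A[idx], b[idx]
            # variance-reduced estimate: g_batch(x) - g_batch(x_tilde) + full_grad
            v = Ai.T @ (Ai @ x - bi) / batch - Ai.T @ (Ai @ x_tilde - bi) / batch + full_grad
            x = soft_threshold(x - eta * v, eta * lam)   # proximal (soft-threshold) step
        x_tilde = x                                      # take the last inner iterate as the new snapshot
    return x_tilde

rng = np.random.default_rng(2)
A, b = rng.standard_normal((500, 50)), rng.standard_normal(500)
eta = 1.0 / np.max(np.sum(A ** 2, axis=1))               # conservative step size ~ 1 / max_i L_i
x_hat = prox_svrg(A, b, lam=0.1, eta=eta, epochs=30, m=500, batch=10, rng=rng)
```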

In summary, Proximal-PL underpins the linear convergence of a wide range of stochastic first-order methods for both convex and nonconvex-nonsmooth problems.

5. Performance Estimation and Rate Tightness

Performance estimation problems (PEPs) have been systematically used to derive the sharpest possible contraction rates for proximal gradient and related methods under Proximal-PL/RPL. The approach forms nonnegative linear combinations of inequalities derived from $L$-smoothness, convexity/subdifferential properties, and the Proximal-PL condition, constructing an explicit Lyapunov potential that telescopes along the algorithmic steps (Kong et al., 18 Nov 2024). The resulting contraction factors $\tau(\gamma)$ are provably optimal within the class of objectives with $L$-smooth $f$ and convex $h$.

Comparison with earlier bounds shows that the PEP-derived rates uniformly improve or subsume those based on more restrictive conditions (e.g., strong convexity or classical PL for the smooth case) (Kong et al., 18 Nov 2024).

6. Applications and Canonical Examples

Numerous problems in machine learning and signal processing are known to satisfy the Proximal-PL condition:

  • $\ell_1$-regularized least squares: $F(x) = \tfrac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1$ with $A$ of full rank (a numerical probe of the constant $\mu$ follows this list)
  • Logistic regression with an $\ell_1$ penalty on compact sublevel sets
  • Dual of SVMs: quadratic growth and thus Proximal-PL in the dual space
  • Group-lasso, nuclear norm minimization, sparse group lasso under suitable data assumptions
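Because the residual is computable, a conjectured Proximal-PL constant can be probed numerically. The sketch below samples random points for a small lasso instance and reports $\min_x \|\mathcal{G}_\gamma(x)\|^2 / \bigl(2(F(x)-F^*)\bigr)$, which upper-bounds any valid $\mu$; the data, sampling scheme, and surrogate $F^*$ are illustrative assumptions, and such a probe is a sanity check rather than a verification of the condition.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def F(x, A, b, lam):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def residual(x, A, b, lam, gamma):
    grad_f = A.T @ (A @ x - b)
    return (x - soft_threshold(x - gamma * grad_f, gamma * lam)) / gamma

rng = np.random.default_rng(3)
A, b = rng.standard_normal((80, 20)), rng.standard_normal(80)
lam = 0.2
gamma = 1.0 / np.linalg.norm(A, 2) ** 2

# surrogate for F^*: a long proximal gradient run
x = np.zeros(20)
for _ in range(20000):
    x = soft_threshold(x - gamma * (A.T @ (A @ x - b)), gamma * lam)
F_star = F(x, A, b, lam)

# min over sampled points of ||G(x)||^2 / (2 (F(x) - F^*)) upper-bounds any valid mu
ratios = []
for _ in range(1000):
    z = 3.0 * rng.standard_normal(20)
    gap = F(z, A, b, lam) - F_star
    if gap > 1e-10:
        ratios.append(np.sum(residual(z, A, b, lam, gamma) ** 2) / (2.0 * gap))
print("empirical upper bound on mu:", min(ratios))
```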

In the weakly convex regime, Proximal-PL applies on restricted sublevel sets, enabling convergence theory for algorithms with inexact or approximate proximal steps (Liao et al., 2023). In the min-max and distributionally robust setting, a Moreau-regularized primal (via a proximal term) enables global PL-type inequalities and thus linear convergence in both objective and duality gaps (Guo et al., 2020).

7. Extensions: Beyond Deterministic Convexity and Robustness to Noise

The Proximal-PL framework extends to settings with:

  • Weakly convex objectives (not globally convex), via subdifferential-based or error-bound Proximal-PL conditions restricted to sublevel sets (Liao et al., 2023)
  • Online/stochastic settings with unbiased and even sub-Weibull gradient noise, where Proximal-PL ensures linear decay up to a bias term determined by the noise (Kim et al., 2021)
  • Nonconvex–strongly-concave min-max, by moving to a Moreau-regularized primal and establishing a PL inequality on that regularized function (Guo et al., 2020)

A plausible implication is that the Proximal-PL condition, in its various forms, provides a unifying thread for establishing linear convergence of first-order algorithms in composite, stochastic, online, weakly convex, and saddle-point optimization. Its computable and verifiable character is particularly advantageous for nonsmooth or structured regularized problems where standard smooth analysis fails.
