
Güler-Type Accelerated Proximal Gradient Method

Updated 25 November 2025
  • GPGM is an accelerated proximal gradient algorithm that exploits negative-norm terms to enable more aggressive extrapolation and tighter convergence constants.
  • It achieves an optimal O(1/k²) convergence rate with flexible parameter tuning (γ > L), outperforming classical schemes like FISTA in both theory and practice.
  • The method is well-suited for large-scale composite problems, with practical applications in L1-regularized regression and computational plasticity demonstrating enhanced efficiency.

The Güler-type Accelerated Proximal Gradient Method (GPGM) is an extrapolation-based first-order algorithm for composite minimization problems, specifically designed to solve objectives of the form $F(x) = f(x) + g(x)$, where $f$ is convex with Lipschitz continuous gradient and $g$ is proper, closed, and convex. Drawing on acceleration techniques originally introduced by Güler in 1992 and since established as classical in the literature, GPGM modifies and generalizes Nesterov- and Beck–Teboulle-style acceleration by exploiting negative-norm terms that emerge in the convergence analysis. This enables more aggressive step extrapolations and offers fine-tuning capabilities not present in classical methods. The algorithm has a rigorous theoretical foundation, achieves the optimal $O(1/k^2)$ complexity in smooth convex settings, and allows sharper constants in practice, as confirmed by both analysis and computational results in convex composite optimization and computational plasticity contexts (Zhou et al., 21 Nov 2025; Kanno, 2020).

1. Problem Formulation and Theoretical Foundations

GPGM is formulated for the composite minimization problem:

$$\min_{x \in X} F(x) = f(x) + g(x)$$

where $X$ is a closed convex subset of $\mathbb{R}^n$, $f: \mathbb{R}^n \to \mathbb{R}$ is convex and $L$-smooth (i.e., $\nabla f$ is Lipschitz continuous), and $g: \mathbb{R}^n \to (-\infty, +\infty]$ is proper, closed, convex, and such that the proximal operators $\mathrm{prox}_{\tau g}$ are computationally tractable (Zhou et al., 21 Nov 2025). The key principle underlying acceleration is the use of extrapolated iterates that involve momentum terms, motivated by the "estimate sequence" constructions inherent to Güler's and Nesterov's frameworks. GPGM further differentiates itself by retaining negative squared-norm residuals in the recurrence relations, which classical schemes omit. This retention permits a parameter $\gamma > L$ (an "aggressive" stepsize regime), leading to accelerated convergence guarantees with sharper constants in both theory and practice.
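As a concrete illustration of proximal tractability (an example, not taken from the cited papers): for the $\ell_1$ regularizer $g(x) = \lambda\|x\|_1$, the proximal subproblem, written in the same form as the proximal step of Section 2, has the closed-form soft-thresholding solution

$$\Big[\arg\min_{x} \big\{ \lambda\|x\|_1 + \tfrac{\tau}{2}\|x - v\|^2 \big\}\Big]_i = \operatorname{sign}(v_i)\,\max\big(|v_i| - \lambda/\tau,\; 0\big), \qquad i = 1,\dots,n,$$

so each proximal step costs only a single pass over the coordinates.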

2. Algorithmic Structure and Extrapolation Mechanism

GPGM employs an adaptively parameterized extrapolation loop defined as follows (Zhou et al., 21 Nov 2025):

  1. Extrapolation parameter update: for $k \geq 1$,

$$t_k = \frac{1+\sqrt{1+4 t_{k-1}^2}}{2}, \qquad \theta_k = \frac{1}{t_k}, \qquad t_0 = 0,\; t_1 = 1$$

  2. Middle point construction:

$$x_m^k = (1-\theta_k)\,x_{\mathrm{ag}}^k + \theta_k\,\hat{x}^k$$

where $x_{\mathrm{ag}}^k$ is the aggregated iterate and $\hat{x}^k$ is the extrapolated point.

  3. Proximal gradient step:

$$x^{k+1} = \arg\min_x \left\{ g(x) + \frac{\tau_k}{2}\left\| x - \Big(\hat{x}^k - \frac{1}{\tau_k}\nabla f(x_m^k)\Big)\right\|^2 \right\}$$

with stepsize parameter $\tau_k = \gamma\theta_k$ for a free parameter $\gamma > L$.

  4. Extrapolated point update:

$$\hat{x}^{k+1} = (\alpha-1)\,\hat{x}^k + (2-\alpha)\,x^{k+1}$$

where $\alpha = L/\gamma \in (0,1]$.

  5. Aggregation:

$$x_{\mathrm{ag}}^{k+1} = (1-\theta_k)\,x_{\mathrm{ag}}^k + \theta_k\,x^{k+1}$$

The algorithm's design ensures that the negative-norm term $-\|\hat{x}^k - x^{k+1}\|^2$, arising in the estimate-sequence analysis, telescopes and strengthens objective descent (Zhou et al., 21 Nov 2025). A code sketch of the full loop is given below.
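The following is a minimal sketch of the loop above, instantiated for the $\ell_1$-regularized least-squares problem $f(x) = \tfrac{1}{2}\|Ax - b\|^2$, $g(x) = \lambda\|x\|_1$. The function names, the problem instance, and the fixed iteration budget are illustrative assumptions; only the update formulas follow Section 2 (Zhou et al., 21 Nov 2025).

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise prox of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def gpgm_lasso(A, b, lam, L, gamma, num_iter=500):
    """Sketch of the GPGM loop for f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1.

    L     : Lipschitz constant of grad f (largest eigenvalue of A^T A).
    gamma : free parameter with gamma > L; alpha = L / gamma in (0, 1].
    """
    n = A.shape[1]
    alpha = L / gamma
    x_hat = np.zeros(n)   # extrapolated point \hat{x}^k
    x_ag = np.zeros(n)    # aggregated iterate x_ag^k
    t = 1.0               # t_1 = 1 (t_0 = 0 merely seeds the recursion)

    for _ in range(num_iter):
        theta = 1.0 / t                      # theta_k = 1 / t_k
        tau = gamma * theta                  # tau_k = gamma * theta_k
        x_m = (1.0 - theta) * x_ag + theta * x_hat           # middle point
        grad = A.T @ (A @ x_m - b)                            # grad f at the middle point
        # proximal gradient step: prox of g with weight tau_k, centered at x_hat
        x_next = soft_threshold(x_hat - grad / tau, lam / tau)
        x_hat = (alpha - 1.0) * x_hat + (2.0 - alpha) * x_next   # extrapolated point update
        x_ag = (1.0 - theta) * x_ag + theta * x_next             # aggregation
        t = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0             # t_{k+1}
    return x_ag
```

Setting gamma = L (alpha = 1) in this sketch recovers the Tseng/FISTA regime discussed in Section 3, while gamma slightly above L activates the Güler-type extrapolation.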

3. Convergence Properties and Theoretical Guarantees

GPGM achieves the optimal $O(1/k^2)$ objective gap rate:

$$F(x_{\mathrm{ag}}^{k+1}) - F(x^*) \le \frac{2\gamma}{(2-\alpha)(k+1)^2}\,\|\hat{x}^1 - x^*\|^2$$

for any minimizer $x^*$, where $\alpha = L/\gamma \in (0,1]$ (Zhou et al., 21 Nov 2025). Selecting $\gamma > L$ (i.e., $\alpha < 1$) can lead to significantly reduced constants, as the negative term in the analysis tightens the master inequality from which convergence is derived.

The method reduces to FISTA or Tseng's method [Tseng 2008] when $\gamma = L$ ($\alpha = 1$), in which case the negative-norm term vanishes (i.e., the method is not "Güler-type" in this limit). For strongly convex $f$, related monotone accelerated variants guarantee asymptotic linear convergence $O(\rho^{-k})$ with $\rho > 1$, without requiring explicit knowledge of the strong convexity parameter (Wang et al., 1 Jul 2025).
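For instance, substituting $\gamma = L$ (so $\alpha = 1$) into the bound above recovers the familiar FISTA-type guarantee

$$F(x_{\mathrm{ag}}^{k+1}) - F(x^*) \le \frac{2L}{(k+1)^2}\,\|\hat{x}^1 - x^*\|^2 .$$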

4. Distinguishing Features Versus Classical Accelerated Schemes

Classical APGM and FISTA do not explicitly retain the negative-norm term; retaining it, as GPGM does, improves both the theoretical constants and the empirical performance by enabling more "aggressive" extrapolation. Tseng's variant is recovered for $\gamma = L$, while GPGM's flexible $\gamma > L$ parameterization yields empirical and analytic improvements (Zhou et al., 21 Nov 2025). Monotone Güler-type accelerated proximal gradient variants offer strictly monotonic objective descent and guarantee $O(1/k^2)$ rates even when the step size is taken as large as $1/L$ (or $1/(2L)$ for the optimal constant) (Wang et al., 1 Jul 2025).

The following table organizes the main differences between these schemes:

| Method | Extrapolation parameter | Negative-norm term | Aggressiveness (γ) |
|---|---|---|---|
| FISTA / Tseng | Classical (α = 1) | Not retained | Fixed, γ = L |
| GPGM | Flexible (α = L/γ < 1) | Explicitly used | Tunable, γ > L |
| Monotone Güler | α ≥ 3 (see Wang et al., 1 Jul 2025) | Implicit in the Lyapunov function | Tunable step size, s ≤ 1/L |

5. Implementation, Computational Aspects, and Practical Guidance

Each GPGM iteration consists of a gradient computation at the "middle" point $x_m^k$, a proximal step with respect to $g$, and simple vector updates. The negative-norm term does not introduce computational overhead, as it is managed analytically. In specialized contexts such as elastoplastic analysis, the GPGM structure encompasses blockwise updates, for instance plastic strain updates via pointwise proximal projections and displacement updates using momentum (Kanno, 2020). The method is particularly attractive for large-scale problems where the computation of second-order information is prohibitive.

Empirical studies on $\ell_1$-regularized logistic regression and other large-scale problems indicate that the tighter upper bounds and the ability to select $\alpha < 1$ lead to faster residual decay and higher solution accuracy than classical APGM/FISTA (Zhou et al., 21 Nov 2025). The step size should be set as large as the convergence theory allows, with $\gamma$ marginally larger than $L$ to optimize the constants.
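As a rough illustration of the setup for such $\ell_1$-regularized logistic regression experiments (a sketch of standard choices, not code from the cited papers), the smooth part supplies the gradient evaluated at the middle point, and a standard bound gives the Lipschitz constant used to choose $\gamma$:

```python
import numpy as np

def logistic_grad(w, A, y):
    """Gradient of f(w) = sum_i log(1 + exp(-y_i * a_i^T w)), with labels y in {-1, +1}."""
    z = y * (A @ w)
    return -A.T @ (y / (1.0 + np.exp(z)))

def lipschitz_logistic(A):
    """Standard Lipschitz bound for the logistic loss: L <= lambda_max(A^T A) / 4."""
    return np.linalg.norm(A, 2) ** 2 / 4.0

# gamma is then chosen marginally above L, e.g. gamma = 1.05 * L (illustrative choice).
```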

6. Extensions to Linearized Augmented Lagrangian and ADMM Methods

The Güler-type acceleration paradigm extends to other operator-splitting and augmented Lagrangian-type methods. The Güler-type accelerated linearized augmented Lagrangian method (GLALM) and the Güler-type accelerated linearized ADMM (GLADMM) generalize the basic principles to problems with saddle-point structure and constrained composites, enabling simultaneous momentum in the primal and dual variables. These extensions preserve or improve convergence rates relative to their classical non-accelerated counterparts, achieving $O(1/k^2)$ rates in certain subproblems and more aggressive partial convergence rates in GLADMM (Zhou et al., 21 Nov 2025). A plausible implication is that the negative-term-based extrapolation mechanism can be systematically applied to other first-order and splitting frameworks.

7. Applications and Numerical Evidence

GPGM is especially suited to applications involving high-dimensional, structured sparsity or regularized learning, such as $\ell_1$-regularized logistic regression and compressive sensing. In computational plasticity, GPGM (and related momentum schemes) provide efficient algorithms for incremental variational problems involving nonsmooth yield constraints, significantly reducing wall-clock time and iteration count relative to second-order or non-accelerated methods (Kanno, 2020; Zhou et al., 21 Nov 2025). Numerical experiments confirm the practical advantage of the GPGM framework, demonstrating consistent gains in objective decrease, iteration count, and solution accuracy over conventional APGM/FISTA across diverse problem scales.


References

  • "A Note on a Family of Proximal Gradient Methods for Quasi-static Incremental Problems in Elastoplastic Analysis" (Kanno, 2020)
  • "Convergence Rate Analysis for Monotone Accelerated Proximal Gradient Method" (Wang et al., 1 Jul 2025)
  • "The Güler-type acceleration for proximal gradient, linearized augmented Lagrangian and linearized alternating direction method of multipliers" (Zhou et al., 21 Nov 2025)