Lyapunov Analysis of Nesterov's Accelerated Gradient

Updated 23 November 2025
  • The paper demonstrates that Lyapunov analysis provides explicit contraction factors ensuring global R-linear convergence for Nesterov's Accelerated Gradient in μ-strongly convex, L-smooth settings.
  • It introduces a discrete Lyapunov sequence that tightly couples function error and state dynamics, enabling precise control over algorithmic contraction rates.
  • The analysis extends to accelerated proximal gradient methods and contrasts discrete NAG behavior with continuous ODE models, highlighting the need for high-resolution corrections.

Nesterov's Accelerated Gradient (NAG) method is a foundational extrapolation-based algorithm for convex optimization. Lyapunov analysis has emerged as the central tool for certifying both accelerated and linear convergence rates of NAG—including in circumstances where the strong convexity parameter is unknown. Such analyses systematically relate the decrease of structural “energy” sequences to convergence guarantees, revealing delicate distinctions between algorithmic, continuous-time, and high/low-resolution dynamical perspectives.

1. Algorithmic Framework and Extrapolation Parameters

Nesterov's method, in the context of a $\mu$-strongly convex and $L$-smooth $f:\mathbb{R}^n\to\mathbb{R}$, operates through coupled updates:

  • $x_{k+1} = y_k - s \nabla f(y_k)$,
  • $\beta_{k+1} = \frac{t_{k+1}-1}{t_{k+2}}$,
  • $y_{k+1} = x_{k+1} + \beta_{k+1} (x_{k+1} - x_k)$,

where $s\in(0,1/L]$ and $(t_k)$ is a sequence satisfying $t_1 = 1$, $t_{k+1}^2 - t_{k+1} \leq t_k^2$, $t_{k+1}>t_k$, and $t_k\to\infty$. Standard (classical) NAG chooses $(t_k)$ independent of $\mu$, recovering optimal accelerated rates for convex objectives but leaving the case of strong convexity with $\mu$ unknown unaddressed for Q- or R-linear convergence (Bao et al., 2023).
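
As a concrete illustration (not taken verbatim from the paper), the sketch below implements these coupled updates in NumPy with the common choice $t_{k+1} = \tfrac12\big(1+\sqrt{1+4t_k^2}\big)$, which satisfies the stated conditions with equality; the quadratic test problem and step size are assumptions made only for this example.

```python
import numpy as np

def t_next(t):
    """Classical choice t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2; it satisfies
    t_{k+1}^2 - t_{k+1} = t_k^2, t_{k+1} > t_k, and t_k -> infinity."""
    return 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))

def nag(grad_f, x0, s, num_iters=500):
    """Coupled NAG updates as written above; note that beta never uses mu."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0                                   # t_{k+1} for k = 0, i.e. t_1 = 1
    iterates = [x.copy()]
    for _ in range(num_iters):
        x_new = y - s * grad_f(y)             # x_{k+1} = y_k - s grad f(y_k)
        t_new = t_next(t)                     # t_{k+2}
        beta = (t - 1.0) / t_new              # beta_{k+1} = (t_{k+1} - 1) / t_{k+2}
        y = x_new + beta * (x_new - x)        # y_{k+1}
        x, t = x_new, t_new
        iterates.append(x.copy())
    return np.array(iterates)

# Illustrative strongly convex quadratic f(x) = 0.5 x^T A x (mu = 0.1, L = 10, f* = 0).
A = np.diag([0.1, 1.0, 10.0])
traj = nag(lambda x: A @ x, x0=np.ones(3), s=1.0 / 10.0)
fvals = 0.5 * np.einsum('ki,ij,kj->k', traj, A, traj)
print(fvals[::100])   # shrinks geometrically even though beta is mu-independent
```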

2. Lyapunov Construction in Discrete-time NAG

For any $s < 1/L$, define the discrete Lyapunov sequence:

  • $L_k = P_k + M_k$, where
  • $P_k := s\,(t_{k+1}-1)\,t_{k+1}\,\big(f(x_k)-f^*\big)$ (potential component),
  • $M_k := \frac12 \big\| (t_{k+1}-1)(y_k-x_k) + (y_k-x^*) \big\|^2$ (mixed/kinetic component).

This choice has the property that $L_k \to 0$ if and only if $f(x_k)\to f^*$ and $y_k\to x^*$. It tightly links the function error and the coupled system state, and is specifically constructed to allow for sharp contraction estimates (Bao et al., 2023).
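
A direct transcription of this Lyapunov value into NumPy might look as follows; the function name and argument conventions are choices made here for illustration, not part of the paper. Note that with $t_1=1$ and $y_0=x_0$ it gives $L_0 = \frac12\|x_0-x^*\|^2$, the constant appearing in the R-linear bound of Section 3.

```python
import numpy as np

def lyapunov_value(x_k, y_k, t_k1, f_xk, f_star, x_star, s):
    """L_k = P_k + M_k as defined above.

    t_k1 is t_{k+1}, the extrapolation parameter attached to step k;
    f_xk = f(x_k); f_star and x_star are the optimal value and minimizer."""
    P = s * (t_k1 - 1.0) * t_k1 * (f_xk - f_star)            # potential component
    v = (t_k1 - 1.0) * (y_k - x_k) + (y_k - x_star)          # coupled state vector
    M = 0.5 * float(v @ v)                                   # mixed/kinetic component
    return P + M
```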

3. Linear (Q-linear) Contraction and Global R-linear Convergence

By leveraging $L$-smoothness and $\mu$-strong convexity, the Lyapunov difference at iteration $k$ satisfies
$$L_{k+1} - L_k \leq -\tfrac12 s^2 t_{k+1}^2 (1-sL)\|\nabla f(y_k)\|^2 - \tfrac12 \mu s\,(t_{k+1}-1)t_{k+1} \|y_k-x_k\|^2 - \tfrac12 \mu s\, t_{k+1} \|y_k-x^*\|^2.$$
Simultaneous upper bounds and careful coefficient matching yield

$$L_k \leq C_k (L_k - L_{k+1}), \qquad L_k \leq (D_k-1)(L_k - L_{k+1})$$

for explicit, bounded sequences $(C_k)$ and $(D_k)$, from which

$$L_{k+1} \leq \rho_k L_k, \qquad \rho_k = 1 - \frac{1}{\min\{C_k, D_k\}} < \bar\rho < 1,$$

with $\bar\rho$ depending only on $L$, $\mu$, and $s$, and bounded explicitly by
$$\bar{\rho} \leq 1 - \frac{(1-Ls)\,\mu s}{1+\max\{\mu/L,\,1/8\}}.$$
This proves Q-linear contraction of $L_k$ and, by the structure of $P_k$, yields global R-linear convergence:
$$f(x_k) - f^* \leq \left[\prod_{i=0}^{k-1} \rho_i\right] \frac{\|x_0 - x^*\|^2}{2s\,(t_{k+1} - 1)\,t_{k+1}}.$$
For $s=1/L$, the Lyapunov function can be reduced to the classical two-term form $H_k = \lambda\,(f(x_k)-f^*) + \frac12 \|x_k-x_{k-1}\|^2$ with $\lambda$ chosen optimally; a geometric decrease $H_{k+1} \leq \rho H_k$ is established, which unrolls to $f(x_k)-f^* \leq \rho^k(f(x_0)-f^*)$ (Bao et al., 2023).
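
A self-contained numerical sanity check of this contraction, under the indexing conventions reconstructed above and on an assumed quadratic test problem (not from the paper), might look like the following; it tracks $L_k$ along the NAG iterates and compares the worst observed ratio $L_{k+1}/L_k$ with the uniform bound quoted above.

```python
import numpy as np

# Assumed test problem: f(x) = 0.5 x^T A x, so mu = 0.5, L = 8, x* = 0, f* = 0.
A = np.diag([0.5, 2.0, 8.0])
mu, L_smooth = 0.5, 8.0
s = 0.5 / L_smooth                            # s < 1/L, as this Lyapunov sequence requires
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

def t_next(t):
    return 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))

def lyap(x, y, t_k1):
    P = s * (t_k1 - 1.0) * t_k1 * f(x)        # potential term (f* = 0 here)
    v = (t_k1 - 1.0) * (y - x) + y            # coupled state (x* = 0 here)
    return P + 0.5 * v @ v

x = y = np.array([3.0, -2.0, 1.0])
t = 1.0                                       # t_1, attached to k = 0
L_prev, worst_ratio = lyap(x, y, t), 0.0
for _ in range(150):
    x_new = y - s * grad(y)
    t_new = t_next(t)
    y = x_new + ((t - 1.0) / t_new) * (x_new - x)
    x, t = x_new, t_new
    L_cur = lyap(x, y, t)
    worst_ratio = max(worst_ratio, L_cur / L_prev)
    L_prev = L_cur

rho_bound = 1.0 - (1.0 - L_smooth * s) * mu * s / (1.0 + max(mu / L_smooth, 1.0 / 8.0))
print("worst observed L_{k+1}/L_k:", worst_ratio)
print("uniform bound from the text:", rho_bound)
```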

4. Extension to Accelerated Proximal Gradient Methods

For composite objectives $F(x)=f(x)+g(x)$, with $g$ closed and convex, define the proximal-gradient mapping $G_s(y) = \frac{1}{s}\big(y - \mathrm{prox}_{sg}(y - s\nabla f(y))\big)$. The accelerated proximal gradient (APG) update replaces $\nabla f(y_k)$ with $G_s(y_k)$ in the same NAG iteration. The Lyapunov-based analysis extends identically:
$$F(x_k) - F^* \leq \left[\prod_{i=0}^{k-1}\rho_i\right] \frac{\|x_0-x^*\|^2}{2 s\,(t_{k+1}-1)\,t_{k+1}}$$
for $s<1/L$, or $F(x_k) - F^* \leq \rho^k (F(x_0) - F^*)$ for $s=1/L$ (Bao et al., 2023).
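
For concreteness, here is a hedged sketch of the APG iteration applied to a lasso-type instance, where $g=\lambda\|\cdot\|_1$ and $\mathrm{prox}_{sg}$ is soft-thresholding; the data, regularization weight, and helper names are illustrative assumptions only.

```python
import numpy as np

def soft_threshold(v, tau):
    """Prox of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def apg(grad_f, prox_g, x0, s, num_iters=300):
    """Same extrapolation as NAG, with the gradient step replaced by
    x_{k+1} = prox_{sg}(y_k - s grad f(y_k)), i.e. y_k - s * G_s(y_k)."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(num_iters):
        x_new = prox_g(y - s * grad_f(y), s)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Illustrative composite objective F(x) = 0.5 ||Bx - b||^2 + lam * ||x||_1, with L = ||B||_2^2.
rng = np.random.default_rng(0)
B, b, lam = rng.standard_normal((40, 20)), rng.standard_normal(40), 0.5
Lip = np.linalg.norm(B, 2) ** 2
x_hat = apg(lambda x: B.T @ (B @ x - b),
            lambda v, step: soft_threshold(v, step * lam),
            x0=np.zeros(20), s=1.0 / Lip)
print("objective:", 0.5 * np.sum((B @ x_hat - b) ** 2) + lam * np.sum(np.abs(x_hat)))
print("nonzeros :", int(np.count_nonzero(np.abs(x_hat) > 1e-8)))
```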

5. Continuous-Time ODE Comparison and High-Resolution Effects

The classical low-resolution ODE,

$$\ddot{X}(t) + \frac{3}{t}\dot{X}(t) + \nabla f(X(t)) = 0,$$

yields only $O(1/t^2)$ sublinear convergence, even for quadratics, with no possibility of exponential decay; indeed, $t^3\big(f(X(t))-f^*\big)\not\to 0$ as $t\to\infty$. The discrete NAG method thus exhibits R-linear convergence without requiring $\mu$ in $(\beta_k)$, something impossible for the standard ODE limit (Bao et al., 2023). This discrepancy arises because the low-resolution ODE omits key $O(\sqrt{s})$ correction terms essential for exponential decay in discrete NAG.
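
As a rough numerical illustration (an assumption of this write-up, not taken from the paper), integrating this ODE for a small quadratic shows the objective decaying only polynomially in $t$: the windowed maxima below shrink by a roughly constant factor each time $t$ doubles (polynomial decay), rather than by a constant factor per unit of time (exponential decay).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Assumed quadratic test objective f(x) = 0.5 x^T A x, with f* = 0.
A = np.diag([1.0, 4.0])

def low_res_ode(t, z):
    """First-order form of X'' + (3/t) X' + grad f(X) = 0, with z = (X, X')."""
    x, v = z[:2], z[2:]
    return np.concatenate([v, -(3.0 / t) * v - A @ x])

z0 = np.concatenate([np.array([1.0, -1.0]), np.zeros(2)])     # X(t0) = x0, X'(t0) = 0
ts = np.linspace(10.0, 320.0, 20001)
sol = solve_ivp(low_res_ode, (0.1, 320.0), z0, t_eval=ts, rtol=1e-9, atol=1e-12)
X = sol.y[:2].T
fvals = 0.5 * np.einsum('ij,jk,ik->i', X, A, X)

for lo in (10, 20, 40, 80, 160):
    window = (sol.t >= lo) & (sol.t < 2 * lo)
    print(f"max of f - f* on [{lo}, {2 * lo}): {fvals[window].max():.3e}")
```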

High-resolution ODEs (Shi–Du–Jordan–Su) restore the $O(\sqrt{s})$-scale terms, e.g.,

$$\ddot{X}(t) + 2\sqrt{\mu}\,\dot{X}(t) + \sqrt{s}\,\nabla^2 f(X(t))\,\dot{X}(t) + \nabla f(X(t)) = 0,$$

and linear decay of the objective is recovered locally, but global exponential decay remains unresolved from the continuous-time perspective (Bao et al., 2023).
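
A companion sketch (again on an assumed quadratic, with $\mu$ and $s$ chosen only for illustration) integrates the high-resolution equation as stated above; with the damping terms present, the objective drops by a roughly constant factor per unit of time on this instance, in contrast to the low-resolution case.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Assumed quadratic f(x) = 0.5 x^T A x, so grad f(x) = A x and Hess f(x) = A (mu = 1, L = 4).
A = np.diag([1.0, 4.0])
mu, s = 1.0, 0.1                              # s in (0, 1/L], chosen for illustration

def high_res_ode(t, z):
    """X'' + 2 sqrt(mu) X' + sqrt(s) Hess f(X) X' + grad f(X) = 0, written first order."""
    x, v = z[:2], z[2:]
    acc = -2.0 * np.sqrt(mu) * v - np.sqrt(s) * (A @ v) - A @ x
    return np.concatenate([v, acc])

z0 = np.concatenate([np.array([1.0, -1.0]), np.zeros(2)])
sol = solve_ivp(high_res_ode, (0.0, 12.0), z0, t_eval=np.linspace(0.0, 12.0, 7),
                rtol=1e-10, atol=1e-12)

for t, z in zip(sol.t, sol.y.T):
    fx = 0.5 * z[:2] @ A @ z[:2]
    print(f"t = {t:5.1f}   f - f* = {fx:.3e}")
```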

6. Broader Significance: Uniform Q-Linear Theory with Unknown $\mu$

The Lyapunov-based contraction and convergence theorems conclusively demonstrate that standard NAG, with classical extrapolation parameters and $\mu$ unknown, achieves global R-linear convergence on all $\mu$-strongly convex, $L$-smooth $f$ (Bao et al., 2023). The result is robust under any allowed sequence $(t_k)$ with $t_1=1$, $t_{k+1}^2-t_{k+1}\le t_k^2$, $t_{k+1}>t_k$, $t_k\to\infty$. For $s<1/L$, explicit and uniform bounds are provided for the contraction factors. This refines and expands the understanding of NAG's parameter robustness in practice and highlights a critical, previously unresolved regime.

The analysis highlights the inapplicability of the continuous-time low-resolution ODE theory for capturing exponential-type contraction without the $O(\sqrt{s})$ correction. The Lyapunov framework further enables direct extension of R-linear convergence to composite proximal settings (e.g., APG/FISTA), even when problem structure or parameter knowledge is incomplete.

7. Summary Table: Key Results

| Property | Low-Resolution ODE | Discrete NAG (unknown $\mu$) | High-Resolution ODE (local) |
|---|---|---|---|
| Convergence type | $O(1/t^2)$ sublinear | Global R-linear (Q-linear Lyapunov) | Local linear decay |
| Parameter requirement | $\mu$ for exponential rate | $\mu$-independent $\beta_k$ (classical) | $O(\sqrt{s})$ correction in ODE |
| Applicability to APG/FISTA | No | Yes (identical Lyapunov contraction) | Local only |

The crucial insight is that the discrete Lyapunov analysis, not the continuous-time ODE intuition, accurately predicts and certifies the true exponential convergence regime of Nesterov's methods on strongly convex problems without explicit knowledge of $\mu$ (Bao et al., 2023).
