
Lyapunov Analysis of Nesterov's AG

Updated 23 November 2025
  • The paper demonstrates that Lyapunov analysis provides explicit contraction factors ensuring global R-linear convergence for Nesterov's Accelerated Gradient in μ-strongly convex, L-smooth settings.
  • It introduces a discrete Lyapunov sequence that tightly couples function error and state dynamics, enabling precise control over algorithmic contraction rates.
  • The analysis extends to accelerated proximal gradient methods and contrasts discrete NAG behavior with continuous ODE models, highlighting the need for high-resolution corrections.

Nesterov's Accelerated Gradient (NAG) method is a foundational extrapolation-based algorithm for convex optimization. Lyapunov analysis has emerged as the central tool for certifying both accelerated and linear convergence rates of NAG—including in circumstances where the strong convexity parameter is unknown. Such analyses systematically relate the decrease of structural “energy” sequences to convergence guarantees, revealing delicate distinctions between algorithmic, continuous-time, and high/low-resolution dynamical perspectives.

1. Algorithmic Framework and Extrapolation Parameters

Nesterov's method, in the context of $\mu$-strongly convex and $L$-smooth $f:\mathbb{R}^n\to\mathbb{R}$, operates through coupled updates:

  • $x_{k+1} = y_k - s \nabla f(y_k)$,
  • $\beta_{k+1} = \frac{t_{k+1}-1}{t_{k+2}}$,
  • $y_{k+1} = x_{k+1} + \beta_{k+1} (x_{k+1} - x_k)$,

where $s\in(0,1/L]$ and $t_k$ is a sequence satisfying $t_1 = 1$ and $t_{k+1}^2 - t_{k+1} \leq t_k^2$, together with mild growth conditions on $t_k$. Standard (classical) NAG chooses $t_k$ independent of $\mu$ (e.g., $t_{k+1} = \frac{1+\sqrt{1+4t_k^2}}{2}$), recovering optimal accelerated rates for convex objectives but leaving the case of strong convexity with $\mu$ unknown unaddressed for Q- or R-linear convergence (Bao et al., 2023).
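The coupled updates above can be sketched numerically. The following is a minimal illustration, not the paper's code: it uses the equivalent classical FISTA-style indexing $\beta_k = (t_k - 1)/t_{k+1}$ for the momentum weight, and a diagonal quadratic as an assumed test problem.

```python
import numpy as np

def nag(grad, x0, s, num_iters):
    """Nesterov's accelerated gradient with the classical (mu-independent)
    extrapolation sequence t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2."""
    x_prev = np.asarray(x0, dtype=float).copy()
    y = x_prev.copy()
    t = 1.0
    for _ in range(num_iters):
        x = y - s * grad(y)                          # gradient step at the extrapolated point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_next                    # classical momentum weight
        y = x + beta * (x - x_prev)                  # extrapolation step
        x_prev, t = x, t_next
    return x_prev

# Assumed test problem: f(x) = 0.5 x^T A x with minimizer x* = 0 (L = 100, mu = 1).
A = np.diag([1.0, 10.0, 100.0])
x_final = nag(lambda x: A @ x, np.ones(3), s=1.0 / 100.0, num_iters=500)
err = 0.5 * x_final @ A @ x_final                    # f(x_final) - f(x*)
```

Note that $\mu$ is never used by the iteration; only $L$ enters through the step size $s$.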

2. Lyapunov Construction in Discrete-time NAG

For each iteration $k$, define the discrete Lyapunov sequence:

  • $\mathcal{E}_k = \mathcal{P}_k + \mathcal{Q}_k$, where
  • $\mathcal{P}_k$ is a potential component proportional to the function error $f(x_k) - f(x^\star)$, weighted by $t_k^2$,
  • $\mathcal{Q}_k$ is a mixed/kinetic component measuring a weighted distance of the coupled state $(x_k, x_{k-1})$ from the minimizer $x^\star$.

This choice has the property $\mathcal{E}_k = 0$ if and only if $x_k = x^\star$ and $x_{k-1} = x^\star$. It tightly links the function error and the coupled system state, and is specifically constructed to allow for sharp contraction estimates (Bao et al., 2023).
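One concrete Lyapunov sequence with exactly this potential-plus-kinetic shape is the classical Beck–Teboulle form, $\mathcal{E}_k = 2s\,t_k^2\,(f(x_k)-f(x^\star)) + \|t_k x_k - (t_k-1)x_{k-1} - x^\star\|^2$; the paper's exact sequence may differ, but the non-increase property of this standard form can be checked numerically under the same assumptions ($s \le 1/L$, a convex quadratic as assumed test problem):

```python
import numpy as np

# f(x) = 0.5 x^T A x with minimizer x* = 0, L = 25, step s = 1/L.
A = np.diag([1.0, 4.0, 25.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
s = 1.0 / 25.0

x_prev = np.ones(3)    # x_0
y = x_prev.copy()      # y_1 = x_0
t = 1.0                # t_1 = 1
E = []
for _ in range(200):
    x = y - s * grad(y)                      # x_k
    u = t * x - (t - 1.0) * x_prev           # kinetic component (x* = 0)
    E.append(2.0 * s * t * t * f(x) + u @ u) # E_k = potential + kinetic
    t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
    y = x + ((t - 1.0) / t_next) * (x - x_prev)
    x_prev, t = x, t_next

# Monotone non-increase along the trajectory (up to rounding).
drops = all(E[k + 1] <= E[k] + 1e-12 for k in range(len(E) - 1))
```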

3. Linear (Q-linear) Contraction and Global R-linear Convergence

By leveraging $L$-smoothness and $\mu$-strong convexity, the Lyapunov difference at iteration $k$, $\mathcal{E}_{k+1} - \mathcal{E}_k$, can be bounded above by a strictly negative multiple of $\mathcal{E}_k$. Simultaneous upper bounds and careful coefficient matching yield

$$\mathcal{E}_{k+1} \leq (1 - \lambda_k)\,\mathcal{E}_k$$

for an explicit, bounded sequence $\lambda_k \in (0,1)$ of contraction coefficients, from which

$$\mathcal{E}_k \leq \Big(\prod_{j=1}^{k-1}(1-\lambda_j)\Big)\,\mathcal{E}_1,$$

with the contraction factors depending only on $\mu$, $L$, and the step size $s$. This proves Q-linear contraction of $\mathcal{E}_k$ and, by the structure of $\mathcal{E}_k$, yields global R-linear convergence: there exist $C > 0$ and $\rho \in (0,1)$ such that $f(x_k) - f(x^\star) \leq C\rho^k$ for all $k$. When $\mu$ is known, the Lyapunov can be reduced to a classical two-term form $\mathcal{E}_k = f(x_k) - f(x^\star) + \tfrac{\mu}{2}\|v_k - x^\star\|^2$, with the auxiliary sequence $v_k$ chosen optimally, and geometric decrease $\mathcal{E}_{k+1} \leq (1-\sqrt{\mu s})\,\mathcal{E}_k$ established, unrolling to $f(x_k) - f(x^\star) \leq (1-\sqrt{\mu s})^k\,\mathcal{E}_0$ (Bao et al., 2023).
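For the known-$\mu$ regime, the classical constant-momentum scheme and its textbook certificate $f(x_k) - f(x^\star) \leq (1-\sqrt{\mu s})^k\,\big(f(x_0)-f(x^\star) + \tfrac{\mu}{2}\|x_0-x^\star\|^2\big)$ can be verified directly. This is a sketch on an assumed quadratic, not the paper's experiment:

```python
import numpy as np

mu, L = 1.0, 100.0
A = np.diag(np.linspace(mu, L, 5))      # f(x) = 0.5 x^T A x, minimizer x* = 0
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
s = 1.0 / L
# Constant momentum for known mu: beta = (1 - sqrt(mu s)) / (1 + sqrt(mu s)).
beta = (1.0 - np.sqrt(mu * s)) / (1.0 + np.sqrt(mu * s))

x0 = np.ones(5)
x_prev, y = x0.copy(), x0.copy()
K = 200
for _ in range(K):
    x = y - s * grad(y)                  # gradient step
    y = x + beta * (x - x_prev)          # constant-momentum extrapolation
    x_prev = x

err = f(x_prev)                                                      # f(x_K) - f(x*)
bound = (1.0 - np.sqrt(mu * s)) ** K * (f(x0) + 0.5 * mu * (x0 @ x0))  # certificate
```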

4. Extension to Accelerated Proximal Gradient Methods

For composite objectives $F = f + g$, with $g$ closed, convex, and proximable, define the proximal-gradient step $x_{k+1} = \operatorname{prox}_{sg}\big(y_k - s\nabla f(y_k)\big)$. The accelerated proximal gradient (APG) update replaces the plain gradient step with this proximal-gradient step in the same NAG iteration. The Lyapunov-based analysis extends identically: $O(1/k^2)$ decay of $F(x_k) - F(x^\star)$ for merely convex $f$, or global R-linear convergence for $\mu$-strongly convex $f$ (Bao et al., 2023).
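A sketch of the APG/FISTA iteration for the composite case, with $g = \lambda\|\cdot\|_1$ so the prox is soft-thresholding. The lasso instance below is illustrative, not from the paper; a warm-started plain proximal-gradient run supplies a reference objective value:

```python
import numpy as np

def soft_threshold(v, tau):
    """Prox of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# Illustrative composite problem: F(x) = 0.5 ||Ax - b||^2 + lam ||x||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
b = rng.standard_normal(20)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of grad f
s = 1.0 / L
F = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))
prox_grad = lambda x: soft_threshold(x - s * (A.T @ (A @ x - b)), s * lam)

# APG/FISTA: the NAG iteration with the gradient step replaced by prox_grad.
x_prev = np.zeros(8)
y = x_prev.copy()
t = 1.0
F0 = F(x_prev)
for _ in range(1000):
    x = prox_grad(y)
    t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
    y = x + ((t - 1.0) / t_next) * (x - x_prev)
    x_prev, t = x, t_next
F_apg = F(x_prev)

# Reference: warm-started plain proximal gradient (ISTA) from the APG iterate;
# it is monotone, so F_ref lower-bounds F_apg and approximates F*.
x_ref = x_prev.copy()
for _ in range(2000):
    x_ref = prox_grad(x_ref)
F_ref = F(x_ref)
```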

5. Continuous-Time ODE Comparison and High-Resolution Effects

The classical low-resolution ODE,

$$\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f(X(t)) = 0,$$

yields only $O(1/t^2)$ sublinear convergence, even for quadratics, with no possibility of exponential decay of $f(X(t)) - f(x^\star)$ as $t \to \infty$. Thus, the discrete NAG method exhibits R-linear convergence without requiring knowledge of $\mu$ in its parameters, but this is impossible for the standard ODE limit (Bao et al., 2023). This discrepancy arises because the low-resolution ODE omits key $O(\sqrt{s})$ correction terms essential for exponential decay in discrete NAG.
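The gap between the sublinear certificate inherited from the low-resolution perspective and the actual discrete behavior can be observed numerically: on an assumed strongly convex quadratic, classical ($\mu$-independent) NAG drives the error well below the standard $2\|x_0-x^\star\|^2/\big(s(k+1)^2\big)$ convex-case certificate.

```python
import numpy as np

# Classical NAG (mu never used) on a strongly convex quadratic (mu = 1, L = 100).
A = np.diag([1.0, 10.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
s = 1.0 / 100.0

x_prev = np.ones(3)
y = x_prev.copy()
t = 1.0
k = 1000
for _ in range(k):
    x = y - s * grad(y)
    t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
    y = x + ((t - 1.0) / t_next) * (x - x_prev)
    x_prev, t = x, t_next

err = f(x_prev)                                      # f(x_k) - f(x*)
sublinear_cert = 2.0 * 3.0 / (s * (k + 1) ** 2)      # 2 ||x0 - x*||^2 / (s (k+1)^2)
```

The observed error sitting far below the $O(1/k^2)$ certificate is consistent with the R-linear behavior that the low-resolution ODE cannot explain.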

High-resolution ODEs (Shi–Du–Jordan–Su) restore the $\sqrt{s}$-scale terms, e.g.,

$$\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \sqrt{s}\,\nabla^2 f(X(t))\,\dot{X}(t) + \Big(1 + \frac{3\sqrt{s}}{2t}\Big)\nabla f(X(t)) = 0,$$

and locally, linear decay in the objective is recovered, but global exponential decay remains unresolved in the continuous perspective (Bao et al., 2023).

6. Broader Significance: Uniform Q-Linear Theory with Unknown $\mu$

The Lyapunov-based contraction and convergence theorems conclusively demonstrate that standard NAG, with classical extrapolation parameters and $\mu$ unknown, achieves global R-linear convergence on all $\mu$-strongly convex, $L$-smooth $f$ (Bao et al., 2023). The result is robust under any admissible sequence $t_k$ satisfying the conditions of Section 1. For every $s \in (0, 1/L]$, explicit and uniform bounds are provided for the contraction factors. This refines and expands the understanding of NAG's parameter robustness in practice and highlights a critical, previously unresolved, regime.

The analysis highlights the inapplicability of the continuous-time low-resolution ODE theory for capturing exponential-type contraction without $O(\sqrt{s})$ correction terms. The Lyapunov framework further enables direct extension of R-linear convergence to composite proximal settings (e.g., APG/FISTA), even when problem structure or parameter knowledge is incomplete.

7. Summary Table: Key Results

| Property | Low-Resolution ODE | Discrete NAG (unknown $\mu$) | High-Resolution ODE (local) |
|---|---|---|---|
| Convergence type | $O(1/t^2)$ sublinear | Global R-linear (Q-linear Lyapunov) | Local linear decay |
| Parameter requirement | $\mu$-dependent damping for exponential rate | $\mu$-independent $t_k$ (classical) | $\sqrt{s}$ correction in ODE |
| Applicability to APG/FISTA | No | Yes (identical Lyapunov contraction) | Local only |

The crucial insight is that the discrete Lyapunov analysis, not the continuous-time ODE intuition, accurately predicts and certifies the true algorithmic exponential convergence regime for strongly convex Nesterov's methods without explicit knowledge of $\mu$ (Bao et al., 2023).

References (1)
