
Lyapunov Analysis of Nesterov's AG

Updated 23 November 2025
  • The paper demonstrates that Lyapunov analysis provides explicit contraction factors ensuring global R-linear convergence for Nesterov's Accelerated Gradient in μ-strongly convex, L-smooth settings.
  • It introduces a discrete Lyapunov sequence that tightly couples function error and state dynamics, enabling precise control over algorithmic contraction rates.
  • The analysis extends to accelerated proximal gradient methods and contrasts discrete NAG behavior with continuous ODE models, highlighting the need for high-resolution corrections.

Nesterov's Accelerated Gradient (NAG) method is a foundational extrapolation-based algorithm for convex optimization. Lyapunov analysis has emerged as the central tool for certifying both accelerated and linear convergence rates of NAG—including in circumstances where the strong convexity parameter is unknown. Such analyses systematically relate the decrease of structural “energy” sequences to convergence guarantees, revealing delicate distinctions between algorithmic, continuous-time, and high/low-resolution dynamical perspectives.

1. Algorithmic Framework and Extrapolation Parameters

Nesterov's method, in the context of $\mu$-strongly convex and $L$-smooth $f:\mathbb{R}^n\to\mathbb{R}$, operates through coupled updates:

  • $x_{k+1} = y_k - s \nabla f(y_k)$,
  • $\beta_{k+1} = \frac{t_{k+1}-1}{t_{k+2}}$,
  • $y_{k+1} = x_{k+1} + \beta_{k+1} (x_{k+1} - x_k)$,

where $s\in(0,1/L]$ and $t_k$ is a sequence satisfying $t_1 = 1$ and $t_{k+1}^2 - t_{k+1} \leq t_k^2$, together with mild growth conditions on $t_k$. Standard (classical) NAG chooses $t_k$ independent of $\mu$ (e.g., $t_{k+1} = \frac{1+\sqrt{1+4t_k^2}}{2}$), recovering optimal accelerated rates for convex objectives but leaving the case of strong convexity with $\mu$ unknown unaddressed for Q- or R-linear convergence (Bao et al., 2023).
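The coupled updates above can be sketched numerically. The following is a minimal illustration, not the paper's code: it uses the equivalent classical FISTA-style indexing $\beta_k = (t_k - 1)/t_{k+1}$ for the momentum weight, and a diagonal quadratic as an assumed test problem.

```python
import numpy as np

def nag(grad, x0, s, num_iters):
    """Nesterov's accelerated gradient with the classical (mu-independent)
    extrapolation sequence t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2."""
    x_prev = np.asarray(x0, dtype=float).copy()
    y = x_prev.copy()
    t = 1.0
    for _ in range(num_iters):
        x = y - s * grad(y)                          # gradient step at the extrapolated point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_next                    # classical momentum weight
        y = x + beta * (x - x_prev)                  # extrapolation step
        x_prev, t = x, t_next
    return x_prev

# Assumed test problem: f(x) = 0.5 x^T A x with minimizer x* = 0 (L = 100, mu = 1).
A = np.diag([1.0, 10.0, 100.0])
x_final = nag(lambda x: A @ x, np.ones(3), s=1.0 / 100.0, num_iters=500)
err = 0.5 * x_final @ A @ x_final                    # f(x_final) - f(x*)
```

Note that $\mu$ is never used by the iteration; only $L$ enters through the step size $s$.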

2. Lyapunov Construction in Discrete-time NAG

For each iteration $k$, define the discrete Lyapunov sequence:

  • $\mathcal{E}_k = \mathcal{P}_k + \mathcal{Q}_k$, where
  • $\mathcal{P}_k$ is a potential component proportional to the function error $f(x_k) - f(x^\star)$, weighted by $t_k^2$,
  • $\mathcal{Q}_k$ is a mixed/kinetic component measuring a weighted distance of the coupled state $(x_k, x_{k-1})$ from the minimizer $x^\star$.

This choice has the property $\mathcal{E}_k = 0$ if and only if $x_k = x^\star$ and $x_{k-1} = x^\star$. It tightly links the function error and the coupled system state, and is specifically constructed to allow for sharp contraction estimates (Bao et al., 2023).
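One concrete Lyapunov sequence with exactly this potential-plus-kinetic shape is the classical Beck–Teboulle form, $\mathcal{E}_k = 2s\,t_k^2\,(f(x_k)-f(x^\star)) + \|t_k x_k - (t_k-1)x_{k-1} - x^\star\|^2$; the paper's exact sequence may differ, but the non-increase property of this standard form can be checked numerically under the same assumptions ($s \le 1/L$, a convex quadratic as assumed test problem):

```python
import numpy as np

# f(x) = 0.5 x^T A x with minimizer x* = 0, L = 25, step s = 1/L.
A = np.diag([1.0, 4.0, 25.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
s = 1.0 / 25.0

x_prev = np.ones(3)    # x_0
y = x_prev.copy()      # y_1 = x_0
t = 1.0                # t_1 = 1
E = []
for _ in range(200):
    x = y - s * grad(y)                      # x_k
    u = t * x - (t - 1.0) * x_prev           # kinetic component (x* = 0)
    E.append(2.0 * s * t * t * f(x) + u @ u) # E_k = potential + kinetic
    t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
    y = x + ((t - 1.0) / t_next) * (x - x_prev)
    x_prev, t = x, t_next

# Monotone non-increase along the trajectory (up to rounding).
drops = all(E[k + 1] <= E[k] + 1e-12 for k in range(len(E) - 1))
```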

3. Linear (Q-linear) Contraction and Global R-linear Convergence

By leveraging $L$-smoothness and $\mu$-strong convexity, the Lyapunov difference at iteration $k$, $\mathcal{E}_{k+1} - \mathcal{E}_k$, can be bounded above by a strictly negative multiple of $\mathcal{E}_k$. Simultaneous upper bounds and careful coefficient matching yield

$$\mathcal{E}_{k+1} \leq (1 - \lambda_k)\,\mathcal{E}_k$$

for an explicit, bounded sequence $\lambda_k \in (0,1)$ of contraction coefficients, from which

$$\mathcal{E}_k \leq \Big(\prod_{j=1}^{k-1}(1-\lambda_j)\Big)\,\mathcal{E}_1,$$

with the contraction factors depending only on $\mu$, $L$, and the step size $s$. This proves Q-linear contraction of $\mathcal{E}_k$ and, by the structure of $\mathcal{E}_k$, yields global R-linear convergence: there exist $C > 0$ and $\rho \in (0,1)$ such that $f(x_k) - f(x^\star) \leq C\rho^k$ for all $k$. When $\mu$ is known, the Lyapunov can be reduced to a classical two-term form $\mathcal{E}_k = f(x_k) - f(x^\star) + \tfrac{\mu}{2}\|v_k - x^\star\|^2$, with the auxiliary sequence $v_k$ chosen optimally, and geometric decrease $\mathcal{E}_{k+1} \leq (1-\sqrt{\mu s})\,\mathcal{E}_k$ established, unrolling to $f(x_k) - f(x^\star) \leq (1-\sqrt{\mu s})^k\,\mathcal{E}_0$ (Bao et al., 2023).
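For the known-$\mu$ regime, the classical constant-momentum scheme and its textbook certificate $f(x_k) - f(x^\star) \leq (1-\sqrt{\mu s})^k\,\big(f(x_0)-f(x^\star) + \tfrac{\mu}{2}\|x_0-x^\star\|^2\big)$ can be verified directly. This is a sketch on an assumed quadratic, not the paper's experiment:

```python
import numpy as np

mu, L = 1.0, 100.0
A = np.diag(np.linspace(mu, L, 5))      # f(x) = 0.5 x^T A x, minimizer x* = 0
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
s = 1.0 / L
# Constant momentum for known mu: beta = (1 - sqrt(mu s)) / (1 + sqrt(mu s)).
beta = (1.0 - np.sqrt(mu * s)) / (1.0 + np.sqrt(mu * s))

x0 = np.ones(5)
x_prev, y = x0.copy(), x0.copy()
K = 200
for _ in range(K):
    x = y - s * grad(y)                  # gradient step
    y = x + beta * (x - x_prev)          # constant-momentum extrapolation
    x_prev = x

err = f(x_prev)                                                      # f(x_K) - f(x*)
bound = (1.0 - np.sqrt(mu * s)) ** K * (f(x0) + 0.5 * mu * (x0 @ x0))  # certificate
```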

4. Extension to Accelerated Proximal Gradient Methods

For composite objectives $F = f + g$, with $g$ closed, convex, and proximable, define the proximal-gradient step $x_{k+1} = \operatorname{prox}_{sg}\big(y_k - s\nabla f(y_k)\big)$. The accelerated proximal gradient (APG) update replaces the plain gradient step with this proximal-gradient step in the same NAG iteration. The Lyapunov-based analysis extends identically: $O(1/k^2)$ decay of $F(x_k) - F(x^\star)$ for merely convex $f$, or global R-linear convergence for $\mu$-strongly convex $f$ (Bao et al., 2023).
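A sketch of the APG/FISTA iteration for the composite case, with $g = \lambda\|\cdot\|_1$ so the prox is soft-thresholding. The lasso instance below is illustrative, not from the paper; a warm-started plain proximal-gradient run supplies a reference objective value:

```python
import numpy as np

def soft_threshold(v, tau):
    """Prox of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# Illustrative composite problem: F(x) = 0.5 ||Ax - b||^2 + lam ||x||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
b = rng.standard_normal(20)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of grad f
s = 1.0 / L
F = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))
prox_grad = lambda x: soft_threshold(x - s * (A.T @ (A @ x - b)), s * lam)

# APG/FISTA: the NAG iteration with the gradient step replaced by prox_grad.
x_prev = np.zeros(8)
y = x_prev.copy()
t = 1.0
F0 = F(x_prev)
for _ in range(1000):
    x = prox_grad(y)
    t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
    y = x + ((t - 1.0) / t_next) * (x - x_prev)
    x_prev, t = x, t_next
F_apg = F(x_prev)

# Reference: warm-started plain proximal gradient (ISTA) from the APG iterate;
# it is monotone, so F_ref lower-bounds F_apg and approximates F*.
x_ref = x_prev.copy()
for _ in range(2000):
    x_ref = prox_grad(x_ref)
F_ref = F(x_ref)
```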

5. Continuous-Time ODE Comparison and High-Resolution Effects

The classical low-resolution ODE,

$$\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f(X(t)) = 0,$$

yields only $O(1/t^2)$ sublinear convergence, even for quadratics, with no possibility of exponential decay of $f(X(t)) - f(x^\star)$ as $t \to \infty$. Thus, the discrete NAG method exhibits R-linear convergence without requiring knowledge of $\mu$ in its parameters, but this is impossible for the standard ODE limit (Bao et al., 2023). This discrepancy arises because the low-resolution ODE omits key $O(\sqrt{s})$ correction terms essential for exponential decay in discrete NAG.
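The gap between the sublinear certificate inherited from the low-resolution perspective and the actual discrete behavior can be observed numerically: on an assumed strongly convex quadratic, classical ($\mu$-independent) NAG drives the error well below the standard $2\|x_0-x^\star\|^2/\big(s(k+1)^2\big)$ convex-case certificate.

```python
import numpy as np

# Classical NAG (mu never used) on a strongly convex quadratic (mu = 1, L = 100).
A = np.diag([1.0, 10.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
s = 1.0 / 100.0

x_prev = np.ones(3)
y = x_prev.copy()
t = 1.0
k = 1000
for _ in range(k):
    x = y - s * grad(y)
    t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
    y = x + ((t - 1.0) / t_next) * (x - x_prev)
    x_prev, t = x, t_next

err = f(x_prev)                                      # f(x_k) - f(x*)
sublinear_cert = 2.0 * 3.0 / (s * (k + 1) ** 2)      # 2 ||x0 - x*||^2 / (s (k+1)^2)
```

The observed error sitting far below the $O(1/k^2)$ certificate is consistent with the R-linear behavior that the low-resolution ODE cannot explain.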

High-resolution ODEs (Shi–Du–Jordan–Su) restore the $\sqrt{s}$-scale terms, e.g.,

$$\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \sqrt{s}\,\nabla^2 f(X(t))\,\dot{X}(t) + \Big(1 + \frac{3\sqrt{s}}{2t}\Big)\nabla f(X(t)) = 0,$$

and locally, linear decay in the objective is recovered, but global exponential decay remains unresolved in the continuous perspective (Bao et al., 2023).

6. Broader Significance: Uniform Q-Linear Theory with Unknown $\mu$

The Lyapunov-based contraction and convergence theorems conclusively demonstrate that standard NAG, with classical extrapolation parameters and $\mu$ unknown, achieves global R-linear convergence on all $\mu$-strongly convex, $L$-smooth $f$ (Bao et al., 2023). The result is robust under any admissible sequence $t_k$ satisfying the conditions of Section 1. For every $s \in (0, 1/L]$, explicit and uniform bounds are provided for the contraction factors. This refines and expands the understanding of NAG's parameter robustness in practice and highlights a critical, previously unresolved, regime.

The analysis highlights the inapplicability of the continuous-time low-resolution ODE theory for capturing exponential-type contraction without $O(\sqrt{s})$ correction terms. The Lyapunov framework further enables direct extension of R-linear convergence to composite proximal settings (e.g., APG/FISTA), even when problem structure or parameter knowledge is incomplete.

7. Summary Table: Key Results

| Property | Low-Resolution ODE | Discrete NAG (unknown $\mu$) | High-Resolution ODE (local) |
|---|---|---|---|
| Convergence type | $O(1/t^2)$ sublinear | Global R-linear (Q-linear Lyapunov) | Local linear decay |
| Parameter requirement | $\mu$-dependent damping for exponential rate | $\mu$-independent $t_k$ (classical) | $\sqrt{s}$ correction in ODE |
| Applicability to APG/FISTA | No | Yes (identical Lyapunov contraction) | Local only |

The crucial insight is that the discrete Lyapunov analysis, not the continuous-time ODE intuition, accurately predicts and certifies the true algorithmic exponential convergence regime for strongly convex Nesterov's methods without explicit knowledge of $\mu$ (Bao et al., 2023).

References (1)
