Lyapunov Analysis of Nesterov's AG
- The paper demonstrates that Lyapunov analysis provides explicit contraction factors ensuring global R-linear convergence for Nesterov's Accelerated Gradient in μ-strongly convex, L-smooth settings.
- It introduces a discrete Lyapunov sequence that tightly couples function error and state dynamics, enabling precise control over algorithmic contraction rates.
- The analysis extends to accelerated proximal gradient methods and contrasts discrete NAG behavior with continuous ODE models, highlighting the need for high-resolution corrections.
Nesterov's Accelerated Gradient (NAG) method is a foundational extrapolation-based algorithm for convex optimization. Lyapunov analysis has emerged as the central tool for certifying both accelerated and linear convergence rates of NAG—including in circumstances where the strong convexity parameter is unknown. Such analyses systematically relate the decrease of structural “energy” sequences to convergence guarantees, revealing delicate distinctions between algorithmic, continuous-time, and high/low-resolution dynamical perspectives.
1. Algorithmic Framework and Extrapolation Parameters
Nesterov's method, in the context of a $\mu$-strongly convex and $L$-smooth objective $f$, operates through coupled updates:
- $x_{k+1} = y_k - \frac{1}{L}\,\nabla f(y_k)$,
- $t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2}$ (the classical choice),
- $y_{k+1} = x_{k+1} + \frac{t_k - 1}{t_{k+1}}\,(x_{k+1} - x_k)$,
where $y_0 = x_0$ and $\{t_k\}$ is any sequence satisfying $t_0 = 1$, $t_k \ge 1$, $t_{k+1}^2 - t_{k+1} \le t_k^2$, and $\sup_k t_k/k < \infty$ (the classical recursion above satisfies all four). Standard (classical) NAG chooses $t_k$ independent of $\mu$, recovering optimal accelerated rates for convex objectives but leaving the case of strong convexity with unknown $\mu$ unaddressed for Q- or R-linear convergence (Bao et al., 2023).
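A minimal NumPy sketch of these updates, using the classical recursion for $t_k$ (the handle `grad_f`, the step size $1/L$, and the iteration budget are illustrative assumptions, not the paper's code):

```python
import numpy as np

def nag(grad_f, x0, L, num_iters=500):
    """Classical (mu-free) NAG sketch: gradient step from y_k, then extrapolation."""
    x = np.asarray(x0, dtype=float)
    y, t = x.copy(), 1.0                                   # y_0 = x_0, t_0 = 1
    for _ in range(num_iters):
        x_next = y - grad_f(y) / L                         # x_{k+1} = y_k - (1/L) grad f(y_k)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0  # t_{k+1}^2 - t_{k+1} = t_k^2
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)   # extrapolation step
        x, t = x_next, t_next
    return x
```

Note that nothing in the loop references $\mu$; the contraction results below apply to exactly this $\mu$-free iteration.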
2. Lyapunov Construction in Discrete-time NAG
For each $k \ge 1$, define the discrete Lyapunov sequence:
- $\mathcal{E}_k = \mathcal{E}_k^{\mathrm{pot}} + \mathcal{E}_k^{\mathrm{mix}}$, where
- $\mathcal{E}_k^{\mathrm{pot}} = t_k^2\,\bigl(f(x_k) - f(x^\star)\bigr)$ (potential component),
- $\mathcal{E}_k^{\mathrm{mix}} = \frac{L}{2}\,\bigl\|x_{k-1} + t_k\,(x_k - x_{k-1}) - x^\star\bigr\|^2$ (mixed/kinetic component).
This choice has the property $\mathcal{E}_k = 0$ if and only if $x_k = x^\star$ and $x_{k-1} = x^\star$ (using uniqueness of the minimizer under strong convexity and $t_k > 1$). It tightly links the function error and the coupled system state, and is specifically constructed to allow for sharp contraction estimates (Bao et al., 2023).
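As an empirical illustration (not the paper's experiment), one can track $\mathcal{E}_k$ along the iterates on a strongly convex quadratic with minimizer $x^\star = 0$ and $f(x^\star) = 0$ by construction, and watch the per-step ratio $\mathcal{E}_{k+1}/\mathcal{E}_k$; the test matrix, dimension, and iteration counts below are arbitrary choices:

```python
import numpy as np

# Arbitrary strongly convex quadratic: f(x) = 0.5 x^T A x, so x* = 0 and f* = 0.
rng = np.random.default_rng(0)
Q = rng.standard_normal((20, 20))
A = Q.T @ Q + 0.1 * np.eye(20)                     # strong convexity >= 0.1
L = np.linalg.eigvalsh(A).max()
f, grad = lambda x: 0.5 * x @ A @ x, lambda x: A @ x

x = rng.standard_normal(20)
y, t, E_prev = x.copy(), 1.0, None
for k in range(1, 61):
    x_next = y - grad(y) / L
    t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
    y = x_next + ((t - 1) / t_next) * (x_next - x)
    u = x + t_next * (x_next - x)                  # x_{k-1} + t_k (x_k - x_{k-1})
    E = t_next**2 * f(x_next) + 0.5 * L * (u @ u)  # Lyapunov value E_k (x* = 0)
    if E_prev is not None and k % 10 == 0:
        print(f"k={k:2d}  E={E:.3e}  E/E_prev={E / E_prev:.4f}")
    x, t, E_prev = x_next, t_next, E
```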
3. Linear (Q-linear) Contraction and Global R-linear Convergence
By leveraging $L$-smoothness and $\mu$-strong convexity, the Lyapunov difference at iteration $k$ satisfies a per-step bound of the form
$$\mathcal{E}_{k+1} - \mathcal{E}_k \;\le\; -\,c_k\,\mathcal{E}_k .$$
Simultaneous upper bounds and careful coefficient matching yield
$$\mathcal{E}_{k+1} \;\le\; (1 - c_k)\,\mathcal{E}_k$$
for explicit, bounded sequences $\{c_k\} \subset (0,1)$, from which
$$\mathcal{E}_{k+1} \;\le\; (1 - c)\,\mathcal{E}_k \quad \text{for all } k,$$
with $c \in (0,1)$ depending only on $\mu$ and $L$ through an explicit closed form. This proves Q-linear contraction of $\{\mathcal{E}_k\}$ and, by the structure of $\mathcal{E}_k$, yields global R-linear convergence:
$$f(x_k) - f(x^\star) \;\le\; (1 - c)^k\,\mathcal{E}_0 .$$
For $\mu$ known, the Lyapunov can be reduced to a classical two-term form $\mathcal{E}_k = f(x_k) - f(x^\star) + \frac{\mu}{2}\,\|v_k - x^\star\|^2$ over an auxiliary momentum sequence $\{v_k\}$, with the extrapolation coefficient chosen optimally as $\frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}}$, and geometric decrease established, unrolling to $f(x_k) - f(x^\star) \le \bigl(1 - \sqrt{\mu/L}\bigr)^k\,\mathcal{E}_0$ (Bao et al., 2023).
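In the notation above, the unrolling step is worth making explicit; since $t_k \ge 1$, the potential component alone bounds the function error (a one-line derivation under the stated contraction):

```latex
\mathcal{E}_k \le (1-c)\,\mathcal{E}_{k-1} \le \cdots \le (1-c)^k\,\mathcal{E}_0
\quad\Longrightarrow\quad
f(x_k) - f(x^\star) \;=\; \frac{\mathcal{E}_k^{\mathrm{pot}}}{t_k^2}
\;\le\; \frac{\mathcal{E}_k}{t_k^2} \;\le\; (1-c)^k\,\mathcal{E}_0 .
```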
4. Extension to Accelerated Proximal Gradient Methods
For composite objectives $F = f + g$, with $g$ closed, proper, and convex, define the proximal-gradient mapping $G_{1/L}(y) = L\bigl(y - \mathrm{prox}_{g/L}\bigl(y - \frac{1}{L}\nabla f(y)\bigr)\bigr)$. The accelerated proximal gradient (APG) update replaces $\nabla f(y_k)$ with $G_{1/L}(y_k)$ in the same NAG iteration, so that $x_{k+1} = \mathrm{prox}_{g/L}\bigl(y_k - \frac{1}{L}\nabla f(y_k)\bigr)$. The Lyapunov-based analysis extends identically: global R-linear convergence holds for $F = f + g$ with $f$ $\mu$-strongly convex and $L$-smooth, and the pure NAG result is recovered for $g \equiv 0$ (Bao et al., 2023).
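A minimal sketch of one APG step for the example $g = \lambda\|\cdot\|_1$ (the choice of $g$, the helper `soft_threshold`, and the function signature are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def soft_threshold(v, tau):
    """prox of tau * ||.||_1 -- the proximal map for the l1 choice of g."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def apg_step(y, x, t, grad_f, L, lam):
    """One APG/FISTA step: prox-gradient step from y_k, then the usual extrapolation."""
    x_next = soft_threshold(y - grad_f(y) / L, lam / L)    # prox_{g/L}(y_k - grad f(y_k)/L)
    t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    y_next = x_next + ((t - 1.0) / t_next) * (x_next - x)  # same momentum update as NAG
    return x_next, y_next, t_next
```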
5. Continuous-Time ODE Comparison and High-Resolution Effects
The classical low-resolution ODE,
$$\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f(X(t)) = 0,$$
yields only sublinear convergence, even for quadratics, with no possibility of exponential decay; $f(X(t)) - f(x^\star) = O(1/t^2)$ as $t \to \infty$. Thus, the discrete NAG method exhibits R-linear convergence without requiring $\mu$ in the extrapolation sequence $\{t_k\}$, but this is impossible for the standard ODE limit (Bao et al., 2023). This discrepancy arises because the low-resolution ODE omits key correction terms essential for exponential decay in discrete NAG.
High-resolution ODEs (Shi–Du–Jordan–Su) restore the $O(\sqrt{s})$-scale correction terms, e.g.,
$$\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \sqrt{s}\,\nabla^2 f(X(t))\,\dot{X}(t) + \Bigl(1 + \frac{3\sqrt{s}}{2t}\Bigr)\nabla f(X(t)) = 0,$$
and locally, linear decay in the objective is recovered, but global exponential decay remains unresolved in the continuous perspective (Bao et al., 2023).
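A small semi-implicit Euler integration of the low-resolution ODE on the toy quadratic $f(x) = x^2/2$ (step size, horizon, and initial data are assumptions for illustration) makes the contrast concrete: the energy decays polynomially in $t$, not geometrically:

```python
# Integrate  X'' + (3/t) X' + f'(X) = 0  with f(x) = x^2 / 2 (so f'(x) = x).
dt = 1e-3
t, x, v = 1.0, 1.0, 0.0                      # start at t = 1: the 3/t damping is singular at 0
for k in range(int(199.0 / dt)):
    a = -(3.0 / t) * v - x                   # acceleration prescribed by the ODE
    v += dt * a                              # semi-implicit Euler: velocity first, for stability
    x += dt * v
    t += dt
    if (k + 1) % 50_000 == 0:                # report every 50 time units
        print(f"t = {t:6.1f}   energy f(X) + 0.5*V^2 = {0.5 * (x * x + v * v):.3e}")
```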
6. Broader Significance: Uniform Q-Linear Theory with Unknown $\mu$
The Lyapunov-based contraction and convergence theorems conclusively demonstrate that standard NAG, with classical extrapolation parameters and $\mu$ unknown, achieves global R-linear convergence on all $\mu$-strongly convex, $L$-smooth objectives (Bao et al., 2023). The result is robust under any allowed sequence $\{t_k\}$ with $t_0 = 1$, $t_k \ge 1$, $t_{k+1}^2 - t_{k+1} \le t_k^2$, and $\sup_k t_k/k < \infty$. For the classical recursion $t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2}$, explicit and uniform bounds are provided for the contraction factors. This refines and expands the understanding of NAG's parameter robustness in practice and highlights a critical, previously unresolved regime.
The analysis highlights the inapplicability of the continuous-time low-resolution ODE theory for capturing exponential-type contraction without correction. The Lyapunov framework further enables direct extension of R-linear convergence to composite proximal settings (e.g., APG/FISTA), even when problem structure or parameter knowledge is incomplete.
7. Summary Table: Key Results
| Property | Low-Resolution ODE | Discrete NAG (unknown $\mu$) | High-Resolution ODE (local) |
|---|---|---|---|
| Convergence type | $O(1/t^2)$ sublinear | Global R-linear (Q-linear Lyapunov) | Local linear decay |
| Parameter requirement | $\mu$ needed for exponential rate | $\mu$-independent (classical $\{t_k\}$) | $O(\sqrt{s})$ correction in ODE |
| Applicability to APG/FISTA | No | Yes (identical Lyapunov contraction) | Local only |
The crucial insight is that the Lyapunov discrete analysis, not the ODE continuous-time intuition, accurately predicts and certifies the true algorithmic exponential convergence regime for strongly convex Nesterov's methods without explicit knowledge of $\mu$ (Bao et al., 2023).