Accelerated GRAAL with Nesterov Acceleration

Updated 16 July 2025
  • The paper demonstrates that integrating adaptive curvature-based stepsize estimation with Nesterov acceleration attains the optimal O(1/k²) convergence rate in smooth convex optimization.
  • It employs a geometric stepsize update and adaptive momentum coupling, eliminating the need for global parameter tuning and expensive line searches.
  • The approach offers practical benefits for large-scale optimization, robustly adapting to unknown or variable curvature and recovering quickly from overly conservative initial stepsizes.

GRAAL with Nesterov Acceleration refers to an adaptive accelerated gradient method for smooth convex minimization that combines the key features of GRAAL (the Golden RAtio ALgorithm, which uses local curvature for stepsize adaptation) and classical Nesterov acceleration. The method is designed to attain the optimal convergence rate $O(1/k^2)$ of accelerated gradient schemes, while retaining the practical strengths of parameter-free, line-search-free adaptive gradient methods. This approach addresses a longstanding question in the literature: can adaptive gradient methods, which estimate curvature and select stepsizes automatically, also benefit from optimal acceleration as in Nesterov’s method? The recent answer is affirmative, provided by the Accelerated GRAAL scheme (Borodich et al., 13 Jul 2025).

1. Historical Context and Motivation

The classical Nesterov accelerated gradient method, introduced in the 1980s, achieves the optimal $O(1/k^2)$ convergence rate for minimizing a smooth convex function, but its performance depends significantly on the choice of stepsize, which typically requires knowledge of the global Lipschitz constant $L$ or a costly line search. Meanwhile, adaptive gradient methods such as GRAAL [Malitsky, 2020] estimate stepsizes based on local curvature using only first-order information, requiring no tuning or trial-and-error, but until recently were only known to achieve the standard suboptimal $O(1/k)$ rate.

Bridging these two paradigms, that is, combining adaptive stepsize selection from local curvature estimation with the acceleration mechanism of Nesterov-style methods, presents both technical and practical challenges. Previous attempts (e.g., AC-FGM [Li and Lan, 2023]) imposed restrictive update rules or sublinearly growing stepsizes, sometimes required restarting protocols, and often could not adapt rapidly when the initial stepsize was poorly chosen. The recent Accelerated GRAAL method establishes that, with appropriate coupling of parameters and potential functions, adaptive methods can match the accelerated rate with provable iteration complexity guarantees (Borodich et al., 13 Jul 2025).

2. Algorithm Structure and Adaptive Curvature Estimation

The distinguishing feature of GRAAL with Nesterov Acceleration is its use of locally adaptive stepsizes, chosen independently at each iteration via estimation of a local inverse gradient Lipschitz constant. Specifically, at iteration $k$, the method computes the parameter

$$\lambda_{k+1} = \min\left\{ \Lambda(x_{k+1};\, x_k),\ \Lambda(x_{k+1};\, x_{k+1}) \right\}$$

where, for a convex function $f$,

$$\Lambda(x; z) = \begin{cases} \dfrac{2\, B_f(x; z)}{\|\nabla f(x) - \nabla f(z)\|^2}, & \nabla f(x) \neq \nabla f(z), \\[1ex] +\infty, & \nabla f(x) = \nabla f(z), \end{cases}$$

with $B_f(x; z)$ the Bregman divergence $f(x) - f(z) - \langle \nabla f(z),\, x - z \rangle$.

The stepsize $\eta_{k+1}$ is updated using a “geometric growth” scheme:

$$\eta_{k+1} = \min\left\{ (1+\gamma)\,\eta_k,\ \frac{\nu\, H_{k-1}\, \lambda_{k+1}}{\eta_{k-1}} \right\}$$

where $H_k = \sum_{i=0}^{k} \eta_i$, and $\gamma, \nu > 0$ are universal constants (not tuned per problem) such that $4\nu\theta(1+\gamma)^2 = \gamma$ for a fixed $\theta \in (0,1]$. This update enables rapid stepsize adaptation: even with a very small initial stepsize, the method increases $\eta_k$ geometrically until a suitable regime is found.
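The following is a minimal Python sketch of the curvature estimate and the geometric stepsize rule described above. The function names and edge-case handling are illustrative choices, not taken from a reference implementation; the constants $\gamma$ and $\nu$ are assumed to be chosen so that $4\nu\theta(1+\gamma)^2 = \gamma$ as stated.

```python
import numpy as np

def bregman_divergence(f, grad_f, x, z):
    """Bregman divergence B_f(x; z) = f(x) - f(z) - <grad f(z), x - z>."""
    return f(x) - f(z) - np.dot(grad_f(z), x - z)

def curvature_estimate(f, grad_f, x, z):
    """Local inverse-Lipschitz estimate
    Lambda(x; z) = 2 B_f(x; z) / ||grad f(x) - grad f(z)||^2  (+inf if the gradients coincide)."""
    diff = grad_f(x) - grad_f(z)
    denom = float(np.dot(diff, diff))
    if denom == 0.0:
        return np.inf
    return 2.0 * bregman_divergence(f, grad_f, x, z) / denom

def next_stepsize(eta_k, eta_km1, H_km1, lam_kp1, gamma, nu):
    """Geometric-growth update:
    eta_{k+1} = min{ (1 + gamma) * eta_k,  nu * H_{k-1} * lambda_{k+1} / eta_{k-1} }."""
    return min((1.0 + gamma) * eta_k, nu * H_km1 * lam_kp1 / eta_km1)
```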

3. Incorporation of Nesterov Acceleration: Coupling and Momentum

Acceleration is introduced by a coupling strategy inspired by modern interpretations of Nesterov’s method: the update

$$x_{k+1} = x_k - \eta_k\, \nabla f_k(x_k)$$

where

$$f_k(x) = \frac{1}{\alpha_k}\, f\big(\alpha_k x + (1 - \alpha_k)\, x_k\big)$$

and $\alpha_k \in (0,1]$ is adaptively determined. The algorithm tracks both the cumulative past stepsizes and an auxiliary sequence $\beta_k$ so that the critical invariant

$$\frac{\eta_k}{\alpha_k\, \beta_k} = H_k$$

holds. This allows the method to balance the extrapolated (momentum-influenced) and non-extrapolated updates adaptively, retaining the acceleration properties of deterministic Nesterov schemes without prescribing a pre-fixed schedule for $\alpha_k$.
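For concreteness, the hypothetical helpers below solve the invariant above for $\alpha_k$ and form the gradient of the reparameterized objective, which by the chain rule is $\nabla f_k(x) = \nabla f(\alpha_k x + (1 - \alpha_k) x_k)$. How $\beta_k$ itself evolves is part of the paper's analysis and is not reproduced here.

```python
def coupling_alpha(eta_k, beta_k, H_k):
    """Solve the invariant eta_k / (alpha_k * beta_k) = H_k for alpha_k,
    clipped to the admissible range (0, 1]."""
    return min(1.0, eta_k / (beta_k * H_k))

def reparameterized_grad(grad_f, x_k, alpha_k):
    """Gradient of f_k(x) = (1/alpha_k) * f(alpha_k * x + (1 - alpha_k) * x_k);
    by the chain rule the 1/alpha_k factor cancels, leaving grad f at the convex combination."""
    return lambda x: grad_f(alpha_k * x + (1.0 - alpha_k) * x_k)
```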

Through this mechanism, the accelerated GRAAL update strongly resembles the update structure of classical AGD, but with the essential difference that stepsizes and momentum weights are computed online using only first-order local information, and not fixed a priori.

4. Theoretical Guarantees and Optimal Iteration Complexity

Accelerated GRAAL achieves a non-asymptotic convergence rate matching that of classical fixed-stepsize accelerated gradient descent for $L$-smooth convex functions:

$$f(x_K) - f(x^*) \leq \frac{2L\, \|x_0 - x^*\|^2}{K^2},$$

up to a mild logarithmic overhead if an arbitrarily small initial stepsize $\eta_0$ is chosen. Specifically, the total number of iterations $K$ sufficient to reach $\epsilon$ accuracy satisfies

$$K = O\left( \sqrt{L\, \|x_0 - x^*\|^2 / \epsilon} + \log\big(1/(\eta_0 L)\big) \right)$$

where the first term is optimal and the additive logarithmic term is incurred only due to a possibly poorly chosen $\eta_0$. In practical scenarios, the algorithm amplifies $\eta_k$ rapidly to reach an appropriate scale, so this penalty is negligible.
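A quick back-of-the-envelope check of this bound, with made-up problem constants, illustrates how small the logarithmic penalty typically is relative to the optimal term:

```python
import math

# Hypothetical problem constants, chosen only for illustration.
L = 1e2      # gradient Lipschitz constant
R2 = 1.0     # ||x_0 - x*||^2
eps = 1e-6   # target accuracy
eta0 = 1e-8  # deliberately tiny initial stepsize

optimal_term = math.sqrt(L * R2 / eps)    # sqrt(1e8) = 10,000 iterations
log_penalty = math.log(1.0 / (eta0 * L))  # log(1e6) ~ 13.8 extra iterations

print(optimal_term, log_penalty)
```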

The curvature estimation procedure ensures that, in smooth problems, the local estimate satisfies $\lambda_k \geq 1/L$, so the method never instantiates an unsafe (too-large) stepsize, and it adapts automatically even when $L$ is unknown or varies by region.

5. Comparison with Previous Approaches

Earlier accelerated adaptive methods, such as AC-FGM (Auto-Conditioned Fast Gradient Method) [Li and Lan, 2023], employ the stepsize recursion

$$\eta_{k+1} = \min\left\{ (1 + 1/k)\,\eta_k,\ \tfrac{1}{8}\, k\, \lambda_{k+1} \right\}$$

which allows only sublinear growth of stepsizes. If $\eta_0$ is underestimated, it may take $\Omega\big(1/\sqrt{\eta_0 L}\big)$ iterations to adapt, potentially erasing the benefits of acceleration. In contrast, Accelerated GRAAL enables geometric stepsize growth, leading to a much faster transition to the optimal regime and only a $\log\big(1/(\eta_0 L)\big)$ additive cost, avoiding the severe slowdown of previous methods.
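The growth-rate gap can be illustrated with a small, purely hypothetical calculation that compares only the first branch of each recursion (the $(1+\gamma)\eta_k$ term versus the $(1+1/k)\eta_k$ term), ignoring the curvature-dependent branch:

```python
import math

eta0, target = 1e-8, 1.0  # hypothetical: stepsize underestimated by 8 orders of magnitude
ratio = target / eta0     # required growth factor (1e8)
gamma = 0.5               # illustrative geometric growth constant

# Geometric rule eta_{k+1} = (1 + gamma) * eta_k: after k steps the stepsize has grown by
# (1 + gamma)^k, so k = log(ratio) / log(1 + gamma) steps suffice.
k_geometric = math.ceil(math.log(ratio) / math.log(1 + gamma))

# Sublinear rule eta_{k+1} = (1 + 1/k) * eta_k: the product over k = 1..K telescopes to
# (K + 1), so roughly K = ratio steps are needed.
k_sublinear = math.ceil(ratio - 1)

print(k_geometric, k_sublinear)  # ~46 steps vs. ~10^8 steps
```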

Another advantage is that Accelerated GRAAL requires neither global parameter tuning nor costly line searches, making it more robust for large-scale, practical optimization scenarios.

6. Implementation and Practical Implications

The method does not require any line search, global knowledge of the smoothness parameter, or hyperparameter tuning aside from setting a few universal constants. This feature makes the algorithm well-suited for black-box optimization or problems with unknown or spatially varying curvature.

Implementation involves calculating a local curvature estimate at each step, updating the stepsize, and solving a fixed-point relation to maintain the coupling invariant. In practice, evaluating $\Lambda(x; z)$ reduces to simple scalar computations in most large-scale applications, requiring only the squared norm of the gradient difference and a Bregman divergence. The stepsize adaptation can naturally exploit any local “flatness” or low-curvature region to speed up progress.
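As a small numerical sanity check, the snippet below applies the curvature estimate from Section 2 (restated compactly for self-containment) to a random convex quadratic and verifies the property noted in Section 4 that the local estimate never falls below $1/L$; the problem instance and constants are arbitrary.

```python
import numpy as np

def curvature_estimate(f, grad_f, x, z):
    """Lambda(x; z) = 2 B_f(x; z) / ||grad f(x) - grad f(z)||^2, as sketched in Section 2."""
    d = grad_f(x) - grad_f(z)
    den = float(d @ d)
    return np.inf if den == 0.0 else 2.0 * (f(x) - f(z) - grad_f(z) @ (x - z)) / den

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
A = A @ A.T                          # symmetric positive semidefinite Hessian
f = lambda x: 0.5 * x @ A @ x        # smooth convex quadratic
grad_f = lambda x: A @ x
L = np.linalg.eigvalsh(A).max()      # global gradient Lipschitz constant

for _ in range(100):
    x, z = rng.standard_normal(50), rng.standard_normal(50)
    lam = curvature_estimate(f, grad_f, x, z)
    assert lam >= (1.0 / L) * (1.0 - 1e-9)   # local estimate never below 1/L
```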

Performance on convex smooth problems with variable curvature has been demonstrated to be competitive with—and in many cases superior to—both classical AGD and previous curvature-adaptive accelerations, especially when an accurate estimate of LL is not available or the initial stepsize is conservative.

7. Limitations and Potential Extensions

The current analysis and guarantees are established for smooth convex minimization; extension to structured non-smooth composite problems requires additional developments, particularly to design compatible coupling and stepsize policies that retain the geometric growth property. Likewise, generalization to nonconvex settings, or to problems with noise-corrupted gradients, may necessitate incorporating restart or slow-down heuristics analogous to those proposed for AGDP (Cohen et al., 2018).

A further possibility is combining the adaptive acceleration with geometric or operator-theoretic perspectives as in hybrid schemes (Karimi et al., 2017), or extending to decentralized/distributed optimization where only stochastic or local curvature information may be available.


In summary, GRAAL with Nesterov acceleration achieves the optimal $O(1/k^2)$ convergence rate for smooth convex problems while retaining the adaptivity and parameter-free nature of modern first-order methods. Its core innovations lie in combining geometric stepsize adaptation from local curvature with the accelerated coupling of Nesterov, without requiring hyperparameter tuning or costly line search, thus making it practical for large-scale, black-box, or ill-conditioned optimization tasks (Borodich et al., 13 Jul 2025).
