Generalized Gradient Descent Recursion
- Generalized Gradient Descent Recursion is an iterative numerical optimization framework that replaces the standard update rule with adaptive, geometry- and regularization-sensitive mappings.
- It leverages generalized smoothness conditions and q-functions to design context-sensitive step-size rules, effectively recovering classical methods like mirror descent and Newton-type updates.
- The approach offers improved convergence guarantees in both convex and nonconvex settings and extends applicability to complex scenarios including manifold and Bregman-regularized optimization.
A generalized gradient descent recursion is any systematic iterative scheme for numerical optimization built on generalized notions of smoothness, geometry, or underlying algebraic structure, in which the classic update is replaced by a more intricate mapping that adapts to problem-specific curvature, metric, or regularization. The framework seeks to unify, extend, and improve upon classical gradient descent by replacing the step size or direction with context-sensitive, theoretically motivated recursions that recover mirror descent, natural gradient, Newton-type, or other non-Euclidean flows as special cases. The formalism applies to a broad range of settings, including nonconvex, nonsmooth, manifold-valued, composite, and Bregman-regularized optimization.
1. Generalized Smoothness and ℓ-Gradient Descent
Classical gradient descent assumes $L$-smoothness, i.e., $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for some $L > 0$, leading to the canonical step $x_{k+1} = x_k - \frac{1}{L} \nabla f(x_k)$ (Tyurin, 16 Dec 2024). The generalized theory introduces an ℓ-smoothness condition, $\|\nabla^2 f(x)\| \le \ell(\|\nabla f(x)\|)$, with ℓ nondecreasing, positive, and locally Lipschitz. Choices include
- $\ell(s) = L$: classical case
- $\ell(s) = L_0 + L_1 s$: (L₀,L₁)-smoothness
- $\ell(s) = L_0 + L_\rho s^\rho$ for $\rho > 0$: polynomial growth
This assumption allows for adaptive, data-driven adjustment of the step size depending on the local gradient norm.
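As a minimal sketch of this adaptivity, the three choices above can be compared through an illustrative step rule of the form $\eta = 1/\ell(\|\nabla f(x)\|)$; the constants, function names, and the exact form of the rule here are assumptions for illustration, not the paper's precise q-based formula (introduced below):

```python
import numpy as np

# Illustrative smoothness profiles ell(s); the constants are arbitrary examples.
ell_classical = lambda s: 2.0                  # ell(s) = L
ell_L0L1      = lambda s: 1.0 + 0.5 * s        # ell(s) = L0 + L1 * s
ell_poly      = lambda s: 1.0 + 0.1 * s**3     # ell(s) = L0 + L_rho * s**rho

def step_size(ell, grad):
    """Gradient-norm-dependent step; a simplified stand-in for the exact rule."""
    return 1.0 / ell(np.linalg.norm(grad))

g = np.array([4.0, -3.0])                      # example gradient with norm 5
for name, ell in [("constant", ell_classical), ("(L0,L1)", ell_L0L1), ("polynomial", ell_poly)]:
    print(f"{name:>10}: eta = {step_size(ell, g):.4f}")
```

The larger the local gradient norm, the smaller the step under the growing profiles, which is exactly the adaptivity the ℓ-smoothness assumption is meant to enable.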
2. One-Dimensional q-Function and Nonquadratic Taylor Bounds
For adaptive step-size rules, the key technical device is a one-dimensional "q-function" constructed from the smoothness profile ℓ; it replaces the quadratic term $\frac{L}{2} t^2$ of the classical Taylor model. The q-function is strictly increasing, so its inverse $q^{-1}$ exists, is strictly increasing as well, and converts gradient information into admissible step lengths.
Central consequences:
- Generalized Lipschitz bound on the gradient difference: $\|\nabla f(y) - \nabla f(x)\|$ is controlled by an ℓ-dependent function of $\|y - x\|$, valid for sufficiently small $\|y - x\|$.
- Generalized upper bound for function values: $f(y)$ is bounded by the linearization $f(x) + \langle \nabla f(x), y - x \rangle$ plus a nonquadratic, ℓ-dependent penalty in $\|y - x\|$ in place of the usual $\frac{L}{2}\|y - x\|^2$ term.
These bounds reduce to the standard quadratic model when ℓ is constant (Tyurin, 16 Dec 2024).
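For the constant profile $\ell \equiv L$, the two bounds specialize to the familiar Lipschitz-gradient inequality and descent lemma:

```latex
% Classical special case (ell identically L): the generalized bounds reduce to
% the standard Lipschitz-gradient inequality and the quadratic descent lemma.
\[
\|\nabla f(y) - \nabla f(x)\| \le L \,\|y - x\|,
\qquad
f(y) \le f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{L}{2}\,\|y - x\|^2 .
\]
```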
3. Derivation of Generalized Gradient Descent Recursion
At iteration $k$, moving in the direction $-\nabla f(x_k)$ with the optimal step length $\eta_k$ (obtained by minimizing the generalized upper bound above) yields the general update $x_{k+1} = x_k - \eta_k \nabla f(x_k)$, with $\eta_k$ expressed through the inverse q-function evaluated at the current gradient norm.
Bounding this expression shows that $\eta_k$ shrinks when the local gradient norm, and hence $\ell(\|\nabla f(x_k)\|)$, is large and grows when it is small, ensuring the method interpolates between aggressive and conservative step sizes depending on the local gradient scale (Tyurin, 16 Dec 2024).
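A minimal Python sketch of the recursion, substituting the simplified step $\eta_k = 1/\ell(\|\nabla f(x_k)\|)$ for the exact q-based formula (the function names, tolerance, and example problem are illustrative assumptions):

```python
import numpy as np

def ell_gd(grad_f, ell, x0, n_iters=100, tol=1e-10):
    """Sketch of generalized gradient descent: the step size adapts to the local
    gradient norm through the smoothness profile ell.  The rule eta = 1/ell(||g||)
    is a simplified stand-in for the q-based step of (Tyurin, 16 Dec 2024)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad_f(x)
        g_norm = np.linalg.norm(g)
        if g_norm < tol:
            break
        eta = 1.0 / ell(g_norm)   # conservative step when the gradient is large
        x = x - eta * g
    return x

# Example: f(x) = ||x||^2 (so ell(s) = 2 is exact); the scheme reduces to GD with step 1/2.
print(ell_gd(grad_f=lambda x: 2 * x, ell=lambda s: 2.0, x0=[5.0, -3.0]))  # ~ [0, 0]
```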
4. Convergence Theory: Nonconvex and Convex Settings
Nonconvex scenario: If $f$ is bounded below, each step decreases the objective by an amount bounded below by an ℓ-dependent function of the current gradient norm (Theorem 7.1). Summed over $T$ steps, this yields a bound on the smallest such per-step quantity of order $(f(x_0) - \inf f)/T$. When the relevant ℓ-dependent map is invertible, this recovers the classical $O(1/T)$ rate in squared gradient norm (Tyurin, 16 Dec 2024).
Convex scenario: Two independent proofs confirm convergence of the function values to those of a minimizer $x^*$: a direct argument bounds $f(x_T) - f(x^*)$ in terms of $\|x_0 - x^*\|$ and $T$, and a two-phase argument (which does not require invertibility of the relevant ℓ-dependent map) exploits a monotonicity property along the iterates, still allowing the optimal rates (Tyurin, 16 Dec 2024).
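As a point of reference for how such telescoping sums produce a rate, the classical case $\ell \equiv L$ with step $1/L$ gives:

```latex
% Telescoping the per-step decrease in the classical case (ell identically L, step 1/L):
\[
f(x_k) - f(x_{k+1}) \;\ge\; \frac{1}{2L}\,\|\nabla f(x_k)\|^2
\quad\Longrightarrow\quad
\min_{0 \le k < T} \|\nabla f(x_k)\|^2
\;\le\; \frac{1}{T} \sum_{k=0}^{T-1} \|\nabla f(x_k)\|^2
\;\le\; \frac{2L \bigl( f(x_0) - \inf f \bigr)}{T}.
\]
```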
5. Special Cases and Recovery of Classical Schemes
The generalized recursion specializes to the classical first-order step schemes:
- $\ell(s) = L$: recovers standard GD with step size $1/L$
- $\ell(s) = L_0 + L_1 s$: recovers a clipped step size that shrinks with the gradient norm
- $\ell(s) = L_0 + L_\rho s^\rho$: O(1/T) rates even in previously intractable "superquadratic" smoothness regimes
Thus, the methodology smoothly interpolates between established methods according to the growth of the Hessian, offering new guarantees where previous approaches failed, e.g., in superquadratic smoothness regimes (Tyurin, 16 Dec 2024).
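As an illustration of the $(L_0, L_1)$ case, a step proportional to $1/\ell(\|\nabla f(x_k)\|)$ reproduces the familiar clipped-step behavior (the factor-of-two bound below is a generic estimate, not the paper's exact constant):

```latex
% (L0, L1) case: a step proportional to 1/ell(||grad f||) behaves like gradient clipping,
% using 1/(a + b) >= (1/2) min(1/a, 1/b) for a, b > 0.
\[
\eta_k \;\propto\; \frac{1}{L_0 + L_1 \|\nabla f(x_k)\|}
\;\ge\; \frac{1}{2} \min\!\left\{ \frac{1}{L_0},\; \frac{1}{L_1 \|\nabla f(x_k)\|} \right\},
\]
% i.e., a bounded step when the gradient is small and a normalized (clipped) step
% when the gradient is large.
```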
6. Illustrative Examples
Numerically, the generalized recursion outperforms classical schemes in settings where ℓ grows rapidly (Tyurin, 16 Dec 2024):
- On an objective with rapidly growing curvature and a distant starting point, the ℓ-GD step converges in tens of iterations where classical fixed-step GD diverges.
- On a second test problem, ℓ-GD converges in far fewer iterations than classical step-size rules.
- For polynomially growing ℓ, the analysis gives improved rates and extends to the otherwise pathological superquadratic case when gradients remain bounded.
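A toy comparison in the spirit of the first example, assuming the objective $f(x) = x^4/4$ (for which $|f''(x)| = 3x^2 = 3\,|f'(x)|^{2/3}$, so one may take $\ell(s) = 3 s^{2/3}$) and the simplified step $\eta_k = 1/\ell(|f'(x_k)|)$; the starting point and the fixed step size are illustrative, not taken from the paper:

```python
# Toy comparison on f(x) = x^4 / 4, where f'(x) = x^3 and f''(x) = 3 x^2 = 3 |f'(x)|^(2/3).
grad = lambda x: x ** 3
ell = lambda s: 3.0 * s ** (2.0 / 3.0)       # illustrative smoothness profile ell(s) = 3 s^(2/3)

x_fixed, x_adaptive = 10.0, 10.0
for _ in range(30):
    if abs(x_fixed) < 1e50:                  # stop updating once divergence is obvious
        x_fixed -= 0.1 * grad(x_fixed)       # classical fixed step, eta = 0.1
    x_adaptive -= grad(x_adaptive) / ell(abs(grad(x_adaptive)))  # eta = 1/ell(|f'(x)|), i.e. x -> 2x/3

print(f"fixed step eta=0.1 : x = {x_fixed:.3e}")     # blows up far from the minimizer
print(f"adaptive ell-step  : x = {x_adaptive:.3e}")  # ~ 10 * (2/3)^30, close to 0
```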
7. Context in General Optimization and Relation to Other Frameworks
The generalized recursion fits within broader optimization frameworks:
- General cost-geometry (optimal transport, mirror descent): Surrogate minimization schemes where a generic "cost" replaces the quadratic proximity term, with the next iterate chosen as the minimizer of the linearized model plus the cost to the current iterate (Léger et al., 2023).
- Natural and Riemannian gradient descent: The special case where the proximity term corresponds to a geodesic distance or a Hessian-induced metric, yielding updates in the pullback metric or local manifold geometry (Dong et al., 2022).
- Bregman distance: The update can be viewed as minimization of a first-order model plus a Bregman divergence, generalizing the Euclidean metric and linking to mirror descent and entropic methods (Benning et al., 2016; Benning et al., 2017); a generic form of this update is sketched after this list.
- Discrete Hamilton–Jacobi dynamics: Certain preconditioners (e.g., Laplacian smoothing) correspond exactly to GD on a more convex surrogate functional, sharing the same minima but with improved optimization geometry (Osher et al., 2018).
- High-level algebraic frameworks: Abstract categorical approaches model gradient descent as a functor on categories of optimization problems, enabling parallel and distributed generalised recursions (Hanks et al., 28 Mar 2024).
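A generic instance of these surrogate-minimization views, written with a Bregman divergence $D_\varphi$ (the Euclidean choice $\varphi = \tfrac{1}{2}\|\cdot\|^2$ recovers plain gradient descent):

```latex
% Mirror-descent / Bregman form of the update: linearized model plus a proximity term.
\[
x_{k+1} \;=\; \arg\min_{x} \Big\{ \langle \nabla f(x_k),\, x - x_k \rangle
\;+\; \tfrac{1}{\eta_k}\, D_\varphi(x, x_k) \Big\},
\qquad
D_\varphi(x, y) \;=\; \varphi(x) - \varphi(y) - \langle \nabla \varphi(y),\, x - y \rangle .
\]
% With varphi(x) = (1/2)||x||^2, D_varphi(x, y) = (1/2)||x - y||^2 and the minimizer
% is x_k - eta_k * grad f(x_k), i.e. the Euclidean gradient step.
```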
Table: Step-Size Rules in Generalized Gradient Descent
| ℓ(s) choice | Generalized step | Classical limit / method |
|---|---|---|
| $\ell(s) = L$ | Constant step $1/L$ | Vanilla GD |
| $\ell(s) = L_0 + L_1 s$ | Clipped/variable step shrinking with the gradient norm | (L₀,L₁)-smooth clipped GD (Tyurin, 16 Dec 2024) |
| $\ell(s) = L_0 + L_\rho s^\rho$ | Step decaying polynomially in the gradient norm | O(1/T) rates, including new results in the superquadratic regime |
This table highlights that the generalized update mechanism provides a structured, theoretically sound means of adapting first-order optimization recursions to local problem geometry.
References
- "Toward a Unified Theory of Gradient Descent under Generalized Smoothness" (Tyurin, 16 Dec 2024)
- "Gradient descent with a general cost" (Léger et al., 2023)
- "Generalization to the Natural Gradient Descent" (Dong et al., 2022)
- "Gradient descent in a generalised Bregman distance framework" (Benning et al., 2016)
- "Choose your path wisely: gradient descent in a Bregman distance framework" (Benning et al., 2017)
- "Gradient descent in hyperbolic space" (Wilson et al., 2018)
- "Generalized Gradient Descent is a Hypergraph Functor" (Hanks et al., 28 Mar 2024)
- "Laplacian Smoothing Gradient Descent" (Osher et al., 2018)