Generalized Gradient Descent Recursion

Updated 5 December 2025
  • Generalized Gradient Descent Recursion is an iterative numerical optimization framework that replaces the standard update rule with adaptive, geometry- and regularization-sensitive mappings.
  • It leverages generalized smoothness conditions and q-functions to design context-sensitive step-size rules, effectively recovering classical methods like mirror descent and Newton-type updates.
  • The approach offers improved convergence guarantees in both convex and nonconvex settings and extends applicability to complex scenarios including manifold and Bregman-regularized optimization.

A generalized gradient descent recursion refers to any systematic iterative scheme for numerical optimization based on generalized notions of smoothness, geometry, or underlying algebraic structure, wherein the classic update $x_{k+1} = x_k - \gamma_k \nabla f(x_k)$ is replaced by a more intricate mapping that adapts to problem-specific curvature, metric, or regularization. The general framework seeks to unify, extend, and improve upon classical gradient descent by replacing the step-size or direction with context-sensitive, theoretically motivated recursions that may recover mirror descent, natural gradient, Newton-type, or other non-Euclidean flows as special cases. The formalism is applicable to a broad range of settings, including nonconvex, nonsmooth, manifold-valued, composite, and Bregman-regularized optimization.

1. Generalized Smoothness and ℓ-Gradient Descent

Classical gradient descent assumes $L$-smoothness, i.e., $\|\nabla^2 f(x)\|\le L$ for some $L>0$, leading to the canonical step $\gamma_k=1/L$ (Tyurin, 2024). The generalized theory introduces an ℓ-smoothness condition: $\|\nabla^2 f(x)\| \leq \ell(\|\nabla f(x)\|)$ with $\ell:\mathbb{R}_{\ge0}\to(0,\infty)$ nondecreasing, positive, and locally Lipschitz. Choices include

  • $\ell(s)=L$: classical case
  • $\ell(s)=L_0+L_1 s$: $(L_0,L_1)$-smoothness
  • $\ell(s)=L_0+L_1 s^p$ for $p\ge0$: polynomial growth

This assumption allows for adaptive, data-driven adjustment of the step size depending on the local gradient norm.
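The condition is easy to check numerically in one dimension. A minimal sketch (illustrative, not from the paper), using the example $f(x) = e^x + e^{1-x}$ with $\ell(r)=3.3+r$ that appears in Section 6 below:

```python
import math

# f(x) = e^x + e^(1-x) satisfies |f''(x)| <= ell(|f'(x)|) with ell(r) = 3.3 + r,
# since f''(x) - |f'(x)| = 2*min(e^x, e^(1-x)) <= 2*sqrt(e) ~= 3.2974 < 3.3.

def f1(x):   # first derivative
    return math.exp(x) - math.exp(1 - x)

def f2(x):   # second derivative
    return math.exp(x) + math.exp(1 - x)

def ell(r):  # smoothness function with (L0, L1) = (3.3, 1)
    return 3.3 + r

# Spot-check the ell-smoothness condition on a grid of points in [-5, 5].
ok = all(abs(f2(x)) <= ell(abs(f1(x))) for x in
         [i / 10 for i in range(-50, 51)])
print(ok)  # True
```

A grid check of course only probes finitely many points; here the inequality also holds globally by the `min` identity noted in the comment.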

2. One-Dimensional q-Function and Nonquadratic Taylor Bounds

For adaptive step size rules, the key technical device is the "q-function," defined as

$q(s;a) := \int_0^s \frac{dt}{\ell(a+t)}$

with $a\ge0$ and $s\in[0,q_{\max}(a))$, where $q_{\max}(a) := \int_0^\infty \frac{dt}{\ell(a+t)}$. Its inverse $q^{-1}(\cdot;a)$ is strictly increasing and $C^1$.
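For the $(L_0,L_1)$ case $\ell(s)=L_0+L_1 s$, both $q$ and its inverse have elementary closed forms; the following sketch (with illustrative constants) checks a numerical quadrature against them:

```python
import math

L0, L1 = 1.0, 2.0            # hypothetical (L0, L1)-smoothness constants
ell = lambda s: L0 + L1 * s

def q(s, a, n=100_000):
    """q(s; a) = integral_0^s dt / ell(a + t), by midpoint quadrature."""
    h = s / n
    return h * sum(1.0 / ell(a + (i + 0.5) * h) for i in range(n))

def q_closed(s, a):
    """Closed form for ell(s) = L0 + L1*s."""
    return math.log1p(L1 * s / (L0 + L1 * a)) / L1

def q_inv(r, a):
    """Inverse of q(.; a): solve q(s; a) = r for s."""
    return (L0 + L1 * a) * math.expm1(L1 * r) / L1

s, a = 3.0, 0.5
assert abs(q(s, a) - q_closed(s, a)) < 1e-8   # quadrature matches closed form
assert abs(q_inv(q_closed(s, a), a) - s) < 1e-12  # inverse round-trips
```

Note that for this $\ell$, $q_{\max}(a)=\infty$ (the integral diverges logarithmically), so $q^{-1}(\cdot;a)$ is defined for all nonnegative arguments.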

Central consequences:

  • Generalized Lipschitz bound on the gradient difference: $\|\nabla f(y) - \nabla f(x)\| \leq q^{-1}(\|y-x\|;\|\nabla f(x)\|)$ for $\|y-x\|\le q_{\max}(\|\nabla f(x)\|)$.
  • Generalized upper bound for function values: $f(y) \leq f(x) + \langle \nabla f(x), y-x\rangle + \|y-x\|\int_0^1 q^{-1}(t\|y-x\|;\|\nabla f(x)\|)\,dt$

These bounds reduce to the standard quadratic model when $\ell$ is constant (Tyurin, 2024).

3. Derivation of Generalized Gradient Descent Recursion

At iteration $k$, the optimal update in the direction $h^*=-\nabla f(x_k)/\|\nabla f(x_k)\|$ with step length $t^* = q(\|\nabla f(x_k)\|; \|\nabla f(x_k)\|)$ yields the general update $x_{k+1} = x_k - \gamma_k \nabla f(x_k)$ with

$\gamma_k = \int_0^1 \frac{dv}{\ell(\|\nabla f(x_k)\| + v\|\nabla f(x_k)\|)}$

Bounding $\ell$ shows $1/\ell(2\|\nabla f\|)\leq \gamma_k \leq 1/\ell(\|\nabla f\|)$, ensuring the method interpolates between aggressive and conservative step sizes depending on the local gradient scale (Tyurin, 2024).
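A minimal sketch of the step-size computation, assuming $\ell(s)=L_0+L_1 s$ with illustrative constants: the quadrature is checked against the closed form of the integral and against the sandwich bounds above.

```python
import math

L0, L1 = 1.0, 2.0                 # hypothetical smoothness constants
ell = lambda s: L0 + L1 * s

def gamma(g, n=100_000):
    """gamma_k = integral_0^1 dv / ell(g + v*g), by midpoint quadrature."""
    h = 1.0 / n
    return h * sum(1.0 / ell(g * (1 + (i + 0.5) * h)) for i in range(n))

g = 4.0                           # example gradient norm ||grad f(x_k)||
gk = gamma(g)

# Closed form for ell(s) = L0 + L1*s (substitute u = L0 + L1*g*(1+v)):
closed = math.log((L0 + 2 * L1 * g) / (L0 + L1 * g)) / (L1 * g)
assert abs(gk - closed) < 1e-8

# Sandwich bounds 1/ell(2g) <= gamma_k <= 1/ell(g): the integrand lies
# between those values for v in [0, 1] since ell is nondecreasing.
assert 1 / ell(2 * g) <= gk <= 1 / ell(g)
```

For constant $\ell(s)=L$ the same integral collapses to $\gamma_k = 1/L$, recovering the classical step.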

4. Convergence Theory: Nonconvex and Convex Settings

Nonconvex scenario: If $f$ is bounded below, one obtains (Theorem 7.1) the per-step decrease $f(x_{k+1}) \leq f(x_k) - \gamma_k \|\nabla f(x_k)\|^2$. Summed over $T$ steps, this yields

$\min_{0\leq k < T} \frac{\|\nabla f(x_k)\|^2}{\ell(2\|\nabla f(x_k)\|)} \leq \frac{4\Delta}{T}$

with $\Delta = f(x_0)-f^*$. For invertible $s\mapsto s/\ell(2s)$ this recovers the $O(1/T)$ rate in squared gradient norm (Tyurin, 2024).

Convex scenario: For a minimizer $x^*$ and $R=\|x_0-x^*\|$, one obtains $\min_{0\leq k \leq T} \frac{f(x_k) - f(x^*)}{\ell(2\|\nabla f(x_k)\|)} \leq \frac{R^2}{T+1}$; a second, independent two-phase proof, which exploits monotonicity of $\|\nabla f(x_k)\|$ and requires no invertibility of $\ell$, attains the same optimal rates (Tyurin, 2024).

5. Special Cases and Recovery of Classical Schemes

The generalized recursion specializes to classical first-order step-size schemes:

  • $\ell(s)=L$: recovers standard GD with step size $1/L$
  • $\ell(s)=L_0+L_1 s$: clipped step size
  • $\ell(s)=L_0+L_1 s^p$ with $0\le p<2$: $O(1/T)$ rates even in previously intractable "superquadratic" smoothness regimes

Thus, the methodology smoothly interpolates between established methods according to the growth of the Hessian, offering new guarantees where previous approaches failed, e.g., in the case $p=2$ (Tyurin, 2024).

6. Illustrative Examples

Numerically, the generalized recursion outperforms classical schemes in settings where $\ell$ grows rapidly (Tyurin, 2024):

  • $f(x) = -\log x$ with $\ell(r) = L_0 + L_1 r^2$: the $\ell$-GD step converges in tens of iterations where the step $1/(L_0+L_1 r)$ diverges.
  • $f(x) = e^x + e^{1-x}$, $\ell(r)=3.3+r$: $\ell$-GD converges in $\leq 20$ steps versus $>200$ for classical rules.
  • For $\ell(r)=L_0+L_1 r^p$ ($0\leq p\leq2$): improved rates, plus an extension to the otherwise pathological $p>2$ case when gradients are bounded.
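The second example can be reproduced with a short script (a sketch, not the paper's code, using the closed-form step size that the integral admits for $\ell(r)=3.3+r$):

```python
import math

# ell-GD on f(x) = e^x + e^(1-x) with ell(r) = 3.3 + r; minimizer x* = 0.5.
# For this ell the step-size integral has a closed form:
#   gamma(g) = integral_0^1 dv / (3.3 + g*(1+v)) = ln((3.3 + 2g)/(3.3 + g)) / g

def grad(x):
    return math.exp(x) - math.exp(1 - x)

def gamma(g):
    return math.log((3.3 + 2 * g) / (3.3 + g)) / g

x, iters = -5.0, 0                # start far from the minimizer
while iters < 100:
    d = grad(x)
    if abs(d) < 1e-10:            # stop when the gradient is tiny
        break
    x -= gamma(abs(d)) * d        # generalized gradient descent update
    iters += 1

assert abs(x - 0.5) < 1e-8        # converged, well under the iteration cap
```

For large gradients the step length $\gamma\|\nabla f\|$ here saturates near $\ln 2$, which is why the iterate marches steadily toward the minimizer instead of diverging the way a fixed-step rule tuned to the flat region would.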

7. Context in General Optimization and Relation to Other Frameworks

The generalized recursion fits within broader optimization frameworks:

  • General cost-geometry (optimal transport, mirror descent): Surrogate minimization schemes in which a generic "cost" $C(x, y)$ replaces the quadratic proximity term, with the next iterate chosen as the minimizer of the linearized model plus $C(x_k, y)$ (Léger et al., 2023).
  • Natural and Riemannian gradient descent: The special case where $C$ corresponds to geodesic distance or a Hessian-induced metric, yielding updates in the pullback metric or local manifold geometry (Dong et al., 2022).
  • Bregman distance: The update can be viewed as minimization of a first-order model plus a Bregman divergence, generalizing the Euclidean metric and linking to mirror descent and entropic methods (Benning et al., 2016; Benning et al., 2017).
  • Discrete Hamilton–Jacobi dynamics: Certain preconditioners (e.g., Laplacian smoothing) correspond exactly to GD on a more convex surrogate functional that shares the same minima but has improved optimization geometry (Osher et al., 2018).
  • High-level algebraic frameworks: Abstract categorical approaches model gradient descent as a functor on categories of optimization problems, enabling parallel and distributed generalized recursions (Hanks et al., 2024).

Table: Step-Size Rules in Generalized Gradient Descent

$\ell(s)$ choice | Generalized step $\gamma_k$ | Classical limit / method
$\ell(s)=L$ | $1/L$ | Vanilla GD
$\ell(s)=L_0+L_1 s$ | $\int_0^1 \frac{dv}{L_0+L_1 [\|\nabla f(x_k)\|+v\|\nabla f(x_k)\|]}$ | Clipped/variable step, $(L_0,L_1)$-smooth (Tyurin, 2024)
$\ell(s)=L_0+L_1 s^p$ | $\int_0^1 \frac{dv}{L_0+L_1 [\|\nabla f(x_k)\|+v\|\nabla f(x_k)\|]^p}$ | $O(1/T)$ for $p<2$, new results for $p\geq2$

This table highlights that the generalized update mechanism provides a structured, theoretically sound means of adapting first-order optimization recursions to local problem geometry.
