Geodesic Gradient Descent (GGD)

Updated 2 July 2026
  • Geodesic Gradient Descent (GGD) is a method that optimizes functions on Riemannian manifolds by following geodesics, thereby preserving manifold structure and leveraging curvature for faster convergence.
  • By utilizing the exponential map and intrinsic metrics, GGD provides closed-form updates on special manifolds like hyperbolic spaces, spheres, and positive-definite matrices with provable convergence guarantees.
  • GGD’s adaptability through variants such as hyperparameter-free, stochastic, and adaptive-step methods makes it a valuable tool for applications in manifold learning, geometric deep learning, and structured optimization.

Geodesic Gradient Descent (GGD) is an optimization method that generalizes classical gradient descent to Riemannian manifolds. Instead of updating parameters along Euclidean straight lines, GGD moves along geodesics determined by the manifold's Riemannian metric. This keeps iterates on the manifold and lets the method exploit curvature, which can accelerate convergence and improve robustness, particularly when the parameter space is non-Euclidean, highly nonlinear, or structured (Wilson et al., 2018, Hu et al., 28 Feb 2026, Fioresi et al., 2019, Wensing et al., 2018).

1. Mathematical Foundations: Gradients and Geodesics

Given a smooth $n$-dimensional Riemannian manifold $(M,g)$, the Riemannian gradient $\nabla_M f(x) \in T_x M$ of a smooth function $f: M \rightarrow \mathbb{R}$ is defined as the unique tangent vector at $x$ satisfying $g_x(\nabla_M f(x), v) = Df(x)[v]$ for all $v \in T_x M$. In local coordinates (with metric tensor $G(x)$), this yields $\nabla_M f(x) = G(x)^{-1} \nabla_{\text{Eucl}} f(x)$, directly generalizing the Euclidean gradient (Wilson et al., 2018).
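The coordinate formula can be checked numerically. The sketch below is illustrative only: the upper half-plane metric $G = I / y^2$ and the objective $f(x) = \tfrac{1}{2}\lVert x \rVert^2$ are assumptions chosen for the example, not taken from the cited works, and `riemannian_gradient` is a hypothetical helper name.

```python
import numpy as np

def riemannian_gradient(G, euclid_grad):
    """Return G^{-1} grad_Eucl by solving a linear system (no explicit inverse)."""
    return np.linalg.solve(G, euclid_grad)

# Assumed example: point (1, 2) in the upper half-plane, metric G = I / y^2.
x = np.array([1.0, 2.0])
G = np.eye(2) / x[1] ** 2

# Assumed objective f(x) = 0.5 * ||x||^2, whose Euclidean gradient is x itself.
euclid_grad = x.copy()
grad_M = riemannian_gradient(G, euclid_grad)

# Defining property: g_x(grad_M, v) = Df(x)[v] for any tangent vector v.
v = np.array([0.3, -0.7])
lhs = grad_M @ G @ v        # metric pairing g_x(grad_M, v)
rhs = euclid_grad @ v       # directional derivative Df(x)[v]
```

Solving the linear system instead of inverting $G(x)$ is the usual choice when the metric is dense or poorly conditioned.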

Updates are performed using the exponential map $\operatorname{Exp}_{x}(v)$, which transports the point $x$ along the geodesic in direction $v$ for unit time. The canonical GGD update is: $x_{k+1} = \operatorname{Exp}_{x_k}\!\left(-\eta\, \nabla_M f(x_k)\right)$ where $\eta > 0$ is the step size (Wensing et al., 2018). The exponential map is defined on the whole tangent space for geodesically complete manifolds; closed-form expressions are available on certain geometries (e.g., hyperbolic space and the sphere), while general manifolds may require retraction approximations (Wilson et al., 2018, Dong et al., 2022).
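A minimal sketch of this update as a generic loop follows. The names `ggd`, `riem_grad`, and `exp_map` are hypothetical, and the flat-space exponential map $\operatorname{Exp}_x(v) = x + v$ is used only as a sanity check, since with it the loop reduces to ordinary gradient descent.

```python
import numpy as np

def ggd(x0, riem_grad, exp_map, step, iters):
    """Iterate x <- Exp_x(-step * grad_M f(x)) with a fixed step size."""
    x = x0
    for _ in range(iters):
        x = exp_map(x, -step * riem_grad(x))
    return x

def euclidean_exp(x, v):
    """Exponential map of flat R^n: geodesics are straight lines."""
    return x + v

# Assumed objective f(x) = 0.5 * ||x - target||^2, so grad f(x) = x - target.
target = np.array([1.0, -2.0])
x_final = ggd(np.zeros(2), lambda x: x - target, euclidean_exp, step=0.5, iters=50)
```

Swapping in a different `exp_map` and Riemannian gradient is all that changes from one manifold to another.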

2. Algorithmic Realizations and Model-Specific Instances

GGD can be adapted to diverse manifold structures by substituting the relevant Riemannian metric and computing the exponential map accordingly.

Hyperboloid Model of Hyperbolic Space:

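Within the hyperboloid model $\mathbb{H}^n = \{x \in \mathbb{R}^{n+1} : \langle x, x \rangle_L = -1,\ x_0 > 0\}$, with Minkowski metric $\langle u, v \rangle_L = -u_0 v_0 + \sum_{i=1}^{n} u_i v_i$, the GGD step involves:

  • Computing the ambient (Minkowski) gradient, projecting it onto the tangent space $T_x \mathbb{H}^n$, and applying the exponential map,
  • Using closed-form updates involving hyperbolic trigonometric functions:

$\operatorname{Exp}_x(v) = \cosh(\lVert v \rVert_L)\, x + \sinh(\lVert v \rVert_L)\, \frac{v}{\lVert v \rVert_L}, \qquad \lVert v \rVert_L = \sqrt{\langle v, v \rangle_L}$

This yields stable, geometry-preserving steps with per-iteration complexity $O(n)$, substantially reducing the iteration count (e.g., ~46% fewer steps than Poincaré-ball retraction schemes for Fréchet mean problems) (Wilson et al., 2018).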
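The three stages (ambient gradient, tangent projection, exponential map) can be sketched as below. This is an illustration under stated assumptions rather than the cited implementation: the helper names are hypothetical, and the objective $f(x) = -\langle x, y \rangle_L = \cosh d(x, y)$ for a fixed target $y$ is chosen only because its minimizer is known to be $y$.

```python
import numpy as np

def minkowski_dot(u, v):
    """Minkowski bilinear form <u, v>_L = -u_0 v_0 + sum_i u_i v_i."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def lift(p):
    """Place a point p in R^n on the hyperboloid as (sqrt(1 + |p|^2), p)."""
    return np.concatenate(([np.sqrt(1.0 + p @ p)], p))

def riemannian_grad(x, euclid_grad):
    """Ambient Minkowski gradient, then projection onto the tangent space at x."""
    h = euclid_grad.copy()
    h[0] = -h[0]                          # flip the time-like component
    return h + minkowski_dot(x, h) * x    # remove the component along x

def hyperboloid_exp(x, v):
    """Closed-form exponential map; v is assumed tangent at x."""
    norm_v = np.sqrt(max(minkowski_dot(v, v), 0.0))
    if norm_v < 1e-12:                    # negligible step: stay put
        return x
    return np.cosh(norm_v) * x + np.sinh(norm_v) * v / norm_v

y = lift(np.array([0.5, -0.3]))           # assumed target point
x = lift(np.zeros(2))                     # start at the hyperboloid "origin"
step = 0.5
for _ in range(100):
    # Euclidean gradient of f(x) = x_0 y_0 - sum_i x_i y_i with respect to x.
    euclid_grad = np.concatenate(([y[0]], -y[1:]))
    x = hyperboloid_exp(x, -step * riemannian_grad(x, euclid_grad))

constraint = minkowski_dot(x, x)          # should remain close to -1
```

Every operation is a dot product or a scaled vector sum, which is where the linear per-iteration cost comes from.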

Positive-Definite Matrix Manifolds:

For $\mathcal{P}(n)$, the manifold of $n \times n$ Hermitian positive-definite matrices with the affine-invariant metric, geodesic updates follow: $X_{k+1} = X_k^{1/2} \exp\!\left(-\eta\, X_k^{-1/2}\, \nabla_M f(X_k)\, X_k^{-1/2}\right) X_k^{1/2}$ with per-iteration complexity $O(n^3)$, required for matrix root and exponential computations. Applications include barycentre (Karcher mean) and control problems on matrix manifolds (Duan et al., 2019).
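As an illustration of the geodesic update on positive-definite matrices, the sketch below runs a Karcher-mean iteration for real symmetric matrices (the Hermitian case would use conjugate transposes). It assumes the standard fact that the Riemannian gradient of the mean squared geodesic distance is minus the average logarithm map; the function names are hypothetical and this is not the algorithm of Duan et al.

```python
import numpy as np

def sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    S = 0.5 * (S + S.T)                   # guard against round-off asymmetry
    w, U = np.linalg.eigh(S)
    return (U * fun(w)) @ U.T

def spd_exp(X, V):
    """Affine-invariant exponential map: X^{1/2} exp(X^{-1/2} V X^{-1/2}) X^{1/2}."""
    Xh = sym_fun(X, np.sqrt)
    Xih = sym_fun(X, lambda w: 1.0 / np.sqrt(w))
    return Xh @ sym_fun(Xih @ V @ Xih, np.exp) @ Xh

def spd_log(X, A):
    """Inverse of spd_exp: the tangent vector at X pointing to A."""
    Xh = sym_fun(X, np.sqrt)
    Xih = sym_fun(X, lambda w: 1.0 / np.sqrt(w))
    return Xh @ sym_fun(Xih @ A @ Xih, np.log) @ Xh

def karcher_mean(mats, step=1.0, iters=50):
    """GGD on f(X) = (1/2N) sum_i d(X, A_i)^2, started at the arithmetic mean."""
    X = sum(mats) / len(mats)
    for _ in range(iters):
        grad = -sum(spd_log(X, A) for A in mats) / len(mats)
        X = spd_exp(X, -step * grad)
    return X

# For commuting matrices the Karcher mean is the entrywise geometric mean.
mean = karcher_mean([np.diag([1.0, 4.0]), np.diag([4.0, 1.0])])
```

Each iteration is dominated by symmetric eigendecompositions, consistent with the cubic cost noted above.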

Sphere and Objective-Induced Geometry:

On the unit sphere $S^{n-1} \subset \mathbb{R}^n$, the exponential map is $\operatorname{Exp}_x(v) = \cos(\lVert v \rVert)\, x + \sin(\lVert v \rVert)\, \frac{v}{\lVert v \rVert}$; GGD steps therefore move along great circles. The method also generalizes via local spherical approximations around points on objective-induced hypersurfaces, resulting in parameter-free, adaptive-step GGD algorithms that do not require learning rate tuning (Hu et al., 28 Feb 2026).

3. Theoretical Guarantees and Convergence Analysis

Central to the analysis of GGD is the concept of geodesic convexity (g-convexity), where $f: M \rightarrow \mathbb{R}$ is g-convex if

$f(\gamma(t)) \le (1-t)\, f(x) + t\, f(y)$

for all $t \in [0,1]$ and any geodesic $\gamma$ from $x$ to $y$. In the presence of $\mu$-strong g-convexity and $L$-smoothness, the following hold (Shu et al., 9 Apr 2025, Wensing et al., 2018):

  • Linear convergence: $f(x_k) - f(x^*) \le \rho^k \left(f(x_0) - f(x^*)\right)$ with a contraction factor $\rho \in (0,1)$ and properly chosen step size $\eta$,
  • Sublinear $O(1/k)$ or $O(1/\sqrt{k})$ rates in non-strongly convex or stochastic settings.
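For intuition about where a linear rate can come from, the following sketches the standard descent-lemma argument. It rests on assumptions not spelled out in the text: geodesic $L$-smoothness stated through the exponential map, $\mu$-strong g-convexity stated through the logarithm map $\operatorname{Log}_x$ (assumed well defined, as on a Hadamard manifold), and the fixed step $\eta = 1/L$. The rates in the cited works may be stated differently.

```latex
% Assumed geodesic L-smoothness, applied to v = -\eta \nabla_M f(x_k) with \eta = 1/L:
f(x_{k+1}) \le f(x_k) + g_{x_k}\bigl(\nabla_M f(x_k), v\bigr) + \tfrac{L}{2}\,\lVert v \rVert^2
           = f(x_k) - \tfrac{1}{2L}\,\lVert \nabla_M f(x_k) \rVert^2 .

% Assumed \mu-strong g-convexity with w = \operatorname{Log}_{x_k}(y),
% followed by minimizing the quadratic lower bound over w:
f(y) \ge f(x_k) + g_{x_k}\bigl(\nabla_M f(x_k), w\bigr) + \tfrac{\mu}{2}\,\lVert w \rVert^2
     \ge f(x_k) - \tfrac{1}{2\mu}\,\lVert \nabla_M f(x_k) \rVert^2 .

% Taking y = x^* gives a gradient-domination inequality:
\lVert \nabla_M f(x_k) \rVert^2 \ge 2\mu\,\bigl(f(x_k) - f(x^*)\bigr) .

% Combining the first and third lines:
f(x_{k+1}) - f(x^*) \le \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(f(x_k) - f(x^*)\bigr) .
```

Under these assumptions the contraction factor is $1 - \mu/L$, the same as in the Euclidean case.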

Recent work replaces classical curvature and bounded-domain assumptions with the quasilinearization framework, permitting optimal convergence guarantees under substantially weaker assumptions—specifically, on arbitrary Hadamard manifolds (complete, simply connected, with nonpositive sectional curvature) (Shu et al., 9 Apr 2025).

4. Connections, Extensions, and Variants

GGD both generalizes and subsumes natural gradient and Riemannian gradient descent techniques. By selecting an appropriate Riemannian metric—e.g., the Fisher information, a Hessian metric, or one induced from a reference manifold via pullback—GGD becomes a mechanism for aligning descent with problem-specific geometry, accelerating convergence (notably in high-dimensional or ill-conditioned landscapes) (Dong et al., 2022, Fioresi et al., 2019).

Specialized variants include hyperparameter-free, stochastic, and adaptive-step methods.

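Table: Selected GGD-Related Manifolds and Algorithms

| Manifold | Riemannian Exponential Map $\operatorname{Exp}_x(v)$ | Per-iteration Complexity |
|---|---|---|
| Hyperbolic ($\mathbb{H}^n$) | $\cosh(\lVert v \rVert_L)\, x + \sinh(\lVert v \rVert_L)\, v / \lVert v \rVert_L$ | $O(n)$ |
| Sphere ($S^{n-1}$) | $\cos(\lVert v \rVert)\, x + \sin(\lVert v \rVert)\, v / \lVert v \rVert$ | $O(n)$ |
| $\mathcal{P}(n)$ (SPD) | $X^{1/2} \exp\!\left(X^{-1/2} V X^{-1/2}\right) X^{1/2}$ | $O(n^3)$ |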

5. Empirical Performance and Applications

Empirical studies demonstrate accelerated convergence and robustness for GGD over classical optimizers:

  • On the Burgers' PDE benchmark (regression), test MSE reductions of 35.8–48.8% against Adam.
  • On MNIST (classification), cross-entropy loss improvements of 3.1–11.6% and final test accuracy up to 99.3%, with only modest computational overhead compared to Adam. Performance gains are especially pronounced as model depth increases (Hu et al., 28 Feb 2026).
  • On positive definite matrices and the Karcher mean, natural-gradient variants outperform standard Riemannian GD in convergence speed (Duan et al., 2019).

GGD is instrumental in manifold learning, geometric deep learning, statistical inference on structured spaces, and optimization on information geometric models (e.g., exponential families, dually flat spaces)—where an m-geodesic update can, for log-likelihood maximization, theoretically reach the MLE in a single step (Omiya et al., 10 Dec 2025).
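One way to read the single-step claim is sketched below for a Bernoulli model. This is an illustration under assumptions, not the construction of Omiya et al.: the data are made up, and the m-geodesic is taken to be a straight line in the expectation parameter $p$, while a straight line in the natural parameter $\theta = \operatorname{logit}(p)$ is shown for contrast.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Made-up Bernoulli sample; the MLE of the success probability is the sample mean.
data = np.array([1, 0, 1, 1, 0, 1, 1, 1], dtype=float)
x_bar = data.mean()

theta0 = -1.0                              # arbitrary starting natural parameter
p0 = sigmoid(theta0)                       # matching expectation parameter

# Average log-likelihood l(theta) = theta * x_bar - log(1 + e^theta).
euclid_grad = x_bar - p0                   # d l / d theta
fisher = p0 * (1.0 - p0)                   # Fisher information in theta coordinates
nat_grad_theta = euclid_grad / fisher      # natural gradient, theta coordinates

# The same tangent vector in expectation coordinates (dp/dtheta = fisher).
velocity_p = fisher * nat_grad_theta

# Unit-time step along a straight line in p (m-geodesic, assumed) lands on the MLE.
p_m = p0 + 1.0 * velocity_p

# Unit-time step along a straight line in theta generally does not.
p_e = sigmoid(theta0 + 1.0 * nat_grad_theta)
```

The two steps share the same initial velocity and differ only in which coordinate system the geodesic is straight in.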

6. Structural Advantages and Limitations

Advantages of GGD include:

  • Geometry awareness: updates remain on manifold, respecting constraints and topology.
  • Stability and robustness: exponential-map updates avoid overshooting and instability present in retraction-based or projection-based methods (Wilson et al., 2018).
  • Generality and extensibility: applicable to arbitrary manifolds and adaptable to problem-induced geometry via metric design (Dong et al., 2022).
  • Hyperparameter minimization: local sphere-based GGD implementations eliminate learning-rate tuning (Hu et al., 28 Feb 2026).

Limitations involve:

  • Metric selection: constructing and efficiently inverting the metric may be challenging in large or unstructured models (Dong et al., 2022).
  • Exponential-map availability: closed-form exponential maps exist only for certain manifolds; elsewhere, numerical geodesic solvers or retractions are required.
  • Convexity requirements: guarantees such as global optimality and convergence rates still depend on establishing (strong) geodesic convexity, which may not hold in highly nonconvex settings (Shu et al., 9 Apr 2025).

7. Outlook and Open Problems

Current research directions include:

  • Development of hyperparameter-free, locally adaptive step-size schemes for general geometry (Hu et al., 28 Feb 2026, Ansari-Önnestam et al., 23 Apr 2025).
  • Automated metric learning and meta-optimization for accelerating GGD in complex landscapes (Dong et al., 2022).
  • Integration of GGD with second-order, stochastic, or momentum-based Riemannian methods.
  • Extensions to time-varying optimization, game-theoretic equilibria, and hierarchical manifold-structured problems (Wensing et al., 2018).
  • Theoretical characterization of nonconvex convergence, especially on manifolds without sectional curvature bounds or under weak regularity.

GGD thus constitutes a unifying, theoretically grounded framework for manifold-valued and geometrically structured optimization problems, with demonstrated practical efficacy and a theory that continues to develop across smooth, strongly convex, and stochastic regimes.
