Geodesic Gradient Descent (GGD)
- Geodesic Gradient Descent (GGD) is a method that optimizes functions on Riemannian manifolds by following geodesics, thereby preserving manifold structure and leveraging curvature for faster convergence.
- By utilizing the exponential map and intrinsic metrics, GGD provides closed-form updates on special manifolds like hyperbolic spaces, spheres, and positive-definite matrices with provable convergence guarantees.
- GGD’s adaptability through variants such as hyperparameter-free, stochastic, and adaptive-step methods makes it a valuable tool for applications in manifold learning, geometric deep learning, and structured optimization.
Geodesic Gradient Descent (GGD) is a principled optimization method that generalizes classical gradient descent techniques to the setting of Riemannian manifolds. Instead of parameter updates along Euclidean straight lines, GGD progresses along geodesics determined by the manifold's intrinsic geometry and the Riemannian metric, thereby preserving manifold structure and exploiting curvature for accelerated convergence and increased robustness—particularly in problems exhibiting non-Euclidean, highly nonlinear, or structured parameter spaces (Wilson et al., 2018, Hu et al., 28 Feb 2026, Fioresi et al., 2019, Wensing et al., 2018).
1. Mathematical Foundations: Gradients and Geodesics
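Given a smooth $n$-dimensional Riemannian manifold $(\mathcal{M}, g)$, the Riemannian gradient $\operatorname{grad} f(x)$ of a smooth function $f : \mathcal{M} \to \mathbb{R}$ is defined as the unique tangent vector at $x$ satisfying $g_x(\operatorname{grad} f(x), v) = Df(x)[v]$ for all $v \in T_x\mathcal{M}$. In local coordinates (with metric tensor $G(x)$), this yields $\operatorname{grad} f(x) = G(x)^{-1} \nabla f(x)$, directly generalizing the Euclidean gradient (Wilson et al., 2018).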
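As a minimal sketch of the coordinate formula, the snippet below computes a Riemannian gradient by solving a linear system with the metric tensor and then checks the defining identity numerically. The Poincaré upper half-plane metric, the test function, and the helper name `riemannian_gradient` are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

def riemannian_gradient(metric, euclid_grad):
    """Return G^{-1} * (coordinate gradient) by solving G g = euclid_grad."""
    return np.linalg.solve(metric, euclid_grad)

# Assumed example geometry: Poincare upper half-plane, G(x) = I / x_2^2.
x = np.array([0.5, 2.0])
G = np.eye(2) / x[1] ** 2

# Assumed test function f(x) = x_1^2 + x_2, with coordinate gradient (2 x_1, 1).
euclid_grad = np.array([2.0 * x[0], 1.0])
grad = riemannian_gradient(G, euclid_grad)

# Defining property: g_x(grad f, v) equals the directional derivative Df(x)[v].
v = np.array([0.3, -1.2])
lhs = grad @ G @ v          # metric inner product of grad f and v
rhs = euclid_grad @ v       # directional derivative in coordinates
```

Solving the linear system rather than forming the inverse explicitly is the usual numerically safer choice when the metric is dense.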
Updates are performed using the exponential map $\operatorname{Exp}_{x}(v)$, which transports the point $x$ along the geodesic in direction $v$ for unit time. The canonical GGD update is $x_{k+1} = \operatorname{Exp}_{x_k}\!\bigl(-\eta\,\operatorname{grad} f(x_k)\bigr)$, where $\eta > 0$ is the step size (Wensing et al., 2018). The exponential map is well-defined on the whole tangent space for geodesically complete manifolds; closed-form expressions are available on certain geometries (e.g., hyperbolic space and the sphere), while general manifolds may require retraction approximations (Wilson et al., 2018, Dong et al., 2022).
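The update rule can be sketched as a generic loop that takes the Riemannian gradient and the exponential map as callables. The function name `ggd` and its parameters are hypothetical. The example uses flat Euclidean space, where the exponential map is simply $x + v$, to illustrate that the scheme reduces to ordinary gradient descent in that case.

```python
import numpy as np

def ggd(x0, riem_grad, exp_map, step_size, num_steps):
    """Iterate x <- Exp_x(-step_size * grad f(x)) for a fixed number of steps."""
    x = x0
    for _ in range(num_steps):
        x = exp_map(x, -step_size * riem_grad(x))
    return x

# Sanity check in Euclidean space: Exp_x(v) = x + v and grad f is the usual gradient.
# Assumed objective: f(x) = 0.5 * ||x - target||^2, whose gradient is x - target.
target = np.array([1.0, -2.0, 3.0])
x_final = ggd(
    x0=np.zeros(3),
    riem_grad=lambda x: x - target,
    exp_map=lambda x, v: x + v,
    step_size=0.5,
    num_steps=50,
)
```

Passing the geometry in as callables keeps the loop unchanged when a manifold-specific exponential map is substituted.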
2. Algorithmic Realizations and Model-Specific Instances
GGD can be adapted to diverse manifold structures by substituting the relevant Riemannian metric and computing the exponential map accordingly.
Hyperboloid Model of Hyperbolic Space:
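Within the hyperboloid model $\mathbb{H}^n = \{x \in \mathbb{R}^{n+1} : \langle x, x\rangle_L = -1,\ x_0 > 0\}$, with Minkowski metric $\langle x, y\rangle_L = -x_0 y_0 + \sum_{i=1}^{n} x_i y_i$, the GGD step involves:
- Computing the ambient (Minkowski) gradient, projecting to the tangent space $T_x\mathbb{H}^n$, and applying the exponential map,
- Using closed-form updates involving hyperbolic trigonometric functions:
$$\operatorname{Exp}_x(v) = \cosh(\lVert v\rVert_L)\,x + \sinh(\lVert v\rVert_L)\,\frac{v}{\lVert v\rVert_L}, \qquad \lVert v\rVert_L = \sqrt{\langle v, v\rangle_L}.$$
This yields stable, geometry-preserving steps with per-iteration complexity $O(n)$, substantially reducing the iteration count (e.g., ~46% fewer steps than Poincaré-ball retraction schemes for Fréchet mean problems) (Wilson et al., 2018).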
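The three stages (ambient gradient, tangent projection, exponential map) might look as follows in NumPy. This is a sketch under stated assumptions rather than the cited authors' implementation: the helper names are hypothetical, and the objective $f(x) = -\langle x, p\rangle_L$ (the hyperbolic cosine of the distance to a fixed point $p$) is chosen only because its minimizer is known to be $p$.

```python
import numpy as np

def minkowski_dot(u, v):
    """Minkowski inner product: -u_0 v_0 + sum_i u_i v_i."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def hyperboloid_step(x, euclid_grad, step_size):
    """One GGD step on the hyperboloid from a Euclidean (coordinate) gradient."""
    # Ambient Minkowski gradient: flip the sign of the time-like coordinate.
    h = euclid_grad.copy()
    h[0] = -h[0]
    # Project onto the tangent space {v : <x, v>_L = 0}, using <x, x>_L = -1.
    grad = h + minkowski_dot(x, h) * x
    v = -step_size * grad
    norm_v = np.sqrt(max(minkowski_dot(v, v), 0.0))
    if norm_v < 1e-12:
        return x
    # Closed-form exponential map.
    return np.cosh(norm_v) * x + np.sinh(norm_v) * v / norm_v

# Assumed objective f(x) = -<x, p>_L; its coordinate gradient is (p_0, -p_1, ..., -p_n).
p = np.array([1.0, 0.0, 0.0])
x = np.array([np.sqrt(1.25), 0.3, 0.4])      # a point on H^2
for _ in range(60):
    euclid_grad = np.concatenate(([p[0]], -p[1:]))
    x = hyperboloid_step(x, euclid_grad, step_size=0.5)
```

Each step costs a handful of length-$(n+1)$ vector operations, which is where the linear per-iteration cost comes from.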
Positive-Definite Matrix Manifolds:
For $\mathcal{P}(n)$, the manifold of $n \times n$ Hermitian positive-definite matrices with the affine-invariant metric, geodesic updates follow $X_{k+1} = X_k^{1/2} \exp\!\bigl(-\eta\, X_k^{-1/2}\, \operatorname{grad} f(X_k)\, X_k^{-1/2}\bigr) X_k^{1/2}$, with per-iteration complexity $O(n^3)$, required for matrix root and exponential computations. Applications include barycentre (Karcher mean) and control problems on matrix manifolds (Duan et al., 2019).
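A minimal NumPy sketch of this geodesic update is given below, using eigendecompositions for the matrix square root, logarithm, and exponential. The helper names are hypothetical. The example objective, half the squared affine-invariant distance to a fixed matrix $A$, is an assumption chosen because its Riemannian gradient is the negative logarithm map, so a unit step should land on $A$.

```python
import numpy as np

def _sym_fun(S, fun):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(S)
    return (U * fun(w)) @ U.conj().T

def spd_geodesic_step(X, riem_grad, step_size):
    """X^{1/2} exp(-step * X^{-1/2} grad X^{-1/2}) X^{1/2}."""
    X_half = _sym_fun(X, np.sqrt)
    X_inv_half = _sym_fun(X, lambda w: 1.0 / np.sqrt(w))
    inner = -step_size * X_inv_half @ riem_grad @ X_inv_half
    inner = 0.5 * (inner + inner.conj().T)   # symmetrize against round-off
    return X_half @ _sym_fun(inner, np.exp) @ X_half

def spd_log(X, A):
    """Affine-invariant logarithm map Log_X(A)."""
    X_half = _sym_fun(X, np.sqrt)
    X_inv_half = _sym_fun(X, lambda w: 1.0 / np.sqrt(w))
    M = X_inv_half @ A @ X_inv_half
    M = 0.5 * (M + M.conj().T)
    return X_half @ _sym_fun(M, np.log) @ X_half

X = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 0.2], [0.2, 3.0]])
grad = -spd_log(X, A)                        # gradient of 0.5 * d(X, A)^2
X_mid = spd_geodesic_step(X, grad, 0.5)      # halfway along the geodesic to A
X_end = spd_geodesic_step(X, grad, 1.0)      # full step along the geodesic
```

The eigendecompositions dominate the cost, consistent with cubic scaling in the matrix size.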
Sphere and Objective-Induced Geometry:
On the unit sphere $S^{n-1}$, the exponential map is $\operatorname{Exp}_x(v) = \cos(\lVert v\rVert)\,x + \sin(\lVert v\rVert)\,\frac{v}{\lVert v\rVert}$, so GGD steps travel along great circles. The method also generalizes via local spherical approximations around points on objective-induced hypersurfaces, resulting in parameter-free, adaptive-step GGD algorithms that do not require learning rate tuning (Hu et al., 28 Feb 2026).
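To illustrate great-circle updates, the sketch below minimizes a Rayleigh quotient $x^\top A x$ over the unit sphere, whose minimizer is an eigenvector for the smallest eigenvalue. The objective, the matrix, the step size, and the function names are assumptions made for the example. The code covers only the plain fixed-step variant, not the learning-rate-free scheme mentioned above.

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere: move from x along the great circle in direction v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return x
    return np.cos(norm_v) * x + np.sin(norm_v) * v / norm_v

def sphere_ggd(A, x0, step_size, num_steps):
    """Fixed-step GGD for f(x) = x^T A x on the unit sphere."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(num_steps):
        euclid_grad = 2.0 * A @ x
        # Riemannian gradient: remove the component normal to the sphere.
        riem_grad = euclid_grad - (x @ euclid_grad) * x
        x = sphere_exp(x, -step_size * riem_grad)
        x = x / np.linalg.norm(x)   # numerical safeguard against drift
    return x

A = np.diag([1.0, 2.0, 4.0])
x_min = sphere_ggd(A, np.array([1.0, 1.0, 1.0]), step_size=0.1, num_steps=500)
rayleigh = x_min @ A @ x_min
```

The renormalization line is not part of the exact update (the exponential map already returns a unit vector); it only suppresses accumulated floating-point error.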
3. Theoretical Guarantees and Convergence Analysis
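Central to the analysis of GGD is the concept of geodesic convexity (g-convexity), where $f : \mathcal{M} \to \mathbb{R}$ is g-convex if
$$f(\gamma(t)) \le (1-t)\,f(x) + t\,f(y)$$
for all $t \in [0,1]$ and any geodesic $\gamma : [0,1] \to \mathcal{M}$ from $x$ to $y$. In the presence of $\mu$-strong g-convexity and $L$-smoothness, the following hold (Shu et al., 9 Apr 2025, Wensing et al., 2018):
- Linear convergence: $f(x_k) - f(x^\ast) \le \rho^{k}\,\bigl(f(x_0) - f(x^\ast)\bigr)$ with contraction factor $\rho = 1 - \mu/L$ and properly chosen step size $\eta$ (e.g., $\eta = 1/L$),
- Sublinear $O(1/k)$ or $O(1/\sqrt{k})$ rates in non-strongly convex or stochastic settings.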
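A short derivation shows where a linear rate can come from. It is a sketch under the standard assumptions that $f$ is geodesically $L$-smooth and $\mu$-strongly g-convex with minimizer $x^\ast$, and that the step size is $\eta = 1/L$. The constants in the cited analyses may differ. Geodesic $L$-smoothness applied to the update $x_{k+1} = \operatorname{Exp}_{x_k}(-\tfrac{1}{L}\operatorname{grad} f(x_k))$ gives a descent inequality, and minimizing the strong-convexity lower bound over all initial velocities $w \in T_{x_k}\mathcal{M}$ gives a gradient-domination inequality:

$$
\begin{aligned}
f(x_{k+1}) &\le f(x_k) - \tfrac{1}{L}\lVert \operatorname{grad} f(x_k)\rVert^2 + \tfrac{L}{2}\cdot\tfrac{1}{L^2}\lVert \operatorname{grad} f(x_k)\rVert^2
= f(x_k) - \tfrac{1}{2L}\lVert \operatorname{grad} f(x_k)\rVert^2, \\
f(x^\ast) &\ge \min_{w}\Bigl\{ f(x_k) + \langle \operatorname{grad} f(x_k), w\rangle + \tfrac{\mu}{2}\lVert w\rVert^2 \Bigr\}
= f(x_k) - \tfrac{1}{2\mu}\lVert \operatorname{grad} f(x_k)\rVert^2 .
\end{aligned}
$$

Combining the two, $\lVert \operatorname{grad} f(x_k)\rVert^2 \ge 2\mu\,(f(x_k) - f(x^\ast))$, and therefore

$$
f(x_{k+1}) - f(x^\ast) \le \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(f(x_k) - f(x^\ast)\bigr).
$$

Neither inequality uses a curvature bound. Curvature typically enters when the analysis tracks the distance $d(x_k, x^\ast)$ rather than function values.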
Recent work replaces classical curvature and bounded-domain assumptions with the quasilinearization framework, permitting optimal convergence guarantees under substantially weaker assumptions—specifically, on arbitrary Hadamard manifolds (complete, simply connected, with nonpositive sectional curvature) (Shu et al., 9 Apr 2025).
4. Connections, Extensions, and Variants
GGD both generalizes and subsumes natural gradient and Riemannian gradient descent techniques. By selecting an appropriate Riemannian metric—e.g., the Fisher information, a Hessian metric, or one induced from a reference manifold via pullback—GGD becomes a mechanism for aligning descent with problem-specific geometry, accelerating convergence (notably in high-dimensional or ill-conditioned landscapes) (Dong et al., 2022, Fioresi et al., 2019).
Specialized variants include:
- Learning-rate-free GGD based on local sphere approximation and automatic step scaling (Hu et al., 28 Feb 2026).
- Adaptive step-size GGD for manifolds of nonnegative curvature, where local Lipschitz constants are estimated by parallel-transporting gradients between iterates, with accompanying convergence guarantees (Ansari-Önnestam et al., 23 Apr 2025).
- Stochastic GGD, incremental and mini-batch adaptations, and mirror-descent generalizations arise by appropriate choices of geodesics and connections, especially in information geometry and dually flat manifolds (Omiya et al., 10 Dec 2025).
Table: Selected GGD-Related Manifolds and Algorithms
| Manifold | Riemannian Exponential Map | Per-iteration Complexity |
|---|---|---|
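| Hyperbolic ($\mathbb{H}^n$) | $\cosh(\lVert v\rVert_L)\,x + \sinh(\lVert v\rVert_L)\,v/\lVert v\rVert_L$ | $O(n)$ |
| Sphere ($S^{n-1}$) | $\cos(\lVert v\rVert)\,x + \sin(\lVert v\rVert)\,v/\lVert v\rVert$ | $O(n)$ |
| $\mathcal{P}(n)$ (SPD) | $X^{1/2}\exp\!\bigl(X^{-1/2} V X^{-1/2}\bigr)X^{1/2}$ | $O(n^3)$ |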
5. Empirical Performance and Applications
Empirical studies demonstrate accelerated convergence and robustness for GGD over classical optimizers:
- On the Burgers' PDE benchmark (regression), test MSE reductions of 35.8–48.8% against Adam.
- On MNIST (classification), cross-entropy loss improvements of 3.1–11.6% and final test accuracy up to 99.3%, with only modest computational overhead compared to Adam. Performance gains are especially pronounced as model depth increases (Hu et al., 28 Feb 2026).
- On positive definite matrices and the Karcher mean, natural-gradient variants outperform standard Riemannian GD in convergence speed (Duan et al., 2019).
GGD is instrumental in manifold learning, geometric deep learning, statistical inference on structured spaces, and optimization on information geometric models (e.g., exponential families, dually flat spaces)—where an m-geodesic update can, for log-likelihood maximization, theoretically reach the MLE in a single step (Omiya et al., 10 Dec 2025).
6. Structural Advantages and Limitations
Advantages of GGD include:
- Geometry awareness: updates remain on the manifold, respecting its constraints and topology.
- Stability and robustness: exponential-map updates avoid overshooting and instability present in retraction-based or projection-based methods (Wilson et al., 2018).
- Generality and extensibility: applicable to arbitrary manifolds and adaptable to problem-induced geometry via metric design (Dong et al., 2022).
- Hyperparameter minimization: local sphere-based GGD implementations eliminate learning-rate tuning (Hu et al., 28 Feb 2026).
Limitations involve:
- Metric selection: constructing and efficiently inverting the metric may be challenging in large or unstructured models (Dong et al., 2022).
- Closed-form exponential maps exist only for certain manifolds; otherwise, reliance on numerical geodesic solvers or retractions is required.
- Guarantees, such as global optimality and convergence rate, still depend on establishing (strong) geodesic convexity, which may not hold in highly nonconvex settings (Shu et al., 9 Apr 2025).
7. Outlook and Open Problems
Current research directions include:
- Development of hyperparameter-free, locally adaptive step-size schemes for general geometry (Hu et al., 28 Feb 2026, Ansari-Önnestam et al., 23 Apr 2025).
- Automated metric learning and meta-optimization for accelerating GGD in complex landscapes (Dong et al., 2022).
- Integration of GGD with second-order, stochastic, or momentum-based Riemannian methods.
- Extensions to time-varying optimization, game-theoretic equilibria, and hierarchical manifold-structured problems (Wensing et al., 2018).
- Theoretical characterization of nonconvex convergence, especially on manifolds without sectional curvature bounds or under weak regularity.
GGD thus provides a unifying, theoretically grounded framework for manifold-valued and geometrically structured optimization problems, with demonstrated practical efficacy and a still-developing theory across smooth, strongly convex, and stochastic regimes.