
Riemannian Gradient Descent

Updated 7 June 2026
  • Riemannian gradient descent is a first-order optimization method that generalizes gradient descent to smooth manifolds using intrinsic geometric structures.
  • It leverages tangent spaces, Riemannian metrics, and retraction maps to ensure updates remain on the manifold, preserving complex constraints.
  • Its convergence properties, adaptive step-sizes, and robustness to curvature make it essential for applications in machine learning, robust statistics, and quantum computation.

Riemannian gradient descent is a class of first-order optimization methods that generalize standard gradient descent to smooth manifolds endowed with a Riemannian structure. Problems whose feasible set is a manifold rather than a linear space can then be treated as unconstrained optimization on that manifold. By exploiting the geometry of the manifold, specifically its tangent spaces, Riemannian metric, and retraction or exponential maps, these methods define update rules that respect the manifold constraints intrinsically. They are used across machine learning on manifolds, inverse problems, robust statistics, geometric deep learning, quantum computation, and control.

1. Basic Formulation and Principles

Let $(M,g)$ be a smooth Riemannian manifold with metric $g$ and tangent bundle $TM$. For a differentiable $f:M\to\mathbb{R}$, the Riemannian gradient $\grad f(x)\in T_xM$ at $x\in M$ is defined via

$D f(x)[v] = \langle \grad f(x), v\rangle_x, \quad \forall v\in T_xM.$

A classical Riemannian gradient descent (RGD) iteration with step size $\eta_k>0$ takes the form

$x_{k+1} = \Retr_{x_k}(-\eta_k \grad f(x_k)),$

where $\Retr_{x_k}:T_{x_k}M\to M$ is a retraction map. The Riemannian exponential map is one valid choice when it is computationally feasible, but in practice retractions are often selected for computational efficiency or numerical stability; they approximate the exponential map to first (or higher) order.

The update is analogous to Euclidean GD but replaces vector addition with a step along a tangent vector, which the retraction maps back onto the manifold. The new iterate $x_{k+1}$ is therefore guaranteed to lie on $M$, so manifold constraints such as orthonormality, fixed rank, or a fixed determinant are preserved exactly, and the approach often outperforms Euclidean projection in poorly conditioned or highly curved settings (Martínez-Rubio et al., 2024).
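As a minimal sketch of the iteration above (assuming NumPy, and taking the unit sphere with the metric induced from $\mathbb{R}^n$ as the manifold), consider the Rayleigh quotient $f(x)=x^\top A x$, whose minimizer on the sphere is an eigenvector for the smallest eigenvalue of $A$. The Riemannian gradient is the Euclidean gradient projected onto $T_xM=\{v: x^\top v=0\}$, and the retraction used here is normalization, $\Retr_x(v)=(x+v)/\|x+v\|$, rather than the exponential map. The function name `sphere_rgd` and the fixed step size are illustrative choices.

```python
import numpy as np

def sphere_rgd(A, x0, step, iters=500):
    """Minimize f(x) = x^T A x over the unit sphere with fixed-step RGD."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2.0 * A @ x                 # Euclidean gradient of x^T A x
        rgrad = egrad - (x @ egrad) * x     # projection onto T_x S^{n-1} = {v : x^T v = 0}
        y = x - step * rgrad                # step along the negative Riemannian gradient
        x = y / np.linalg.norm(y)           # retraction: normalize back onto the sphere
    return x

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) @ Q.T   # eigenvalues 1..5, eigenvectors = columns of Q
x = sphere_rgd(A, rng.standard_normal(5), step=0.5 / np.linalg.norm(A, 2))
print(abs(x @ Q[:, 0]))                 # alignment with the eigenvector of the smallest eigenvalue
```

Every iterate stays exactly on the sphere because of the normalization step, which is the constraint-preservation property described above.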

2. Geometric Structures and Computational Ingredients

The efficacy of Riemannian gradient descent rests on several geometric components:

  • Tangent Space and Projections: Each point $x\in M$ has a tangent space $T_xM$. The Riemannian gradient is an intrinsic vector field; when $M$ is embedded in a Euclidean space and carries the induced metric, it is obtained by orthogonally projecting the Euclidean gradient onto $T_xM$ (Bian et al., 2023, Knight, 1 Jun 2026).
  • Riemannian Metric: The inner product $\langle\cdot,\cdot\rangle_x=g_x(\cdot,\cdot)$ on $T_xM$ defines gradient directions and step sizes. For embedded submanifolds, the metric is usually induced from the ambient Euclidean or Hermitian space, but structure-adapted metrics may offer superior conditioning or preconditioning properties (Bian et al., 2023).
  • Retraction and Exponential Map: A retraction $\Retr_x:T_xM\to M$ is a smooth (possibly only locally defined) map satisfying $\Retr_x(0)=x$ and $D\Retr_x(0)=\mathrm{id}_{T_xM}$; common choices are the exact exponential map, QR/polar-based retractions for Stiefel/Grassmann manifolds, SVD truncation for low-rank matrix manifolds, or rowwise normalizations in product-of-spheres models (Wilson et al., 2018, Knight, 1 Jun 2026, Sutti et al., 2024).
  • Parallel Transport and Vector Transport: Because tangent spaces at different points are distinct vector spaces, methods that combine tangent vectors from different iterates, such as momentum or variance-reduced variants, must first move them to a common tangent space using parallel transport or a (possibly isometric) vector transport (Zhou et al., 2024).

These ingredients yield update rules that fully respect both the manifold structure and the problem's symmetries and invariances.
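The ingredients above can be made concrete on the Stiefel manifold $\mathrm{St}(n,p)=\{X\in\mathbb{R}^{n\times p}: X^\top X=I_p\}$ with the metric induced from $\mathbb{R}^{n\times p}$. This NumPy sketch assumes that metric; the helper names are hypothetical, and the QR retraction and projection-based vector transport are standard textbook choices rather than the specific ones used in the cited works.

```python
import numpy as np

def sym(M):
    """Symmetric part of a square matrix."""
    return 0.5 * (M + M.T)

def proj_tangent(X, Z):
    """Orthogonal projection of an ambient matrix Z onto T_X St(n, p) (induced metric)."""
    return Z - X @ sym(X.T @ Z)

def qr_retraction(X, xi):
    """QR retraction: Q factor of X + xi, with column signs chosen so that diag(R) > 0."""
    Q, R = np.linalg.qr(X + xi)
    return Q * np.where(np.diag(R) < 0, -1.0, 1.0)   # flip column j if R[j, j] < 0

def transport(Y, xi):
    """Vector transport by projection: move a tangent vector into T_Y St(n, p)."""
    return proj_tangent(Y, xi)

rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.standard_normal((6, 3)))    # a point on St(6, 3)
xi = proj_tangent(X, rng.standard_normal((6, 3)))   # a tangent vector at X
Y = qr_retraction(X, 0.3 * xi)                      # retract a step along xi
eta = transport(Y, xi)                              # xi expressed at the new point Y
print(np.linalg.norm(Y.T @ Y - np.eye(3)))          # ~0: Y has orthonormal columns
print(np.linalg.norm(sym(Y.T @ eta)))               # ~0: eta satisfies Y^T eta + eta^T Y = 0
```

The tangent condition checked in the last line is $Y^\top\eta+\eta^\top Y=0$; momentum-type methods would use `transport` to carry the previous search direction to the new iterate before combining it with the new gradient.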

3. Convergence Theory, Rates, and Curvature Dependence

Theoretical properties of Riemannian gradient descent reflect a close analogy to Euclidean optimization, with modifications incorporating curvature and geodesic convexity:

  • Convexity and Smoothness: Geodesic (or g-) convexity generalizes Euclidean convexity: $f$ is g-convex if $f\circ\gamma$ is convex for every geodesic $\gamma$ in its (geodesically convex) domain. $L$-smoothness and $\mu$-strong g-convexity are defined analogously, with geodesic distances replacing Euclidean ones (Martínez-Rubio et al., 2024, Ansari-Önnestam et al., 23 Apr 2025).
  • Convergence Rates: Sublinear $O(1/k)$ rates for $L$-smooth g-convex $f$, and linear rates for $L$-smooth, $\mu$-strongly g-convex $f$, hold up to curvature-dependent scaling. The rates degrade gracefully through geometric constants that depend on a lower bound on the sectional curvature and on the diameter of the region containing the iterates, and are provably robust when the manifold has bounded (possibly nonzero) curvature (Martínez-Rubio et al., 2024).
  • Iterate Boundedness: For provable guarantees, one must ensure that the iterates remain in a geodesically convex ball around the minimizer; curvature-induced constants inflate the radius of this ball compared to the Euclidean case (Martínez-Rubio et al., 2024).
  • Adaptive and Inexact Variants: Adaptive step sizes can be chosen from local Lipschitz estimates computed with parallel transport along geodesics, allowing larger steps in regions where the gradient varies slowly (Ansari-Önnestam et al., 23 Apr 2025). Inexact Riemannian gradient methods, in which the search direction carries a controlled absolute or relative error, retain their guarantees of convergence to stationary points under mild summability or boundedness conditions (Zhou et al., 2024).
  • Stochastic and Minimax Settings: Convergence results for Riemannian stochastic gradient descent (RSGD), including variance-reduced variants and weak-error SDE approximations, have been established (e.g., on Hadamard manifolds or for minimax optimization). The resulting batch-size/variance trade-offs mirror the Euclidean results, but with curvature- and geometry-dependent constants (Sakai et al., 2023, Gess et al., 2024, Huang et al., 2020).
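For reference, the geodesic notions above are commonly formalized as follows. This is a sketch assuming $f$ is differentiable on a geodesically convex set on which the inverse exponential map $\operatorname{Log}_x$ and the geodesic distance $d$ are well defined; the cited papers may state them with different domains or constants. Applying the smoothness inequality to one exponential-map step gives the standard sufficient-decrease bound that underlies the sublinear and linear rates.

```latex
\begin{align*}
\text{g-convex:}\quad & f(\gamma(t)) \le (1-t)\,f(x) + t\,f(y)
  \quad \text{for every geodesic } \gamma \text{ with } \gamma(0)=x,\ \gamma(1)=y,\ t\in[0,1],\\
\mu\text{-strongly g-convex:}\quad & f(y) \ge f(x) + \langle \grad f(x), \operatorname{Log}_x y\rangle_x + \tfrac{\mu}{2}\, d(x,y)^2,\\
L\text{-smooth:}\quad & f(y) \le f(x) + \langle \grad f(x), \operatorname{Log}_x y\rangle_x + \tfrac{L}{2}\, d(x,y)^2.
\end{align*}
With $\Retr=\operatorname{Exp}$, $\eta_k = 1/L$, and the step inside the injectivity radius,
$\operatorname{Log}_{x_k} x_{k+1} = -\tfrac{1}{L}\grad f(x_k)$ and
$d(x_k,x_{k+1}) = \tfrac{1}{L}\|\grad f(x_k)\|_{x_k}$, so
\begin{align*}
f(x_{k+1}) &\le f(x_k) - \tfrac{1}{L}\|\grad f(x_k)\|_{x_k}^2
              + \tfrac{L}{2}\cdot\tfrac{1}{L^2}\|\grad f(x_k)\|_{x_k}^2
            = f(x_k) - \tfrac{1}{2L}\|\grad f(x_k)\|_{x_k}^2 .
\end{align*}
```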

4. Algorithmic Forms, Enhancements, and Representative Examples

Many algorithmic instantiations of Riemannian GD are available, tailored to specific manifold models and problem classes:

| Manifold / Structure | Retraction or Update Rule | Application Domain |
| --- | --- | --- |
| Stiefel $\mathrm{St}(n,p)$ / Grassmann $\mathrm{Gr}(n,p)$ | QR/polar, exponential map | PCA, quantum chemistry (Dinvay, 16 Mar 2026), statistics |
| Fixed-rank / partial isometry | SVD truncation, QR/sphere projection | Deep learning (QKV) (Knight, 1 Jun 2026), low-rank recovery (Bian et al., 2023) |
| Positive definite Hermitian | | Covariance averaging, matrix control (Duan et al., 2019) |
| Hyperbolic space | Ambient projection + hyperboloid exponential map | Barycenters, hierarchical representations (Wilson et al., 2018) |
| Product of spheres | Row normalization | Area-preserving mapping, geometric registration (Sutti et al., 2024) |
| Rational transfer functions | Orthographic (subspace) retraction | Model order reduction (IRKA) (Mlinarić et al., 2023) |

Optimizations may leverage momentum, preconditioning (Bian et al., 2023), adaptive step selection (Ansari-Önnestam et al., 23 Apr 2025), or variance reduction for stochastic settings (Huang et al., 2020). When the ambient dimension or curvature is high, randomized subspace approximations or quasi-Riemannian projections (e.g., one-mode tangent projections for tensors) provide computational tractability without sacrificing geometric fidelity (Zhang et al., 2024, Pervez et al., 15 Dec 2025).
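The positive definite row of the table can be illustrated with the Karcher (Fréchet) mean of real symmetric positive definite matrices. This sketch assumes the affine-invariant metric, which the table does not specify; under that metric the exponential map serves as the update rule, and with step size 1 the iteration reduces to the classical Karcher-mean fixed-point iteration. The function names are hypothetical, and matrix functions are computed through `numpy.linalg.eigh`.

```python
import numpy as np

def sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T

def karcher_mean_spd(mats, step=1.0, iters=100, tol=1e-12):
    """RGD for argmin_X (1/2N) sum_i d(X, A_i)^2 under the affine-invariant metric."""
    X = sum(mats) / len(mats)                           # start at the arithmetic mean
    for _ in range(iters):
        Xh = sym_fun(X, np.sqrt)                        # X^{1/2}
        Xih = sym_fun(X, lambda w: 1.0 / np.sqrt(w))    # X^{-1/2}
        # W = X^{-1/2} (-grad F(X)) X^{-1/2} = (1/N) sum_i log(X^{-1/2} A_i X^{-1/2});
        # its Frobenius norm equals the Riemannian norm of grad F(X).
        W = sum(sym_fun(Xih @ A @ Xih, np.log) for A in mats) / len(mats)
        if np.linalg.norm(W) < tol:
            break
        X = Xh @ sym_fun(step * W, np.exp) @ Xh         # X <- Exp_X(-step * grad F(X))
        X = 0.5 * (X + X.T)                             # remove round-off asymmetry
    return X

mats = [np.diag([1.0, 4.0]), np.diag([4.0, 9.0])]
print(karcher_mean_spd(mats))   # approximately diag(2, 6), the entrywise geometric means
```

For commuting inputs the Karcher mean is the matrix geometric mean, which is why the diagonal example recovers entrywise geometric means; for non-commuting inputs the same loop applies unchanged.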

Selected Algorithmic Themes

  • Preconditioned Riemannian GD: Diagonal or geometric preconditioners are constructed by local adaptation to the norm or energy of the gradient, yielding linear convergence under restricted isometry (matrix recovery) or dramatically improving wall-clock performance at large scale (Bian et al., 2023).
  • Stochastic RGDs and Riemannian SDEs: In the small-step-size regime, RSGD is weakly approximated by the deterministic Riemannian gradient flow and, to higher order in the step size, by a diffusion (the Riemannian stochastic modified flow, RSMF) that captures the fluctuations of stochastic or minibatch gradients (Gess et al., 2024).
  • Manifold-specific curvature exploitation: On Hadamard manifolds (complete, simply connected, with nonpositive curvature), the squared distance function is strictly geodesically convex, which yields unique minimizers and strong convergence for averaging problems such as the Fréchet mean (Sakai et al., 2023, Wilson et al., 2018). In nonnegative curvature, adaptive methods achieve global best-iterate convergence rates even without precise knowledge of the global smoothness constant (Ansari-Önnestam et al., 23 Apr 2025).
  • Quantum and Infinite-Dimensional Settings: Gradient flows on quantum state manifolds employ group-theoretic projections and unitary retractions; infinite-dimensional Hartree–Fock problems are solved via Stiefel manifold optimization in Sobolev space with physically motivated preconditioning (Dinvay, 16 Mar 2026, Pervez et al., 15 Dec 2025).
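For the Hadamard-manifold averaging mentioned above, a minimal sketch of RGD for the Fréchet mean on the hyperboloid model $\{x\in\mathbb{R}^{n+1}: \langle x,x\rangle_L=-1,\ x_0>0\}$ uses the closed-form exponential and logarithm maps. Assumptions: NumPy, a fixed step of 0.5 chosen for this small example, and hypothetical helper names. The gradient here comes directly from the logarithm map, $\grad F(x)=-\frac{1}{N}\sum_i\operatorname{Log}_x(p_i)$, so this is not the specific ambient-projection scheme of Wilson et al. (2018).

```python
import numpy as np

def minkowski(u, v):
    """Lorentzian inner product <u, v>_L = -u_0 v_0 + sum_{i>=1} u_i v_i."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def dist(x, y):
    return np.arccosh(max(-minkowski(x, y), 1.0))    # clamp guards against round-off

def log_map(x, y):
    """Log_x(y) = d / sinh(d) * (y + <x, y>_L x), a tangent vector at x."""
    d = dist(x, y)
    if d < 1e-12:
        return np.zeros_like(x)
    return d / np.sinh(d) * (y + minkowski(x, y) * x)

def exp_map(x, v):
    """Exp_x(v) = cosh(|v|) x + sinh(|v|) v / |v|, followed by re-normalization."""
    n = np.sqrt(max(minkowski(v, v), 0.0))           # tangent vectors have <v, v>_L >= 0
    if n < 1e-12:
        return x
    y = np.cosh(n) * x + np.sinh(n) * v / n
    return y / np.sqrt(-minkowski(y, y))             # remove drift off <y, y>_L = -1

def frechet_mean(points, step=0.5, iters=200):
    """RGD on F(x) = (1/2N) sum_i d(x, p_i)^2."""
    x = points[0].copy()
    for _ in range(iters):
        v = sum(log_map(x, p) for p in points) / len(points)   # v = -grad F(x)
        x = exp_map(x, step * v)
    return x

def lift(z):
    """Place z in R^n on the hyperboloid by setting x_0 = sqrt(1 + |z|^2)."""
    return np.concatenate(([np.sqrt(1.0 + z @ z)], z))

p, q = lift(np.array([1.0, 0.0])), lift(np.array([-1.0, 0.5]))
m = frechet_mean([p, q])
print(dist(m, p), dist(m, q))   # equal: the mean of two points is their geodesic midpoint
```

Because hyperbolic space is a Hadamard manifold, the objective has a unique minimizer, so the two printed distances coincide and each equals half of $d(p,q)$.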

5. Application Domains and Empirical Results

Riemannian gradient descent is used in a broad range of application domains:

  • Low-rank Matrix and Tensor Recovery: Accurate and robust estimation in signal processing, imaging, and matrix/tensor completion, with global linear or nearly dimension-free convergence from random initialization under weak isometry conditions (Hou et al., 2020, Bian et al., 2023, Zhang et al., 2024).
  • Deep Learning and Robust Optimization: Training of neural networks with orthogonality, low-rank, or parameter-sharing constraints, where RGD variants offer improved geometric fidelity and adversarial robustness compared to projected Euclidean methods (Knight, 1 Jun 2026, Huang et al., 2020).
  • Matrix Manifolds in Control and Model Reduction: Optimization over structured matrix manifolds, such as manifolds of rational transfer functions or positive definite matrices, for control, reduced-order modeling, and system identification (Mlinarić et al., 2023, Duan et al., 2019).
  • Manifold Averaging and Statistics: Computation of Karcher or Fréchet means in non-Euclidean geometry (e.g., SPD, hyperbolic), with unique minimizers and globally convergent RGD schemes (Duan et al., 2019, Wilson et al., 2018).
  • Quantum Algorithms: Ground-state preparation, where the structure-exploiting RGD delivers favorable scaling with problem size and allows scalable approximations via random subspaces (Pervez et al., 15 Dec 2025).
  • Computational Anatomy and Differential Geometry: Spherical area-preserving parameterizations and brain-surface registration, leveraging power-manifold RGD and guaranteed global convergence (Sutti et al., 2024).

In the cited studies, RGD methods outperform naive projection-based approaches, especially in highly curved or constraint-dense settings, and show improved sample and iteration complexity in adversarial, distributionally robust, and high-dimensional problems (Huang et al., 2020, Knight, 1 Jun 2026). Preconditioned and adaptive RGD variants are reported to be orders of magnitude faster in practical large-scale inference and machine learning (Bian et al., 2023).
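For the low-rank recovery application, a sketch of RGD on the manifold of rank-$r$ matrices applied to matrix completion looks as follows. It assumes NumPy, the induced Euclidean metric, a spectral initialization (rather than the random initialization analyzed in some cited works), a fixed unit step, and small dense matrices: the SVD is taken of the full iterate for simplicity, whereas practical implementations work with the low-rank factors. The function names are hypothetical, and this is not the preconditioned method of Bian et al. (2023).

```python
import numpy as np

def retract_rank_r(Y, r):
    """Truncated-SVD retraction onto rank-r matrices; also returns the factors U, V."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vt[:r].T
    return (U * s) @ V.T, U, V

def proj_tangent(U, V, Z):
    """Project Z onto the tangent space at X = U S V^T: UU^T Z + Z VV^T - UU^T Z VV^T."""
    UtZ = U.T @ Z
    return U @ UtZ + (Z @ V) @ V.T - U @ (UtZ @ V) @ V.T

def complete(M_obs, mask, r, step=1.0, iters=500):
    """RGD for f(X) = 0.5 * ||mask * (X - M_obs)||_F^2 over rank-r matrices."""
    p = mask.mean()                                    # fraction of observed entries
    X, U, V = retract_rank_r(mask * M_obs / p, r)      # spectral initialization
    for _ in range(iters):
        egrad = mask * (X - M_obs)                     # Euclidean gradient of f
        rgrad = proj_tangent(U, V, egrad)              # Riemannian gradient (induced metric)
        X, U, V = retract_rank_r(X - step * rgrad, r)  # gradient step, then retract
    return X

rng = np.random.default_rng(3)
M = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))   # rank-2 ground truth
mask = (rng.random((40, 40)) < 0.6).astype(float)                 # about 60% of entries observed
X = complete(mask * M, mask, r=2)
print(np.linalg.norm(X - M) / np.linalg.norm(M))                 # relative recovery error
```

Since $X+\xi$ has rank at most $2r$ for a tangent vector $\xi$, the truncated SVD could be computed from small factored matrices; the dense version is kept here only for readability.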

6. Extensions: Minimax, Stochastic, and Inexact Models

Beyond unconstrained minimization, Riemannian GD has been adapted to:

  • Minimax Optimization on Manifolds: For problems $\min_{x\in M}\max_{y\in\mathcal{Y}} f(x,y)$, where $f$ may be geodesically nonconvex in $x$ but strongly concave in $y$, Riemannian Gradient Descent Ascent (RGDA) and its stochastic and accelerated variants achieve sample complexities matching Euclidean GDA up to curvature-dependent condition numbers (Huang et al., 2020). Momentum and variance-reduction techniques such as STORM yield accelerated rates for reaching $\epsilon$-stationarity.
  • Stochastic and Batch-Size Trade-Offs: On Hadamard manifolds, iteration and sample complexities trade off against batch size much as in Euclidean SGD, but nonpositive curvature and the uniqueness of geodesics are key to establishing convexity and rate bounds (Sakai et al., 2023).
  • Inexact and Extragradient Methods: Riemannian inexact GD methods model gradient inaccuracy by requiring the computed direction to lie in a norm ball (C1) or a cone (C2) around the true gradient; convergence is preserved under Kurdyka–Łojasiewicz (KL) conditions, and the framework covers sharpness-aware minimization and extragradient variants (Zhou et al., 2024). Numerical evidence suggests that controlled inexactness does not degrade convergence in common machine learning models.
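To illustrate the stochastic setting, here is a minimal RSGD sketch that estimates the leading eigenvector of a covariance matrix from minibatches of samples. For simplicity it runs on the unit sphere rather than a Hadamard manifold, and it assumes NumPy, a decaying step size $\eta_k=c/(k+k_0)$, and a hypothetical sampler `sample_batch`; the `batch` parameter is where the batch-size/variance trade-off enters.

```python
import numpy as np

def rsgd_top_eigvec(sample_batch, d, iters=2000, batch=32, c=1.0, k0=10, seed=0):
    """Riemannian SGD on the sphere for min_x -0.5 * E[(a^T x)^2]."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)
    for k in range(iters):
        A = sample_batch(rng, batch)                 # rows are samples a_i
        egrad = -A.T @ (A @ x) / batch               # minibatch Euclidean gradient
        rgrad = egrad - (x @ egrad) * x              # project onto T_x S^{d-1}
        x = x - (c / (k + k0)) * rgrad               # decaying step eta_k = c / (k + k0)
        x /= np.linalg.norm(x)                       # retraction by normalization
    return x

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
scales = np.sqrt([5.0, 1.0, 0.5, 0.3, 0.1])

def sample_batch(rng, b):
    """Draw b samples with covariance Q diag(scales^2) Q^T."""
    return (rng.standard_normal((b, 5)) * scales) @ Q.T

x = rsgd_top_eigvec(sample_batch, d=5)
print(abs(x @ Q[:, 0]))   # should be close to 1
```

Larger batches reduce the variance of each stochastic gradient at the cost of more samples per iteration, which is the trade-off discussed above.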

7. Representative Pseudocode and Workflow

A unified pseudocode pattern encapsulates most Riemannian GD algorithms (Martínez-Rubio et al., 2024, Knight, 1 Jun 2026, Bian et al., 2023, Wilson et al., 2018):

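Input: initial point $x_0\in M$, a step-size rule, tolerance $\epsilon>0$.
For $k=0,1,2,\dots$:
  1. Compute $g_k=\grad f(x_k)$ (for an embedded $M$ with the induced metric, project the Euclidean gradient onto $T_{x_k}M$).
  2. If $\|g_k\|_{x_k}\le\epsilon$, stop.
  3. Choose $\eta_k>0$ (fixed, line search, or adaptive).
  4. Set $x_{k+1}=\Retr_{x_k}(-\eta_k g_k)$.
Return $x_k$.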

Variants for specific manifolds substitute efficient geometry-specific formulas for computing gradients, projections, and retractions, and may incorporate stochasticity, adaptive step-size schemes, or preconditioning.
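The workflow above can be turned into a small generic routine. This sketch assumes an embedded matrix manifold with the induced Frobenius metric, so that the Riemannian gradient is the tangent projection of the Euclidean gradient, and it uses Armijo backtracking as one possible step-size rule rather than the adaptive schemes of the cited works. The names `rgd`, `proj_stiefel`, and `retr_qr` are hypothetical; the usage example maximizes $\operatorname{tr}(X^\top A X)$ on the Stiefel manifold, whose maximizers span the top-$p$ eigenspace of $A$.

```python
import numpy as np

def rgd(cost, egrad, proj, retr, x0, t0=1.0, beta=0.5, sigma=1e-4, tol=1e-8, max_iter=1000):
    """Riemannian gradient descent with Armijo backtracking on an embedded manifold.

    cost(x) -> float; egrad(x) -> Euclidean gradient; proj(x, z) -> tangent projection,
    which yields the Riemannian gradient under the induced metric; retr(x, v) -> new point.
    """
    x, k = x0, 0
    for k in range(max_iter):
        g = proj(x, egrad(x))                     # Riemannian gradient at x
        gnorm2 = float(np.sum(g * g))             # squared norm in the induced (Frobenius) metric
        if np.sqrt(gnorm2) <= tol:                # stopping test on the gradient norm
            break
        fx, t = cost(x), t0
        # Shrink t until the Armijo condition f(R_x(-t g)) <= f(x) - sigma * t * |g|^2 holds.
        while cost(retr(x, -t * g)) > fx - sigma * t * gnorm2 and t > 1e-12:
            t *= beta
        x = retr(x, -t * g)                       # x_{k+1} = R_{x_k}(-t_k grad f(x_k))
    return x, k

def proj_stiefel(X, Z):
    """Tangent projection on St(n, p): Z - X sym(X^T Z)."""
    S = X.T @ Z
    return Z - X @ (0.5 * (S + S.T))

def retr_qr(X, V):
    """QR retraction with the sign convention diag(R) > 0."""
    Q, R = np.linalg.qr(X + V)
    return Q * np.where(np.diag(R) < 0, -1.0, 1.0)

rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
A = Q @ np.diag([10.0, 9.0, 8.0, 3.0, 2.0, 1.0, 0.0, -1.0]) @ Q.T

def cost(X):
    return -0.5 * np.trace(X.T @ A @ X)           # minimized by the top-3 eigenspace of A

def egrad(X):
    return -A @ X                                 # Euclidean gradient of cost (A symmetric)

X0, _ = np.linalg.qr(rng.standard_normal((8, 3)))
X, iters = rgd(cost, egrad, proj_stiefel, retr_qr, X0)
print(np.trace(X.T @ A @ X), iters)                # trace approaches 10 + 9 + 8 = 27
```

Swapping in another manifold only requires a different `proj`/`retr` pair, which mirrors how the geometry-specific variants described above plug into the same loop.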

