Riemannian Gradient Descent
- Riemannian gradient descent is a first-order optimization method that generalizes gradient descent to smooth manifolds using intrinsic geometric structures.
- It leverages tangent spaces, Riemannian metrics, and retraction maps to ensure updates remain on the manifold, preserving complex constraints.
- Its convergence properties, adaptive step-sizes, and robustness to curvature make it essential for applications in machine learning, robust statistics, and quantum computation.
Riemannian gradient descent is a class of first-order optimization methods that generalize standard gradient descent to smooth manifolds endowed with Riemannian structure, enabling unconstrained or constrained optimization where the feasible set forms a manifold rather than a linear space. By employing the geometry of the manifold—specifically, the tangent spaces, the Riemannian metric, and retraction or exponential maps—these methods define update rules intrinsically compatible with manifold constraints, and are essential across manifold machine learning, inverse problems, robust statistics, geometric deep learning, quantum computation, and control.
1. Basic Formulation and Principles
Let $M$ be a smooth Riemannian manifold with metric $\langle\cdot,\cdot\rangle_x$ and tangent bundle $TM$. For a differentiable $f: M\to\mathbb{R}$, the Riemannian gradient $\grad f(x)\in T_xM$ at $x\in M$ is defined via
$D f(x)[v] = \langle \grad f(x), v\rangle_x, \quad \forall v\in T_xM.$
A classical Riemannian gradient descent (RGD) iteration with step size $\eta_k > 0$ takes the form
$x_{k+1} = \Retr_{x_k}(-\eta_k \grad f(x_k)),$
where $\Retr_{x_k}:T_{x_k}M\to M$ is a retraction map, usually chosen as the Riemannian exponential map when computationally feasible. In practice, retractions are selected for computational efficiency or numerical stability and approximate the exponential map to first (or higher) order.
The update is analogous to Euclidean GD but replaces vector addition with a tangent-vector step that is mapped back to the manifold by the retraction. The iterate $x_{k+1}$ is guaranteed to lie on $M$, preserving manifold constraints such as orthonormality, fixed rank, or fixed determinant, and the method often outperforms Euclidean projection in poorly conditioned or highly curved settings (Martínez-Rubio et al., 2024).
2. Geometric Structures and Computational Ingredients
The efficacy of Riemannian gradient descent rests on several geometric components:
- Tangent Space and Projections: Each $x\in M$ has a tangent space $T_xM$; the Riemannian gradient is an intrinsic vector field and is typically constructed by projecting (if $M$ is embedded) the Euclidean gradient onto $T_xM$ (Bian et al., 2023, Knight, 1 Jun 2026).
- Riemannian Metric: The inner product $\langle\cdot,\cdot\rangle_x$ defines gradient directions and step sizes. For embedded submanifolds, the metric is usually induced from the ambient Euclidean or Hermitian space, but structure-adapted metrics may offer superior conditioning or preconditioning properties (Bian et al., 2023).
- Retraction and Exponential Map: A retraction $\Retr_x: T_xM\to M$ is a smooth local map satisfying $\Retr_x(0) = x$ and $D\Retr_x(0) = \mathrm{id}_{T_xM}$; common choices are the exact exponential map, QR/polar-based retractions for Stiefel/Grassmannian, SVD truncation for low-rank matrix manifolds, or rowwise normalizations in product-of-spheres models (Wilson et al., 2018, Knight, 1 Jun 2026, Sutti et al., 2024).
- Parallel Transport and Vector Transport: For non-Euclidean manifolds, it is sometimes necessary to compare tangent vectors at different points using parallel or isometric vector transport, especially in momentum or variance-reduced variants (Zhou et al., 2024).
These ingredients yield update rules that fully respect both the manifold structure and the problem's symmetries and invariances.
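The tangent projection and retraction described above can be sketched for one concrete case. The example assumes the Stiefel manifold of $n\times p$ matrices with orthonormal columns, the metric induced from the ambient Euclidean space, and a QR-based retraction; the function names are hypothetical.

```python
import numpy as np

def stiefel_project(X, G):
    """Orthogonal projection of an ambient matrix G onto the tangent space at X."""
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2.0

def stiefel_qr_retract(X, V):
    """QR-based retraction: Q factor of X + V, with signs fixed so diag(R) > 0."""
    Q, R = np.linalg.qr(X + V)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs  # rescales each column of Q by the matching sign

rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((5, 2)))      # a point on St(5, 2)
xi = stiefel_project(X, rng.standard_normal((5, 2)))  # a tangent vector at X
Y = stiefel_qr_retract(X, 0.1 * xi)                   # a nearby point on the manifold

print("tangency residual:", np.linalg.norm(X.T @ xi + xi.T @ X))
print("orthonormality residual:", np.linalg.norm(Y.T @ Y - np.eye(2)))
```

The two printed residuals check the defining properties numerically: tangent vectors satisfy $X^\top\xi + \xi^\top X = 0$, and the retracted point again has orthonormal columns.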
3. Convergence Theory, Rates, and Curvature Dependence
Theoretical properties of Riemannian gradient descent reflect a close analogy to Euclidean optimization, with modifications incorporating curvature and geodesic convexity:
- Convexity and Smoothness: Geodesic (or $g$-) convexity generalizes Euclidean convexity; $f$ is $g$-convex if it is convex along all geodesics. $L$-smoothness and $\mu$-strong convexity are defined via geodesic distances (Martínez-Rubio et al., 2024, Ansari-Önnestam et al., 23 Apr 2025).
- Convergence Rates: Sublinear rates $O(1/k)$ for $L$-smooth $g$-convex $f$, and linear rates $O((1-\mu/L)^k)$ for strongly $g$-convex $f$, hold up to curvature-dependent scaling. The convergence rates degrade gracefully via geometric constants (e.g., $\zeta$ for sectional curvature lower-bounded by $\kappa_{\min}$), and are provably robust when the manifold has bounded (possibly nonzero) curvature (Martínez-Rubio et al., 2024).
- Iterate Boundedness: For provable guarantees, one must ensure that iterates remain in a convex ball around the minimizer; curvature-induced constants inflate this radius compared to the Euclidean case (Martínez-Rubio et al., 2024).
- Adaptive and Inexact Variants: Adaptive step-size selection can be based on local Lipschitz estimates computed from parallel transports along geodesics, allowing larger steps in regions of low curvature or smoothness (Ansari-Önnestam et al., 23 Apr 2025). Inexact Riemannian gradient methods with controlled absolute or relative errors in the directional vector preserve stationarity and convergence guarantees under mild summability or boundedness conditions (Zhou et al., 2024).
- Stochastic and Minimax Settings: Convergence properties of Riemannian stochastic gradient descent (RSGD), including variance reduction and weak-error SDE approximations, have been established (e.g., on Hadamard manifolds or for minimax optimization), with batch-size/variance trade-offs structurally analogous to Euclidean results but with curvature- and geometry-dependent constants (Sakai et al., 2023, Gess et al., 2024, Huang et al., 2020).
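To see where the dependence on $L$ in such rates comes from, the following short derivation sketches the standard one-step decrease. It assumes the retraction is the exponential map $\mathrm{Exp}$, that $f$ is geodesically $L$-smooth, and that the constant step $\eta_k = 1/L$ is used; these are simplifying assumptions made here, and the cited works treat more general settings. Geodesic $L$-smoothness gives

$f(\mathrm{Exp}_x(v)) \le f(x) + \langle \grad f(x), v\rangle_x + \tfrac{L}{2}\|v\|_x^2, \quad \forall v\in T_xM.$

Substituting $v = -\tfrac{1}{L}\grad f(x_k)$ yields

$f(x_{k+1}) \le f(x_k) - \tfrac{1}{L}\|\grad f(x_k)\|_{x_k}^2 + \tfrac{1}{2L}\|\grad f(x_k)\|_{x_k}^2 = f(x_k) - \tfrac{1}{2L}\|\grad f(x_k)\|_{x_k}^2,$

and summing over $k = 0,\dots,K-1$ with $f^* = \inf_M f$ gives

$\min_{0\le k<K}\|\grad f(x_k)\|_{x_k}^2 \le \frac{2L\,(f(x_0)-f^*)}{K}.$

Converting this decrease into a bound on $f(x_k) - f^*$ is the step where curvature-dependent constants typically enter.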
4. Algorithmic Forms, Enhancements, and Representative Examples
Many algorithmic instantiations of Riemannian GD are available, tailored to specific manifold models and problem classes:
| Manifold / Structure | Retraction or Update Rule | Application Domain |
|---|---|---|
| Stiefel/Grassmann $\mathrm{St}(n,p)$, $\mathrm{Gr}(n,p)$ | QR/polar, Exp map | PCA, quantum chemistry (Dinvay, 16 Mar 2026), statistics |
| Fixed-rank / partial isometry | SVD truncation, QR/sphere projection | Deep learning QKV (Knight, 1 Jun 2026), low-rank recovery (Bian et al., 2023) |
| Positive definite Hermitian | Geodesic (matrix-exponential) update | Covariance averaging, matrix control (Duan et al., 2019) |
| Hyperbolic space $\mathbb{H}^n$ | Ambient projection + hyperboloid Exp | Barycenters, hierarchical representations (Wilson et al., 2018) |
| Product-of-spheres $(S^2)^n$ | Row normalization | Area-preserving mapping, geom-registration (Sutti et al., 2024) |
| Rational transfer functions | Orthographic (subspace) retraction | Model order reduction (IRKA) (Mlinarić et al., 2023) |
Optimizations may leverage momentum, preconditioning (Bian et al., 2023), adaptive step selection (Ansari-Önnestam et al., 23 Apr 2025), or variance reduction for stochastic settings (Huang et al., 2020). When the ambient dimension or curvature is high, randomized subspace approximations or quasi-Riemannian projections (e.g., one-mode tangent projections for tensors) provide computational tractability without sacrificing geometric fidelity (Zhang et al., 2024, Pervez et al., 15 Dec 2025).
Selected Algorithmic Themes
- Preconditioned Riemannian GD: Diagonal or geometric preconditioners are constructed by local adaptation to the norm or energy of the gradient, yielding linear convergence under restricted isometry (matrix recovery) or dramatically improving wall-clock performance at large scale (Bian et al., 2023).
- Stochastic RGDs and Riemannian SDEs: In the small step-size regime, RSGD is weakly approximated by deterministic geodesic flows and, to higher order, by a second-order diffusion (Riemannian stochastic modified flow, RSMF) that captures fluctuations of stochastic or minibatch gradients (Gess et al., 2024).
- Manifold-specific curvature exploitation: In nonpositively curved (Hadamard) manifolds, strict convexity of squared distance yields unique minimizers and strong convergence for averaging (e.g., Fréchet mean) (Sakai et al., 2023, Wilson et al., 2018). Adaptive methods in nonnegative curvature enjoy global best-iterate rate guarantees even without precise knowledge of global smoothness (Ansari-Önnestam et al., 23 Apr 2025).
- Quantum and Infinite-Dimensional Settings: Gradient flows on quantum state manifolds employ group-theoretic projections and unitary retractions; infinite-dimensional Hartree–Fock problems are solved via Stiefel manifold optimization in Sobolev space with physically-motivated preconditioning (Dinvay, 16 Mar 2026, Pervez et al., 15 Dec 2025).
5. Application Domains and Empirical Results
Riemannian gradient descent is used in a broad array of high-impact domains:
- Low-rank Matrix and Tensor Recovery: Accurate and robust estimation in signal processing, imaging, and completion, with global-linear or nearly dimension-free convergence from random initialization under weak isometry conditions (Hou et al., 2020, Bian et al., 2023, Zhang et al., 2024).
- Deep Learning and Robust Optimization: Training of neural networks with orthogonality, low-rank, or parameter-sharing constraints, where RGD variants offer improved geometric fidelity and adversarial robustness compared to projected Euclidean methods (Knight, 1 Jun 2026, Huang et al., 2020).
- Matrix Manifolds in Control and Model Reduction: Optimization over manifolds of rational transfer functions or positive definite matrices for control, reduced-order modeling, and system identification (Mlinarić et al., 2023, Duan et al., 2019).
- Manifold Averaging and Statistics: Computation of Karcher or Fréchet means in non-Euclidean geometry (e.g., SPD, hyperbolic), with unique minimizers and globally convergent RGD schemes (Duan et al., 2019, Wilson et al., 2018).
- Quantum Algorithms: Ground-state preparation, where the structure-exploiting RGD delivers favorable scaling with problem size and allows scalable approximations via random subspaces (Pervez et al., 15 Dec 2025).
- Computational Anatomy and Differential Geometry: Spherical area-preserving parameterizations and brain-surface registration, leveraging power-manifold RGD and guaranteed global convergence (Sutti et al., 2024).
Empirically, RGD methods are reported to outperform naive projection-based approaches, especially in high-curvature or constraint-dense settings, and to exhibit improved sample and iteration complexity in adversarial, distributionally robust, and high-dimensional problems (Huang et al., 2020, Knight, 1 Jun 2026). Preconditioned and adaptive RGD can be orders of magnitude faster in practical large-scale inference and machine learning (Bian et al., 2023).
6. Extensions: Minimax, Stochastic, and Inexact Models
Beyond unconstrained minimization, Riemannian GD has been adapted to:
- Minimax Optimization on Manifolds: For problems $\min_{x\in M}\max_{y\in Y} f(x,y)$ where $f$ may be geodesically nonconvex in $x$ but strongly concave in $y$, Riemannian Gradient Descent Ascent (RGDA) and its stochastic and accelerated variants achieve sample complexities matching Euclidean GDA up to curvature-dependent condition numbers (Huang et al., 2020). Momentum and variance-reduced techniques such as STORM yield accelerated rates $\tilde{O}(\epsilon^{-3})$ for $\epsilon$-stationarity.
- Stochastic and Batch-Size Trade-Offs: On Hadamard manifolds, the iteration and sample complexities trade off with batch size similarly to Euclidean SGD, but curvature and unique geodesicity are key to establishing convexity and rate bounds (Sakai et al., 2023).
- Inexact and Extragradient Methods: Riemannian inexact GD methods control for gradient inaccuracy via normed balls (C1) or cones (C2) around the true gradient, preserving convergence under KL conditions and supporting sharpness-aware minimization and extragradient variants (Zhou et al., 2024). Numerical evidence suggests that controlled inexactness does not degrade convergence in common machine learning models.
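The effect of controlled gradient inexactness can be illustrated with a small experiment. The sketch below perturbs the exact Riemannian gradient by a tangent vector whose norm is a fixed fraction `delta` of the gradient norm. This relative-error rule is an assumption chosen here for illustration and is not claimed to match conditions (C1) or (C2) of the cited work; the sphere, the cost $x^\top A x$, and the function name are likewise hypothetical.

```python
import numpy as np

def inexact_rgd_sphere(A, x0, eta, delta, num_iters, seed=0):
    """RGD on the unit sphere with a relative-error perturbation of the gradient."""
    rng = np.random.default_rng(seed)
    x = x0 / np.linalg.norm(x0)
    for _ in range(num_iters):
        egrad = 2.0 * A @ x
        g = egrad - (x @ egrad) * x        # exact Riemannian gradient
        noise = rng.standard_normal(x.shape)
        noise -= (x @ noise) * x           # keep the perturbation tangent at x
        nn = np.linalg.norm(noise)
        if nn > 0:
            noise *= delta * np.linalg.norm(g) / nn   # ||noise|| = delta * ||g||
        y = x - eta * (g + noise)          # step along the inexact direction
        x = y / np.linalg.norm(y)          # normalization retraction
    return x

A = np.diag([1.0, 2.0, 3.0])
x = inexact_rgd_sphere(A, np.array([1.0, 1.0, 1.0]), eta=0.05, delta=0.3, num_iters=2000)
print("f(x):", x @ A @ x)
```

Because the error shrinks together with the gradient, the inexact direction remains a descent direction whenever `delta < 1`, which is why the iterates in this toy problem still approach the minimizer.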
7. Representative Pseudocode and Workflow
A unified pseudocode pattern encapsulates most Riemannian GD algorithms (Martínez-Rubio et al., 2024, Knight, 1 Jun 2026, Bian et al., 2023, Wilson et al., 2018):
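- Input: initial point $x_0\in M$, step sizes $\eta_k > 0$, tolerance $\varepsilon > 0$.
- For $k = 0, 1, 2, \dots$: compute the Riemannian gradient $\grad f(x_k)$ (for embedded $M$, by projecting the Euclidean gradient onto $T_{x_k}M$).
- If $\|\grad f(x_k)\|_{x_k} \le \varepsilon$, stop and return $x_k$.
- Otherwise set $x_{k+1} = \Retr_{x_k}(-\eta_k \grad f(x_k))$.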
Variants for specific manifolds substitute in efficient geometry-specific formulas for computing gradients, projections, and retraction, and may incorporate stochasticity, adaptive step schemes, or preconditioning.
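One way to realize this pattern in code is a generic driver that receives the geometry as callables. The sketch below assumes an embedded manifold with the induced metric (so the gradient norm is the Frobenius norm) and a constant step size; the driver signature and the Stiefel PCA usage are hypothetical illustrations, not the interface of any particular library.

```python
import numpy as np

def riemannian_gradient_descent(x0, egrad, project, retract, eta, tol=1e-10, max_iter=10_000):
    """Generic RGD loop; returns the final iterate and the number of steps taken."""
    x = x0
    for k in range(max_iter):
        g = project(x, egrad(x))       # Riemannian gradient via tangent projection
        if np.linalg.norm(g) <= tol:   # stop at approximate stationarity
            return x, k
        x = retract(x, -eta * g)       # map the tangent step back to the manifold
    return x, max_iter

# Illustrative geometry: Stiefel manifold with QR retraction.
def project(X, G):
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2.0

def retract(X, V):
    Q, R = np.linalg.qr(X + V)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

# Illustrative problem: PCA as minimization of -trace(X^T A X) over St(4, 2).
A = np.diag([5.0, 4.0, 1.0, 0.5])
rng = np.random.default_rng(0)
X0, _ = np.linalg.qr(rng.standard_normal((4, 2)))
X, iters = riemannian_gradient_descent(X0, lambda X: -2.0 * A @ X, project, retract, eta=0.05)
print("captured variance:", np.trace(X.T @ A @ X), "after", iters, "iterations")
```

Swapping in a different manifold only requires replacing `project` and `retract`, while adaptive steps or preconditioning would modify the line that forms the tangent step.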
References
- "Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point" (Martínez-Rubio et al., 2024)
- "Riemannian Gradient Descent for Low-Rank Architectures" (Knight, 1 Jun 2026)
- "A Preconditioned Riemannian Gradient Descent Algorithm for Low-Rank Matrix Recovery" (Bian et al., 2023)
- "Gradient descent in hyperbolic space" (Wilson et al., 2018)
- "Adaptive Gradient Descent on Riemannian Manifolds with Nonnegative Curvature" (Ansari-Önnestam et al., 23 Apr 2025)
- "Inexact Riemannian Gradient Descent Method for Nonconvex Optimization" (Zhou et al., 2024)
- "Riemannian gradient descent for Hartree-Fock theory" (Dinvay, 16 Mar 2026)
- "Riemannian gradient descent-based quantum algorithms for ground state preparation with guarantees" (Pervez et al., 15 Dec 2025)
- "Gradient Descent Ascent for Minimax Problems on Riemannian Manifolds" (Huang et al., 2020)
- "Convergence of Riemannian Stochastic Gradient Descent on Hadamard Manifold" (Sakai et al., 2023)
- "Stochastic Modified Flows for Riemannian Stochastic Gradient Descent" (Gess et al., 2024)
- "Application of gradient descent algorithms based on geodesic distances" (Duan et al., 2019)
- "IRKA is a Riemannian Gradient Descent Method" (Mlinarić et al., 2023)
- "Riemannian gradient descent for spherical area-preserving mappings" (Sutti et al., 2024)
- "Fast Global Convergence for Low-rank Matrix Recovery via Riemannian Gradient Descent with Random Initialization" (Hou et al., 2020)
- "A Single-Mode Quasi Riemannian Gradient Descent Algorithm for Low-Rank Tensor Recovery" (Zhang et al., 2024)