Riemannian Gradient Descent (RGD)
- Riemannian Gradient Descent is an extension of gradient-based methods to curved manifolds using geodesic or retraction updates.
- It intrinsically enforces constraints (e.g., orthogonality, fixed-rank) and is applied in low-rank recovery, quantum information, and geometry-aware learning.
- Robust convergence theories and algorithmic variants like adaptive step-size and preconditioning make RGD effective for both convex and nonconvex optimization settings.
Riemannian Gradient Descent (RGD) is the canonical extension of gradient-based first-order optimization methods from Euclidean spaces to general Riemannian manifolds. Given a manifold equipped with a Riemannian metric, RGD updates the current iterate by moving along a geodesic or retraction in the direction of the negative Riemannian gradient, thereby generalizing Euclidean gradient steps to curved spaces and enforcing constraints (e.g., orthogonality, fixed rank, positive-definiteness) intrinsically. RGD has found wide application in low-rank recovery, matrix sensing, tensor decomposition, learning on non-Euclidean domains, quantum information, and geometry-aware signal processing, and is the foundation for accelerated, stochastic, preconditioned, and inexact manifold optimization algorithms.
1. Mathematical Formulation of RGD
Let $(\mathcal{M}, g)$ be a Riemannian manifold with metric $\langle\cdot,\cdot\rangle_x$ on each tangent space $T_x\mathcal{M}$ and geodesic flow $\Exp_x(v)$. For a smooth objective $f:\mathcal{M}\to\mathbb{R}$, the Riemannian gradient $\mathrm{grad}\,f(x)$ is the tangent vector at $x$ satisfying $\langle\mathrm{grad}\,f(x), v\rangle_x = \mathrm{D}f(x)[v]$ for all $v\in T_x\mathcal{M}$. The basic RGD step with stepsize $\eta>0$ is
$x_{k+1} = \Exp_{x_k}\left(-\eta\,\mathrm{grad}\,f(x_k)\right)$
where $\Exp_{x_k}$ is the exponential map. When explicit geodesics are unavailable or expensive, a retraction $R_{x_k}$ that locally approximates $\Exp_{x_k}$ to first order is employed: $x_{k+1} = R_{x_k}\!\left(-\eta\,\mathrm{grad}\,f(x_k)\right)$. Backtracking or Armijo-type line searches are frequently used to adaptively determine $\eta$ and ensure sufficient decrease in $f$ (Truong, 2020, Muşat et al., 18 Jul 2025).
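As a concrete illustration, here is a minimal sketch of the retraction-based update on the unit sphere $S^{n-1}$ (the quadratic objective $f(x)=x^\top A x$, the fixed stepsize, and the normalization retraction are illustrative choices, not taken from the cited works):

```python
import numpy as np

def rgd_sphere(A, x0, eta, iters=500):
    """Fixed-step RGD for f(x) = x^T A x on the unit sphere S^{n-1}."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2.0 * A @ x                # Euclidean gradient of f
        rgrad = egrad - (x @ egrad) * x    # project onto the tangent space T_x S^{n-1}
        y = x - eta * rgrad                # step along the negative Riemannian gradient
        x = y / np.linalg.norm(y)          # normalization retraction R_x(v) = (x+v)/||x+v||
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = (M + M.T) / 2                          # symmetric test matrix
eta = 0.5 / np.linalg.norm(A, 2)           # roughly 1/L for the smoothness constant L = 2*||A||_2
x = rgd_sphere(A, rng.standard_normal(50), eta)
print(x @ A @ x, "vs", np.linalg.eigvalsh(A)[0])   # f(x) approaches the smallest eigenvalue
```

On the sphere the closed-form exponential map $\Exp_x(v) = \cos(\|v\|)\,x + \sin(\|v\|)\,v/\|v\|$ could be used instead; the normalization retraction agrees with it to first order.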
2. Geometry, Gradients, and Retractions in Applied Settings
RGD critically depends on the explicit geometry of the optimization domain:
- Hyperbolic Space: On the hyperboloid model $\mathbb{H}^n$, the Riemannian gradient is computed by projecting the Minkowski (ambient) gradient onto the tangent space, and geodesic updates use the closed-form exponential map based on hyperbolic trigonometric functions (Wilson et al., 2018); see the sketch after this list.
- Positive-definite Matrices $\mathbb{S}^n_{++}$: For the affine-invariant metric, the Riemannian gradient of the geodesic squared distance $d^2(X,A)$ is $-2\,\mathrm{Log}_X(A)$, with exponential-map updates expressed as geometric means (Duan et al., 2019).
- Fixed-rank Matrix and Tensor Manifolds: Tangent spaces and projections for matrix and tensor varieties are characterized via SVD (or CP-ALS/HOSVD for tensors), with truncated SVD-based retractions preserving intrinsic rank constraints (Xiang et al., 11 Oct 2024, Bian et al., 2023, Hsu et al., 2022, Xu et al., 1 Oct 2025, Dong et al., 2022).
- Information Geometry: On the positive orthant with the Fisher–Rao metric, the exponentiated gradient (EG) method is a special case of RGD using the $e$-exponential retraction, yielding the multiplicative update $x_{k+1} = x_k \odot \exp\!\big(-\eta\,\nabla f(x_k)\big)$ (Elshiaty et al., 7 Apr 2025).
- Quantum State Optimization: On projective Hilbert spaces, the RGD flow for quantum ground state preparation is realized as a geodesic flow driven by the commutator $[H,\rho]$ of the Hamiltonian with the state, with updates parameterized via the Lie algebra of the unitary group (Pervez et al., 15 Dec 2025).
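To make the hyperboloid recipe in the first bullet concrete, the following sketch implements the Minkowski inner product, tangent projection, and closed-form exponential map, and runs RGD on the toy objective $f(x) = x_0$ (the objective, stepsize, and helper names are illustrative assumptions, not taken from Wilson et al., 2018):

```python
import numpy as np

def mink(u, v):
    """Lorentzian inner product <u,v>_L = -u_0*v_0 + sum_{i>=1} u_i*v_i."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def proj_tangent(x, u):
    """Project an ambient vector u onto the tangent space T_x H^n."""
    return u + mink(x, u) * x

def exp_map(x, v):
    """Closed-form exponential map on the hyperboloid (hyperbolic trig functions)."""
    nv = np.sqrt(max(mink(v, v), 0.0))     # Lorentz norm of a tangent vector
    if nv < 1e-12:
        return x
    return np.cosh(nv) * x + np.sinh(nv) * (v / nv)

# Minimize f(x) = x_0 on H^3; the minimizer is the apex (1, 0, 0, 0).
# The Minkowski (ambient) gradient of f is (-1, 0, 0, 0); project it, then take a geodesic step.
x = np.array([np.cosh(2.0), np.sinh(2.0), 0.0, 0.0])   # a point on H^3: <x,x>_L = -1
amb_grad = np.array([-1.0, 0.0, 0.0, 0.0])
for _ in range(200):
    rgrad = proj_tangent(x, amb_grad)
    x = exp_map(x, -0.1 * rgrad)
print(x)    # approaches (1, 0, 0, 0)
```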
3. Convergence Theories and Complexity
RGD exhibits convergence guarantees under a variety of geometric and smoothness assumptions:
- Convex/Geodesically Convex (g-convex) Settings: On manifolds with nonnegative or nonpositive curvature, RGD with appropriately chosen stepsize attains the sublinear $O(1/k)$ rates characteristic of first-order methods for smooth g-convex objectives (Ansari-Önnestam et al., 23 Apr 2025, Duan et al., 2019).
- Nonconvex and Saddle-avoiding Regimes: For smooth objectives, RGD with Armijo or fixed stepsize avoids strict saddles almost everywhere (i.e., the set of initializations converging to non-minimal saddle points has measure zero), under real-analyticity or Kurdyka–Łojasiewicz (KŁ) assumptions (Truong, 2020, Muşat et al., 18 Jul 2025).
- Strong Monotonicity and Linear Convergence: In geodesically strongly convex or restricted positive definite settings (low-rank recovery, CP tensor, matrix completion), RGD achieves local or global linear convergence, with rates dictated by metric conditioning, manifold curvature, or problem-specific incoherence (Xu et al., 1 Oct 2025, Bian et al., 2023, Dong et al., 2022, Xiang et al., 11 Oct 2024, Hsu et al., 2022).
- Iteration Complexity: For $L$-smooth objectives, RGD achieves $\varepsilon$-stationarity in $O(\varepsilon^{-2})$ iterations, matching the classical Euclidean complexity (Li et al., 5 May 2024). Under favorable geometric structures (e.g., information geometry, low-rank quantum tomography), contraction factors may be independent of condition number or curvature (Hsu et al., 2022, Elshiaty et al., 7 Apr 2025).
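A standard counting argument, sketched here under retraction $L$-smoothness with stepsize $\eta = 1/L$ (an assumption for the sketch, not the exact setting of the cited works), explains the $O(\varepsilon^{-2})$ complexity:

$f(x_{k+1}) \;\le\; f(x_k) \;-\; \tfrac{1}{2L}\,\|\mathrm{grad}\,f(x_k)\|_{x_k}^2$

$\min_{0 \le k < K} \|\mathrm{grad}\,f(x_k)\|_{x_k}^2 \;\le\; \frac{2L\,\big(f(x_0) - f^\star\big)}{K}$

Telescoping the first inequality over $k = 0,\dots,K-1$ gives the second, so $K = O(\varepsilon^{-2})$ iterations suffice to drive $\min_k \|\mathrm{grad}\,f(x_k)\|_{x_k}$ below $\varepsilon$.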
4. Algorithmic Variants: Inexactness, Adaptivity, and Preconditioning
Inexact RGD
Several works formalize Riemannian Gradient Descent with inexact (approximate or noisy) gradient and retraction computations:
- Absolute Error: the surrogate gradient $\tilde g_k$ satisfies $\|\tilde g_k - \mathrm{grad}\,f(x_k)\|_{x_k} \le \varepsilon_k$ for a prescribed error sequence $\{\varepsilon_k\}$.
- Relative Error: $\|\tilde g_k - \mathrm{grad}\,f(x_k)\|_{x_k} \le \theta\,\|\mathrm{grad}\,f(x_k)\|_{x_k}$ for some $\theta \in [0,1)$.
- Convergence: Under suitable summability or contraction of the errors, these inexact schemes recover the same stationary limit points, descent guarantees, and complexity as exact RGD, with iteration complexity $O(\varepsilon^{-2})$ to reach $\varepsilon$-stationarity (Zhou et al., 17 Sep 2024, Talwar et al., 7 Jul 2025, Li et al., 5 May 2024).
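The relative-error condition is easy to exercise numerically; the sketch below reuses the sphere example from Section 1 and injects a tangent perturbation of prescribed relative size $\theta$ (the noise model and parameter values are illustrative assumptions):

```python
import numpy as np

def inexact_rgd_sphere(A, x0, eta, theta=0.3, iters=800, seed=0):
    """RGD on the sphere with a relative-error gradient oracle:
    the surrogate g_tilde satisfies ||g_tilde - g||_x <= theta * ||g||_x with theta < 1."""
    rng = np.random.default_rng(seed)
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        g = 2.0 * (A @ x) - 2.0 * (x @ A @ x) * x        # exact Riemannian gradient
        e = rng.standard_normal(x.size)
        e -= (x @ e) * x                                  # keep the perturbation tangent
        e *= theta * np.linalg.norm(g) / max(np.linalg.norm(e), 1e-16)
        y = x - eta * (g + e)                             # inexact step
        x = y / np.linalg.norm(y)                         # retraction
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((40, 40)); A = (M + M.T) / 2
x = inexact_rgd_sphere(A, rng.standard_normal(40), eta=0.5 / np.linalg.norm(A, 2))
print(x @ A @ x, "vs", np.linalg.eigvalsh(A)[0])          # still converges to the smallest eigenvalue
```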
Adaptive and Preconditioned RGD
- Adaptive Step-size: Strategies that adapt the stepsize to local gradient changes (e.g., an inverse-Lipschitz estimate formed from the parallel-transported previous gradient and the most recent displacement) yield faster empirical and theoretical rates, minimize expensive line-search calls, and allow larger steps without global smoothness estimation (Ansari-Önnestam et al., 23 Apr 2025); see the sketch after this list.
- Preconditioning: Modifications of the Riemannian metric and tangent-space computations (e.g., entry-wise reweighting by gradient norms for low-rank matrix problems) can accelerate convergence and reduce iteration counts by orders of magnitude with negligible per-iteration overhead (Bian et al., 2023, Dong et al., 2022, Xiang et al., 11 Oct 2024).
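A minimal sketch of an adaptive stepsize in this spirit (a local inverse-Lipschitz estimate with a projection-based transport and an ad-hoc cap; a heuristic illustration, not the exact rule of Ansari-Önnestam et al., 23 Apr 2025), again on the sphere testbed:

```python
import numpy as np

def transport(x_new, v):
    """Projection-based vector transport of v into T_{x_new} S^{n-1}."""
    return v - (x_new @ v) * x_new

def adaptive_rgd_sphere(A, x0, iters=300, eta=1e-3):
    """RGD with eta_k ~ ||displacement|| / ||change in transported gradient||,
    a local inverse-Lipschitz estimate that needs no global smoothness constant."""
    x = x0 / np.linalg.norm(x0)
    g = 2.0 * (A @ x) - 2.0 * (x @ A @ x) * x
    for _ in range(iters):
        y = x - eta * g
        x_new = y / np.linalg.norm(y)                        # retraction
        g_new = 2.0 * (A @ x_new) - 2.0 * (x_new @ A @ x_new) * x_new
        num = np.linalg.norm(x_new - x)                      # displacement (chordal proxy)
        den = np.linalg.norm(g_new - transport(x_new, g))    # local gradient variation
        if den > 1e-12:
            eta = min(num / den, 1.0)                        # inverse-Lipschitz estimate, capped for safety
        x, g = x_new, g_new
    return x

rng = np.random.default_rng(2)
M = rng.standard_normal((40, 40)); A = (M + M.T) / 2
x = adaptive_rgd_sphere(A, rng.standard_normal(40))
print(x @ A @ x, "vs", np.linalg.eigvalsh(A)[0])
```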
5. Applications and Illustrative Problems
RGD underpins numerous applications in modern computational mathematics and data science:
| Domain | Manifold/Structure | Example Application |
|---|---|---|
| Low-rank recovery | Fixed-rank matrix/tensor | Matrix completion, phase retrieval, blind super-resolution |
| Quantum information | Complex projective, density | Quantum ground-state preparation, quantum tomography |
| Information geometry | Positive orthant, simplex | Poisson inverse problems, KL-divergence minimization |
| Geometry-aware learning | Hyperbolic, Grassmann, Stiefel | Fréchet means, dimensionality reduction, neural networks |
| Model reduction | Transfer function manifolds | $\mathcal{H}_2$-optimal model order reduction, IRKA |
| Covariance estimation | Positive-definite Hermitian | Karcher mean, covariance matrix control, linear system identification |
Concrete studies demonstrate RGD’s efficiency and accuracy in estimating Fréchet means in hyperbolic space (Wilson et al., 2018), computing Karcher means on the manifold of positive-definite matrices (Duan et al., 2019), super-resolving signals under multi-user interference (Xiang et al., 11 Oct 2024), iteratively refining reduced-order models on rational function manifolds (Mlinarić et al., 2023), and performing quantum state tomography with accelerated contraction (Hsu et al., 2022).
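For instance, a minimal sketch of the Karcher-mean computation on positive-definite matrices under the affine-invariant metric (the stepsize, initialization, and helper names are illustrative; this is not the exact algorithm of Duan et al., 2019):

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def spd_log(X, A):
    """Riemannian logarithm Log_X(A) under the affine-invariant metric."""
    Xh = np.real(sqrtm(X)); Xih = np.linalg.inv(Xh)
    return Xh @ np.real(logm(Xih @ A @ Xih)) @ Xh

def spd_exp(X, V):
    """Riemannian exponential Exp_X(V) under the affine-invariant metric."""
    Xh = np.real(sqrtm(X)); Xih = np.linalg.inv(Xh)
    return Xh @ expm(Xih @ V @ Xih) @ Xh

def karcher_mean(mats, iters=30, eta=1.0):
    """RGD for f(X) = (1/2n) sum_i d^2(X, A_i); grad f(X) = -(1/n) sum_i Log_X(A_i)."""
    X = sum(mats) / len(mats)                               # initialize at the arithmetic mean
    for _ in range(iters):
        G = -sum(spd_log(X, A) for A in mats) / len(mats)   # Riemannian gradient
        X = spd_exp(X, -eta * G)                            # geodesic (exponential-map) update
    return X

rng = np.random.default_rng(1)
mats = []
for _ in range(5):
    B = rng.standard_normal((4, 4))
    mats.append(B @ B.T + 4.0 * np.eye(4))                  # random SPD test matrices
print(np.round(karcher_mean(mats), 3))
```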
6. Backtracking, Saddle Avoidance, and Algorithmic Frameworks
Backtracking line search (Armijo rule) is integral to RGD on manifolds where explicit smoothness constants are unknown, ensuring sufficient decrease and automatic step-size adaptation (Truong, 2020, Muşat et al., 18 Jul 2025). Recent theoretical advances guarantee strict saddle avoidance under generic initialization and stepsize, for both fixed- and variable-step RGD, via measure-theoretic dynamical-systems arguments and the Center-Stable Manifold Theorem (Muşat et al., 18 Jul 2025). Comprehensive frameworks such as tangential Block Majorization-Minimization (tBMM) subsume RGD as a special case and generalize it to composite and block-constrained objectives (Li et al., 5 May 2024).
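A sketch of the Armijo rule on a manifold, written against a generic retraction (the constants c, beta and the sphere test problem are illustrative choices):

```python
import numpy as np

def armijo_rgd(f, rgrad, retract, x0, eta0=1.0, c=1e-4, beta=0.5, iters=200):
    """RGD with backtracking: shrink eta until the sufficient-decrease condition
    f(R_x(-eta g)) <= f(x) - c * eta * ||g||^2 holds.  x0 must lie on the manifold."""
    x = x0
    for _ in range(iters):
        g = rgrad(x)
        gnorm2 = g @ g
        eta = eta0
        while f(retract(x, -eta * g)) > f(x) - c * eta * gnorm2:
            eta *= beta                # backtrack
            if eta < 1e-16:
                break
        x = retract(x, -eta * g)
    return x

# Sphere test problem: f(x) = x^T A x with the normalization retraction.
rng = np.random.default_rng(2)
M = rng.standard_normal((30, 30)); A = (M + M.T) / 2
f = lambda x: x @ A @ x
rgrad = lambda x: 2.0 * (A @ x) - 2.0 * (x @ A @ x) * x
retract = lambda x, v: (x + v) / np.linalg.norm(x + v)
x0 = rng.standard_normal(30)
x = armijo_rgd(f, rgrad, retract, x0 / np.linalg.norm(x0))
print(f(x), "vs", np.linalg.eigvalsh(A)[0])
```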
7. Comparisons, Extensions, and Limitations
- Versus Euclidean GD: RGD enforces manifold constraints intrinsically, eliminating extrinsic projection steps, respects the curvature- and metric-induced geometry, enables both local and global convergence guarantees, and avoids the “balancing” requirements of factorization-based formulations (Dong et al., 2022).
- Versus Mirror Descent, Interior Point: When endowed with appropriate Riemannian metrics (e.g., Fisher-Rao), classical exponentiated/mirror algorithms are particular instances of RGD; but RGD also subsumes non-Euclidean geometries, curved spaces, and manifold-valued optimization (Elshiaty et al., 7 Apr 2025).
- Scalability and Per-Iteration Cost: For most matrix and tensor manifolds, per-iteration cost is dominated by low-rank SVD/eigendecomposition, or structure-exploiting projections; use of retractions (e.g., truncated SVD) and metric preconditioning optimizes efficiency (Xiang et al., 11 Oct 2024, Bian et al., 2023, Hsu et al., 2022).
- Inexact and Stochastic Settings: Extensions to inexact, stochastic, and variance-reduced settings (e.g., stochastic RSGDA, adaptive Karcher, and extragradient methods) show that robustness and convergence are not compromised by moderate gradient or retraction approximation, provided error-control conditions are observed (Zhou et al., 17 Sep 2024, Talwar et al., 7 Jul 2025, Huang et al., 2020).
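As a final sketch in this direction, a stochastic RGD loop on the sphere for a finite-sum objective, with a decaying stepsize standing in for explicit error control (all parameter choices are illustrative assumptions, not taken from the cited works):

```python
import numpy as np

def rsgd_sphere(avecs, x0, eta0=0.02, epochs=30, seed=0):
    """Stochastic RGD for f(x) = (1/m) sum_i (a_i^T x)^2 on the unit sphere:
    sample one term, project its Euclidean gradient onto T_x, retract, decay the step."""
    rng = np.random.default_rng(seed)
    x = x0 / np.linalg.norm(x0)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(avecs)):
            a = avecs[i]
            egrad = 2.0 * (a @ x) * a               # stochastic Euclidean gradient
            rgrad = egrad - (x @ egrad) * x         # tangent projection
            y = x - (eta0 / (1.0 + 0.05 * t)) * rgrad
            x = y / np.linalg.norm(y)               # retraction
            t += 1
    return x

rng = np.random.default_rng(3)
avecs = [rng.standard_normal(20) for _ in range(200)]
C = sum(np.outer(a, a) for a in avecs) / len(avecs)
x = rsgd_sphere(avecs, rng.standard_normal(20))
print(x @ C @ x, "vs", np.linalg.eigvalsh(C)[0])    # close to the smallest eigenvalue of C
```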
In summary, Riemannian Gradient Descent unifies and generalizes first-order optimization in the presence of smooth manifold constraints, with robust convergence theory, modular algorithmic design (exact/inexact, fixed/variable step, adaptive, preconditioned), and extensive application in geometric, structural, and physics-informed machine learning, signal processing, and quantum computation.