Riemannian/Projected Gradient Descent

Updated 17 June 2026

Riemannian/Projected Gradient Descent is an optimization framework that extends classical steepest descent to curved spaces by leveraging Riemannian metrics and geometric projections.
It uses exponential maps or efficient retractions for Riemannian updates and nearest-point projections for constrained sets, maintaining feasibility throughout the iterations.
The methods achieve provable convergence properties under curvature and convexity assumptions, with applications ranging from quantum information to low-rank matrix optimization.

Riemannian and projected gradient descent (GD) are fundamental algorithmic paradigms for optimization over manifolds, submanifolds, and nonlinear constraint sets. Their core principle is to generalize classical steepest-descent to spaces with nontrivial geometry by replacing Euclidean operations with their Riemannian or geometric analogues, ensuring that the optimization trajectory remains on the feasible set and that the update directions account for curvature and constraint-induced structure.

1. Geometric Foundations and Algorithmic Variants

Riemannian gradient descent (RGD) operates on smooth manifolds $\mathcal{M}$ endowed with a Riemannian metric $\langle\cdot,\cdot\rangle_x$ on each tangent space $T_x \mathcal{M}$. At each iteration, the algorithm computes the Riemannian gradient $\operatorname{grad} f(x) \in T_x \mathcal{M}$, which satisfies

$\langle \operatorname{grad} f(x), v \rangle_x = Df(x)[v],\quad \forall v \in T_x \mathcal{M}.$

The basic update follows the exponential map (or an efficient retraction $\mathcal{R}_{x_k}$) from the current iterate $x_k$ with step size $\eta_k > 0$: $x_{k+1} = \operatorname{Exp}_{x_k}(-\eta_k \operatorname{grad} f(x_k)), \quad\text{or}\quad x_{k+1} = \mathcal{R}_{x_k}(-\eta_k \operatorname{grad} f(x_k)).$ Projected gradient descent (PGD) and projected Riemannian gradient descent (PRGD) carry this scheme over to embedded submanifolds or constraint sets, returning to the feasible set by an orthogonal or nearest-point projection after a tangent-space step. If $M \subset \mathbb{R}^n$ is a compact $C^2$ submanifold, the projection, which is single-valued for points $u$ sufficiently close to $M$, is

$\operatorname{Proj}_M(u) = \arg\min_{y \in M} \|y - u\|,$

and the projected update is

$x_{k+1} = \operatorname{Proj}_M\big(x_k - \eta_k \operatorname{grad} f(x_k)\big).$

For submanifolds, the Riemannian gradient may be implemented as the metric projection of the Euclidean gradient onto the tangent space $T_x M$,

$\operatorname{grad} f(x) = P_{T_x M}\big(\nabla f(x)\big),$

with subsequent manifold retraction or nearest-point projection.
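The tangent-space projection followed by a retraction can be sketched on the simplest embedded submanifold, the unit sphere. The following is a minimal illustration, not code from any cited paper: the objective $f(x) = x^\top A x$, the matrix `A`, the step size, and the function names are all assumptions chosen for the example.

```python
import numpy as np

def tangent_project(x, g):
    """Project an ambient vector g onto the tangent space of the unit sphere at x."""
    return g - np.dot(x, g) * x

def retract(x, v):
    """Normalization retraction: take the tangent step, then rescale to unit norm."""
    y = x + v
    return y / np.linalg.norm(y)

def rgd_sphere(A, x0, step, iters):
    """Minimize f(x) = x^T A x over the unit sphere by Riemannian gradient descent."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2.0 * A @ x                 # Euclidean gradient of f
        rgrad = tangent_project(x, egrad)   # Riemannian gradient
        x = retract(x, -step * rgrad)
    return x

# Hypothetical test problem: the minimizer is an eigenvector of the smallest eigenvalue.
A = np.diag([1.0, 2.0, 3.0, 4.0])
x_star = rgd_sphere(A, np.ones(4), step=0.1, iters=500)
rayleigh = float(x_star @ A @ x_star)
```

On this problem the iterate stays on the sphere at every step and `rayleigh` approaches the smallest eigenvalue of `A`.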

Table: Core Variants

Method	Feasible Set	Descent Step	Return to Manifold
RGD	Riemannian manifold $\mathcal{M}$	$-\eta_k \operatorname{grad} f(x_k)$	Exponential / Retraction
PGD	$C \subset \mathbb{R}^n$, $M \subset \mathbb{R}^n$ (convex set or submanifold)	$x_k - \eta_k \nabla f(x_k)$	Metric projection
PRGD	Submanifold $M \subset \mathbb{R}^n$	$x_k - \eta_k \operatorname{grad} f(x_k)$	Manifold projection
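The "Return to Manifold" column can be compared numerically on the unit sphere, where the geodesic is a great circle and the nearest-point projection is a simple normalization. This is a sketch under those assumptions; the point `x` and tangent vector `v` are arbitrary illustrative values.

```python
import numpy as np

def exp_map_sphere(x, v):
    """Exponential map on the unit sphere: follow the great circle from x along v."""
    nv = np.linalg.norm(v)
    if nv < 1e-16:
        return x.copy()
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

def nearest_point_sphere(u):
    """Nearest-point projection of a nonzero ambient vector onto the unit sphere."""
    return u / np.linalg.norm(u)

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.05, -0.02])       # tangent at x, since x . v = 0

y_exp = exp_map_sphere(x, v)            # RGD-style return along the geodesic
y_proj = nearest_point_sphere(x + v)    # PRGD-style return by projection
gap = np.linalg.norm(y_exp - y_proj)
```

Both results lie on the sphere, and on this manifold they differ only at third order in $\|v\|$, which is why a cheap projection or retraction is often substituted for the exact exponential map when steps are small.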

2. Differential Structure, Projections, and Natural Metrics

A unifying principle is the replacement of Euclidean notions with those induced by the manifold's metric or a suitable pull-back metric, as in natural gradient descent (NGD). For parameterized constraints or ansatzes of the form $y = \phi(\theta)$ with $\theta \in \Theta \subseteq \mathbb{R}^p$, $y \in \mathcal{N}$, one can induce a Riemannian metric on $\Theta$ by pull-back of a reference metric $g$ on $\mathcal{N}$: $G(\theta) = J_\phi(\theta)^\top\, g(\phi(\theta))\, J_\phi(\theta),$ where $J_\phi$ denotes the Jacobian of $\phi$. The generalized NGD update is then

$\theta_{k+1} = \theta_k - \eta_k\, G(\theta_k)^{-1}\, \nabla_\theta (f \circ \phi)(\theta_k).$

In constrained scenarios, projection onto the admissible tangent space arises either via explicit orthogonal projection or implicitly via the induced metric and the Jacobian of the embedding $\phi$. If the full ambient tangent space is not spanned, the projected update in $\mathcal{N}$ takes the form

$y_{k+1} = y_k - \eta_k\, P_{\theta_k} \operatorname{grad} f(y_k), \qquad P_\theta = J_\phi(\theta)\, G(\theta)^{-1} J_\phi(\theta)^\top g(\phi(\theta)),$

with $P_\theta$ denoting the orthogonal projection (with respect to $g$) onto the image subspace $\operatorname{im} J_\phi(\theta)$, ensuring that the iterate remains on the submanifold $\phi(\Theta)$ to first order in the step size (Dong et al., 2022).
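A pull-back-metric step can be sketched for the simplest case, a Euclidean reference metric on the output space. Everything named here is an assumption made for illustration (the embedding `phi`, its Jacobian `jac`, the output-space gradient `grad_y`, and the linear test problem); it is not the implementation of Dong et al.

```python
import numpy as np

def ngd_step(theta, phi, jac, grad_y, eta):
    """One natural-gradient step with the metric G = J^T J pulled back from a Euclidean output space."""
    J = jac(theta)                       # Jacobian of the embedding, shape (m, p)
    G = J.T @ J                          # pull-back of the identity metric
    g_theta = J.T @ grad_y(phi(theta))   # chain rule: gradient in parameter space
    return theta - eta * np.linalg.solve(G, g_theta)

# Hypothetical linear embedding y = B @ theta with loss 0.5 * ||y - t||^2.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 3))
t = rng.standard_normal(6)

phi = lambda th: B @ th
jac = lambda th: B
grad_y = lambda y: y - t

theta1 = ngd_step(np.zeros(3), phi, jac, grad_y, eta=1.0)
theta_ls, *_ = np.linalg.lstsq(B, t, rcond=None)

# In output space the step is the gradient projected onto the image of the Jacobian.
P = B @ np.linalg.solve(B.T @ B, B.T)
y1 = phi(theta1)
```

For a linear embedding and unit step, one natural-gradient step lands on the least-squares solution, and the resulting output `y1` equals the orthogonal projection `P @ t` of the target onto the image subspace. This assumes `G` is invertible; a rank-deficient Jacobian would need a pseudo-inverse or damping.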

3. Convergence Properties and Algorithmic Guarantees

Local Linear and Sublinear Rates

Under standard Riemannian smoothness and convexity assumptions (e.g., geodesic convexity, sectional curvature bounds), RGD and its projected variants enjoy well-characterized convergence rates:

For $L$-smooth, $\mu$-strongly geodesically convex functions, RGD with constant step-size $\eta = 1/L$ achieves linear convergence:

$f(x_k) - f(x^*) \le \left(1 - \frac{\mu}{L}\right)^k \big(f(x_0) - f(x^*)\big),$

where $x^*$ is the minimizer of $f$; bounds on the distance $d(x_k, x^*)$ and on the region containing the iterates additionally incorporate curvature-dependent constants such as $\zeta = \sqrt{|\kappa_{\min}|}\,D / \tanh\big(\sqrt{|\kappa_{\min}|}\,D\big)$ for a sectional-curvature lower bound $\kappa_{\min} < 0$ and domain diameter $D$ (Martínez-Rubio et al., 2024).

In the absence of strong convexity, sublinear $\mathcal{O}(1/k)$ rates in function value gap are attainable.
In nonconvex settings, under $L$-smoothness and the Kurdyka–Łojasiewicz property, global convergence of RGD (and IRGD) can be established, with convergence rates depending on the local KL exponent (Zhou et al., 2024).
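A short derivation sketch shows where a linear rate in function value comes from. It rests on simplifying assumptions that are not taken from the cited papers: the update uses the exponential map with step $1/L$, smoothness is stated along geodesics with constant $L$, strong geodesic convexity holds with constant $\mu$, and the iterates and the minimizer $x^*$ lie in a geodesically convex set on which $\operatorname{Log}_x$ is well defined.

Smoothness along the geodesic from $x_k$ with $v = -\tfrac{1}{L}\operatorname{grad} f(x_k)$ gives the descent inequality

$f(x_{k+1}) \le f(x_k) + \langle \operatorname{grad} f(x_k), v \rangle_{x_k} + \tfrac{L}{2}\|v\|_{x_k}^2 = f(x_k) - \tfrac{1}{2L}\|\operatorname{grad} f(x_k)\|_{x_k}^2.$

Strong geodesic convexity, applied with $w = \operatorname{Log}_{x_k}(x^*)$ and then minimized over all $w \in T_{x_k}\mathcal{M}$, gives

$f(x^*) \ge f(x_k) + \langle \operatorname{grad} f(x_k), w \rangle_{x_k} + \tfrac{\mu}{2}\|w\|_{x_k}^2 \ge f(x_k) - \tfrac{1}{2\mu}\|\operatorname{grad} f(x_k)\|_{x_k}^2.$

Combining the two inequalities yields the one-step contraction

$f(x_{k+1}) - f(x^*) \le \left(1 - \tfrac{\mu}{L}\right)\big(f(x_k) - f(x^*)\big).$

Curvature does not appear in this function-value recursion. It enters once distances to $x^*$ are tracked or once one must show that the iterates stay inside the region where the assumptions hold.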

Projected Riemannian GD and decentralized algorithms on compact submanifolds attain comparable rates, with $\mathcal{O}(1/\sqrt{K})$ and $\mathcal{O}(1/K)$ rates to stationarity after $K$ iterations for standard and gradient-tracking schemes, respectively (Deng et al., 2023). On Hadamard manifolds (complete, simply connected, with nonpositive curvature), decentralized projected RGD achieves minimax-optimal dynamic regret of order $\mathcal{O}\big(\sqrt{T(1+P_T)}/\sqrt{1-\sigma_2(W)}\big)$ over $T$ rounds, where $P_T$ is path variation and $1-\sigma_2(W)$ the spectral gap of the consensus matrix $W$ (Chen et al., 2024).

Trade-offs and Implementation

The explicit choice of metric (e.g., Fisher, Sobolev, pull-back via function-space structure) directly impacts both convergence rate and numerical stability, often dramatically accelerating optimization in ill-conditioned, constraint-rich, or physically structured models (Dong et al., 2022, Bao et al., 12 Dec 2025). Step-size schedules can be constant, diminishing, or adaptive, and line searches or Armijo-type rules are frequently employed to guarantee monotonic descent or enforce stability constraints (Mlinarić et al., 2023).
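An Armijo-type backtracking rule of the kind mentioned above can be sketched on the unit sphere. The parameters `eta0`, `beta`, `c`, the backtracking cap, and the test objective are illustrative assumptions, not values from the cited works.

```python
import numpy as np

def armijo_rgd_sphere(f, egrad, x0, eta0=1.0, beta=0.5, c=1e-4,
                      iters=200, max_backtracks=30):
    """RGD on the unit sphere with Armijo backtracking; returns the iterate and objective history."""
    x = x0 / np.linalg.norm(x0)
    history = [f(x)]
    for _ in range(iters):
        g = egrad(x)
        rgrad = g - np.dot(x, g) * x          # tangent-space projection
        sq = float(rgrad @ rgrad)
        if sq < 1e-16:                        # numerically stationary
            break
        eta, accepted = eta0, False
        for _ in range(max_backtracks):
            y = x - eta * rgrad
            y = y / np.linalg.norm(y)         # normalization retraction
            fy = f(y)
            if fy <= history[-1] - c * eta * sq:   # sufficient-decrease test
                accepted = True
                break
            eta *= beta                       # shrink the trial step
        if not accepted:
            break
        x = y
        history.append(fy)
    return x, history

A = np.diag([1.0, 2.0, 3.0, 4.0])
x_min, history = armijo_rgd_sphere(lambda x: float(x @ A @ x),
                                   lambda x: 2.0 * A @ x, np.ones(4))
```

Because a step is accepted only when the sufficient-decrease test passes, `history` is non-increasing by construction, at the cost of extra objective evaluations per iteration.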

4. Practical Implementations and Applications

Riemannian and projected GD frameworks have wide-ranging applications:

Quantum information: RGD on the Stiefel or unitary group manifolds is used in quantum process tomography and ground-state preparation, enabling physical trace-preservation and unitarity constraints to be exactly enforced, with Riemannian retractions (e.g., Cayley transforms, matrix exponentials) maintaining feasibility at each step (Volya et al., 2024, Pervez et al., 15 Dec 2025).
Matrix manifolds: Low-rank matrix optimization is treated by RGD on fixed-rank and partial isometry manifolds, with tangent-space projections and SVD-based retractions yielding local linear or accelerated rates (Knight, 1 Jun 2026, Li et al., 2022).
Variational ansätze and physical constraints: Pull-back metrics (Sobolev, energy-based) are used to precondition optimization in neural variational Monte Carlo and variational quantum eigensolvers (Bao et al., 12 Dec 2025).
Decentralized optimization: DPRGD and its variants are implementable in multi-agent architectures, with theoretical variance reduction and dynamic regret bounds (Deng et al., 2023, Chen et al., 2024).
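The role of a Cayley-transform retraction in keeping orthogonality or unitarity constraints exact can be sketched on the real orthogonal group. This is a generic illustration, not the algorithm of any cited paper: the objective (distance to a fixed matrix `A`), the step size, and the function name `cayley_step` are assumptions.

```python
import numpy as np

def cayley_step(X, G, eta):
    """Descent step on the orthogonal group that stays feasible via the Cayley transform."""
    n = X.shape[0]
    W = G @ X.T - X @ G.T                  # skew-symmetric generator built from the Euclidean gradient G
    I = np.eye(n)
    # (I + eta/2 W)^{-1} (I - eta/2 W) is orthogonal whenever W is skew-symmetric.
    return np.linalg.solve(I + 0.5 * eta * W, (I - 0.5 * eta * W) @ X)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
f = lambda X: 0.5 * np.linalg.norm(X - A) ** 2   # hypothetical objective

X = np.eye(4)
f_start = f(X)
for _ in range(200):
    X = cayley_step(X, X - A, eta=0.05)          # Euclidean gradient of f is X - A
f_end = f(X)
orth_error = np.linalg.norm(X.T @ X - np.eye(4))
```

The iterate remains orthogonal up to rounding error without any re-orthogonalization, while the objective decreases for a sufficiently small step.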

5. Extensions: Inexactness, Adaptivity, and Accelerated Methods

Inexact and Stochastic RGD

In practical high-dimensional or derivative-free scenarios, only approximate or stochastic gradients are available. The IRGD algorithm replaces the exact gradient with an approximation $g_k \in T_{x_k}\mathcal{M}$ satisfying either absolute or relative inexactness: $\|g_k - \operatorname{grad} f(x_k)\|_{x_k} \le \varepsilon_k \quad\text{or}\quad \|g_k - \operatorname{grad} f(x_k)\|_{x_k} \le \nu\,\|\operatorname{grad} f(x_k)\|_{x_k},\ \nu \in [0,1),$ with appropriate step-size control ensuring that accumulation points are stationary and, under the KL property, that the whole sequence converges to a single limit point (Zhou et al., 2024). Riemannian sharpness-aware minimization (RSAM) and Riemannian extragradient methods are encompassed as special cases.
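The effect of a relative-error gradient oracle can be sketched on the unit sphere. This is not the IRGD algorithm of Zhou et al.: the error model $\|g_k - \operatorname{grad} f(x_k)\| \le \nu \|\operatorname{grad} f(x_k)\|$, the fixed step size, and the test objective $f(x) = x^\top A x$ are assumptions made for the illustration.

```python
import numpy as np

def inexact_rgd_sphere(A, x0, step, nu, iters, seed=0):
    """RGD for f(x) = x^T A x on the unit sphere with a relative-error gradient oracle."""
    rng = np.random.default_rng(seed)
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2.0 * A @ x
        rgrad = egrad - np.dot(x, egrad) * x           # exact Riemannian gradient
        noise = rng.standard_normal(x.shape)
        noise -= np.dot(x, noise) * x                  # keep the perturbation tangent
        scale = nu * rng.uniform() * np.linalg.norm(rgrad)
        noise *= scale / np.linalg.norm(noise)
        g = rgrad + noise                              # ||g - rgrad|| <= nu * ||rgrad||
        y = x - step * g
        x = y / np.linalg.norm(y)                      # normalization retraction
    return x

A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
x_inexact = inexact_rgd_sphere(A, np.ones(5), step=0.05, nu=0.3, iters=1000)
f_inexact = float(x_inexact @ A @ x_inexact)
```

With $\nu < 1$ the perturbed direction still has positive inner product with the exact gradient, and the error vanishes as the gradient does, so on this problem the iterates approach the same minimizer as exact RGD.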

Adaptive Step-Size

Adaptive RGD modulates the step-size $\eta_k$ based on local curvature and observed gradient variation, achieving competitive or superior per-iteration progress relative to Armijo backtracking, particularly on nonnegatively curved manifolds (Ansari-Önnestam et al., 23 Apr 2025).

Acceleration and Line-Search

Manifold analogues of Nesterov acceleration (e.g., NARG with orthographic retraction) achieve optimal local linear rates given knowledge of spectrum or via adaptive-restart schemes, matching or outperforming Euclidean first-order methods on low-rank matrix problems (Li et al., 2022).

6. Consensus, Minimax Problems, and Product Manifold Optimization

Alternating projected/Riemannian strategies are natural for nonconvex-concave minimax or saddle-point problems on product manifolds. Algorithms such as ARPGDA combine Riemannian descent in manifold variables (e.g., Stiefel) with Euclidean projected ascent in convex variables (e.g., simplex), reaching an $\varepsilon$-stationary point within $\mathcal{O}(\varepsilon^{-3})$ iterations, with rigorous potential-based convergence analysis (Xu et al., 2022). For geodesically nonconvex, strongly concave minimax problems, deterministic and stochastic Riemannian GD-ascent schemes reach $\mathcal{O}(\varepsilon^{-2})$ and $\mathcal{O}(\varepsilon^{-4})$ sample complexity for $\varepsilon$-stationarity (up to condition-number factors), improved by STORM-style variance reduction (Huang et al., 2020).
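The alternating structure, a Riemannian descent step in the manifold variable followed by a projected ascent step in the convex variable, can be sketched as follows. This is a generic single-loop scheme and not ARPGDA itself: the sphere stands in for the Stiefel manifold, and the objective $\phi(x, y) = -\sum_i y_i\, x^\top A_i x$, the matrices, and the step sizes are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def alternating_gda(mats, x0, y0, eta_x, eta_y, iters):
    """Alternate Riemannian descent in x (unit sphere) with projected ascent in y (simplex)."""
    x = x0 / np.linalg.norm(x0)
    y = project_simplex(y0)
    for _ in range(iters):
        # phi(x, y) = sum_i y_i * (-x^T A_i x): descend in x on the sphere
        egrad = -2.0 * sum(w * (A @ x) for w, A in zip(y, mats))
        rgrad = egrad - np.dot(x, egrad) * x
        x = x - eta_x * rgrad
        x = x / np.linalg.norm(x)
        # ascend in y using the updated x, then project back onto the simplex
        losses = np.array([-(x @ A @ x) for A in mats])
        y = project_simplex(y + eta_y * losses)
    return x, y

mats = [np.diag([3.0, 1.0, 1.0]), np.diag([1.0, 3.0, 1.0])]
x_fair, y_fair = alternating_gda(mats, np.array([1.0, 0.2, 0.1]),
                                 np.array([0.5, 0.5]), eta_x=0.05, eta_y=0.05, iters=500)
```

Both blocks remain feasible throughout: `x_fair` has unit norm and `y_fair` is a probability vector. No convergence claim is made for this simplified scheme.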

7. Illustrative Examples

SPD matrices: On the manifold $\mathcal{P}(n)$ of $n \times n$ Hermitian positive-definite matrices, the affine-invariant RGD follows the geodesic-exponential update, with the Riemannian gradient coinciding with the matrix-congruence gradient and per-iteration cost $\mathcal{O}(n^3)$ (Duan et al., 2019).
Hyperbolic space: In the hyperboloid model of $\mathbb{H}^n$, gradient descent is implemented by projection of the ambient Minkowski gradient, followed by an exact exponential-map update along geodesics, outperforming Poincaré-ball retraction methods (Wilson et al., 2018).
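The hyperboloid update described above (project the ambient gradient, then move along the geodesic) can be sketched as follows. The objective $f(x) = -\langle x, p\rangle_L = \cosh d(x, p)$, the target point `p`, the step size, and all function names are assumptions chosen so that the minimizer is known; this is not code from Wilson et al.

```python
import numpy as np

def minkowski_dot(u, v):
    """Lorentzian inner product <u, v>_L = -u_0 v_0 + sum_i u_i v_i."""
    return -u[0] * v[0] + np.dot(u[1:], v[1:])

def lift(u):
    """Map spatial coordinates u to the hyperboloid {x : <x, x>_L = -1, x_0 > 0}."""
    return np.concatenate(([np.sqrt(1.0 + np.dot(u, u))], u))

def riemannian_grad(x, egrad):
    """Flip the time component of the ambient gradient, then project onto the tangent space at x."""
    h = egrad.copy()
    h[0] = -h[0]
    return h + minkowski_dot(x, h) * x

def exp_map(x, v):
    """Exponential map on the hyperboloid for a tangent vector v at x."""
    nv = np.sqrt(max(minkowski_dot(v, v), 0.0))
    if nv < 1e-15:
        return x.copy()
    return np.cosh(nv) * x + np.sinh(nv) * (v / nv)

p = lift(np.array([-0.3, 0.8]))            # hypothetical target point
x = lift(np.array([1.0, 0.5]))             # starting point
# f(x) = -<x, p>_L has ambient gradient (p_0, -p_1, ..., -p_n), constant in x.
egrad = np.concatenate(([p[0]], -p[1:]))
for _ in range(300):
    x = exp_map(x, -0.1 * riemannian_grad(x, egrad))

dist = np.arccosh(max(-minkowski_dot(x, p), 1.0))   # hyperbolic distance to the target
constraint_error = abs(minkowski_dot(x, x) + 1.0)
```

Because the update uses the exact exponential map, the constraint $\langle x, x\rangle_L = -1$ holds up to rounding at every step, and on this example the iterate converges to `p`.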

References

Selected papers providing key technical details and convergence results:

"Generalization to the Natural Gradient Descent" (Dong et al., 2022)
"Application of gradient descent algorithms based on geodesic distances" (Duan et al., 2019)
"Decentralized projected Riemannian gradient method for smooth optimization on compact submanifolds" (Deng et al., 2023)
"Inexact Riemannian Gradient Descent Method for Nonconvex Optimization" (Zhou et al., 2024)
"Fast gradient method for Low-Rank Matrix Estimation" (Li et al., 2022)
"Decentralized Online Riemannian Optimization with Dynamic Environments" (Chen et al., 2024)
"Riemannian gradient descent for Hartree-Fock theory" (Dinvay, 16 Mar 2026)
"Adaptive Gradient Descent on Riemannian Manifolds with Nonnegative Curvature" (Ansari-Önnestam et al., 23 Apr 2025)
"Projected Sobolev Natural Gradient Descent for Neural Variational Monte Carlo Solution of the Gross-Pitaevskii Equation" (Bao et al., 12 Dec 2025)
"An Efficient Alternating Riemannian/Projected Gradient Descent Ascent Algorithm for Fair Principal Component Analysis" (Xu et al., 2022)
"IRKA is a Riemannian Gradient Descent Method" (Mlinarić et al., 2023)
"Riemannian Stein Variational Gradient Descent for Bayesian Inference" (Liu et al., 2017)
"Riemannian Gradient Descent for Low-Rank Architectures" (Knight, 1 Jun 2026)
"Riemannian gradient descent-based quantum algorithms for ground state preparation with guarantees" (Pervez et al., 15 Dec 2025)
"Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point" (Martínez-Rubio et al., 2024)
"Gradient Descent Ascent for Minimax Problems on Riemannian Manifolds" (Huang et al., 2020)
"Fast Quantum Process Tomography via Riemannian Gradient Descent" (Volya et al., 2024)