
Riemannian/Projected Gradient Descent

Updated 17 June 2026
  • Riemannian/Projected Gradient Descent is an optimization framework that extends classical steepest descent to curved spaces by leveraging Riemannian metrics and geometric projections.
  • It uses exponential maps or efficient retractions for Riemannian updates and nearest-point projections for constrained sets, maintaining feasibility throughout the iterations.
  • The methods achieve provable convergence properties under curvature and convexity assumptions, with applications ranging from quantum information to low-rank matrix optimization.

Riemannian and projected gradient descent (GD) are fundamental algorithmic paradigms for optimization over manifolds, submanifolds, and nonlinear constraint sets. Their core principle is to generalize classical steepest-descent to spaces with nontrivial geometry by replacing Euclidean operations with their Riemannian or geometric analogues, ensuring that the optimization trajectory remains on the feasible set and that the update directions account for curvature and constraint-induced structure.

1. Geometric Foundations and Algorithmic Variants

Riemannian gradient descent (RGD) operates on smooth manifolds $\mathcal{M}$ endowed with a Riemannian metric $\langle\cdot,\cdot\rangle_x$ on each tangent space $T_x \mathcal{M}$. At each iteration, the algorithm computes the Riemannian gradient $\operatorname{grad} f(x) \in T_x \mathcal{M}$, which satisfies

$$\langle \operatorname{grad} f(x), v \rangle_x = Df(x)[v],\quad \forall v \in T_x \mathcal{M}.$$

The basic geometric update uses the exponential map (or an efficient retraction $\mathcal{R}_{x_k}$):

$$x_{k+1} = \operatorname{Exp}_{x_k}(-\eta_k \operatorname{grad} f(x_k)), \quad\text{or}\quad x_{k+1} = \mathcal{R}_{x_k}(-\eta_k \operatorname{grad} f(x_k)).$$

Projected gradient descent (PGD) and Projected Riemannian Gradient Descent (PRGD) generalize this scheme to embedded submanifolds or constraint sets, relying on orthogonal projections or nearest-point projections after a tangent-space step. If $M \subset \mathbb{R}^n$ is a compact $C^2$ submanifold, the projection is

$$\operatorname{Proj}_M(u) = \arg\min_{y \in M} \|y - u\|,$$

and the projected update is

$$x_{k+1} = \operatorname{Proj}_M\big(x_k - \eta_k \operatorname{grad} f(x_k)\big).$$

For submanifolds, the Riemannian gradient may be implemented as the orthogonal projection of the Euclidean gradient onto the tangent space $T_x M$,

$$\operatorname{grad} f(x) = P_{T_x M}\big(\nabla f(x)\big),$$

with subsequent manifold retraction or nearest-point projection.
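The tangent-projection-then-retraction recipe can be sketched on the unit sphere. This is a minimal illustration, not taken from any cited paper: the objective $f(x) = x^\top A x$, the matrix `A`, the step size, and all function names are assumptions chosen for the example. On the sphere, renormalizing $x + v$ is both a retraction and the nearest-point projection of $x + v$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def tangent_project(x, g):
    # Orthogonal projection of g onto T_x S = {v : <x, v> = 0}.
    c = dot(x, g)
    return [gi - c * xi for gi, xi in zip(g, x)]

def retract(x, v):
    # Step in the ambient space, then renormalize back onto the sphere.
    y = [xi + vi for xi, vi in zip(x, v)]
    norm = math.sqrt(dot(y, y))
    return [yi / norm for yi in y]

def rgd_sphere(A, x0, step=0.1, iters=500):
    # Minimize the (hypothetical) objective f(x) = x^T A x over the unit sphere.
    x = list(x0)
    for _ in range(iters):
        egrad = [2.0 * gi for gi in matvec(A, x)]   # Euclidean gradient 2 A x
        rgrad = tangent_project(x, egrad)           # Riemannian gradient
        x = retract(x, [-step * gi for gi in rgrad])
    return x

A = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
s = 1.0 / math.sqrt(3.0)
x_final = rgd_sphere(A, [s, s, s])
rayleigh = dot(x_final, matvec(A, x_final))
```

For this diagonal `A`, the iterates should approach the eigenvector of the smallest eigenvalue, so `rayleigh` approaches 0.5 while every iterate stays on the sphere.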

Table: Core Variants

| Method | Feasible Set | Descent Step | Return to Manifold |
|---|---|---|---|
| RGD | Riemannian manifold $\mathcal{M}$ | $-\eta_k \operatorname{grad} f(x_k)$ | Exponential / Retraction |
| PGD | Subset of $\mathbb{R}^n$ (convex set or submanifold) | $-\eta_k \nabla f(x_k)$ | Metric projection |
| PRGD | Submanifold $M \subset \mathbb{R}^n$ | $-\eta_k \operatorname{grad} f(x_k)$ | Manifold projection |
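For the convex-set case of PGD, a minimal sketch is a Euclidean gradient step followed by the metric projection onto the set. The feasible set (a Euclidean ball), the objective $\|x - c\|^2$, and the names `project_ball` and `pgd` are assumptions made for illustration only.

```python
import math

def project_ball(u, radius=1.0):
    # Nearest point of the closed Euclidean ball of the given radius.
    norm = math.sqrt(sum(ui * ui for ui in u))
    if norm <= radius:
        return list(u)
    return [radius * ui / norm for ui in u]

def pgd(grad, project, x0, step, iters):
    # Projected gradient descent: gradient step, then projection.
    x = project(x0)
    for _ in range(iters):
        g = grad(x)
        x = project([xi - step * gi for xi, gi in zip(x, g)])
    return x

c = [3.0, 4.0]  # hypothetical target lying outside the unit ball

def grad(x):
    # Gradient of f(x) = ||x - c||^2.
    return [2.0 * (xi - ci) for xi, ci in zip(x, c)]

x_star = pgd(grad, project_ball, [0.0, 0.0], step=0.25, iters=100)
```

Because the target lies outside the ball, the constrained minimizer is the boundary point in the direction of `c`, namely `c / ||c||`.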

2. Differential Structure, Projections, and Natural Metrics

A unifying principle is the replacement of Euclidean notions with those induced by the manifold's metric or a suitable pull-back metric, as in natural gradient descent (NGD). For parameterized constraints or ansatzes, where feasible points are the image of a parameter space under a smooth map, one can induce a Riemannian metric on the parameter space by pulling back a reference metric on the ambient space through the Jacobian of that map. The generalized NGD update then preconditions the parameter-space gradient with the inverse of this pulled-back metric.

In constrained scenarios, projection onto the admissible tangent space arises either via explicit orthogonal projection or implicitly via the induced metric and the Jacobian of the embedding. If the full ambient tangent space is not spanned, the update viewed in the ambient space is the ambient gradient step composed with the orthogonal projection onto the image subspace of the Jacobian, ensuring that the iterate remains on the submanifold (Dong et al., 2022).
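A small sketch of natural gradient descent under a pull-back metric, using assumed notation that does not come from the source: a parameter vector `theta = (r, t)`, a polar-coordinate map `embed`, its Jacobian `J`, the pulled-back metric `G = J^T J` (Euclidean reference metric), and the objective $\tfrac12\|\phi(\theta) - \text{target}\|^2$. Here the Jacobian is square and invertible, so the projection onto its image is the identity; the example only illustrates the metric preconditioning.

```python
import math

def embed(theta):
    # Hypothetical parameterization: polar coordinates -> plane.
    r, t = theta
    return [r * math.cos(t), r * math.sin(t)]

def jacobian(theta):
    r, t = theta
    return [[math.cos(t), -r * math.sin(t)],
            [math.sin(t),  r * math.cos(t)]]

def pullback_metric(J):
    # G = J^T J: Euclidean ambient metric pulled back to parameter space.
    return [[sum(J[k][i] * J[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def solve2(G, b):
    # Solve the 2x2 system G z = b by Cramer's rule.
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [(G[1][1] * b[0] - G[0][1] * b[1]) / det,
            (G[0][0] * b[1] - G[1][0] * b[0]) / det]

def ngd(theta0, target, step=0.1, iters=300):
    theta = list(theta0)
    for _ in range(iters):
        J = jacobian(theta)
        p = embed(theta)
        ambient_grad = [p[0] - target[0], p[1] - target[1]]
        # Chain rule: parameter-space gradient is J^T times the ambient gradient.
        param_grad = [sum(J[k][i] * ambient_grad[k] for k in range(2))
                      for i in range(2)]
        natural = solve2(pullback_metric(J), param_grad)  # G^{-1} J^T grad
        theta = [th - step * d for th, d in zip(theta, natural)]
    return theta

target = [1.0, 1.0]
theta_final = ngd([2.0, 0.1], target)
p_final = embed(theta_final)
loss_final = 0.5 * sum((a - b) ** 2 for a, b in zip(p_final, target))
```

With the metric preconditioner, the induced motion in the plane is, to first order, the plain Euclidean gradient step, independent of how the angular coordinate is scaled by `r`.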

3. Convergence Properties and Algorithmic Guarantees

Local Linear and Sublinear Rates

Under standard Riemannian smoothness and convexity assumptions (e.g., geodesic convexity, sectional curvature bounds), RGD and its projected variants enjoy well-characterized convergence rates:

  • For $L$-smooth, $\mu$-strongly geodesically convex functions, RGD with a suitable constant step-size achieves linear convergence: the optimality gap contracts geometrically, at a rate governed by $L$, $\mu$, and curvature-dependent constants (Martínez-Rubio et al., 2024).

  • In the absence of strong convexity, sublinear rates in the function-value gap are attainable.
  • In nonconvex settings, under $L$-smoothness and the Kurdyka–Łojasiewicz (KL) property, global convergence of RGD (and its inexact variant, IRGD) can be established, with convergence rates depending on the local KL exponent (Zhou et al., 2024).

Projected Riemannian GD and decentralized algorithms on compact submanifolds attain comparable guarantees, with separate sublinear rates to stationarity for the standard and the gradient-tracking schemes (Deng et al., 2023). On Hadamard manifolds with nonpositive curvature, decentralized projected RGD achieves minimax-optimal dynamic regret, with an order that depends on the path variation and on the spectral gap of the consensus matrix (Chen et al., 2024).

Trade-offs and Implementation

The explicit choice of metric (e.g., Fisher, Sobolev, pull-back via function-space structure) directly impacts both convergence rate and numerical stability, often dramatically accelerating optimization in ill-conditioned, constraint-rich, or physically structured models (Dong et al., 2022, Bao et al., 12 Dec 2025). Step-size schedules can be constant, diminishing, or adaptive, and line searches or Armijo-type rules are frequently employed to guarantee monotonic descent or enforce stability constraints (Mlinarić et al., 2023).
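An Armijo-type backtracking rule along a retraction can be sketched as follows. The sphere, the objective $x^\top A x$, and the parameters `t0`, `beta`, and `sigma` are illustrative assumptions, not values from the cited works. The step is halved until the sufficient-decrease test $f(\mathcal{R}_x(-t\,g)) \le f(x) - \sigma t \|g\|^2$ holds.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def cost(A, x):
    return dot(x, matvec(A, x))

def rgrad(A, x):
    # Euclidean gradient 2 A x projected onto the tangent space at x.
    eg = [2.0 * gi for gi in matvec(A, x)]
    c = dot(x, eg)
    return [gi - c * xi for gi, xi in zip(eg, x)]

def retract(x, v):
    y = [xi + vi for xi, vi in zip(x, v)]
    norm = math.sqrt(dot(y, y))
    return [yi / norm for yi in y]

def armijo_step(A, x, t0=1.0, beta=0.5, sigma=1e-4, max_halvings=30):
    g = rgrad(A, x)
    g2 = dot(g, g)
    f0 = cost(A, x)
    t = t0
    for _ in range(max_halvings):
        x_new = retract(x, [-t * gi for gi in g])
        if cost(A, x_new) <= f0 - sigma * t * g2:  # sufficient decrease
            return x_new, t
        t *= beta
    return list(x), 0.0  # no acceptable step found: stay put

A = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
s = 1.0 / math.sqrt(3.0)
x = [s, s, s]
costs = [cost(A, x)]
for _ in range(200):
    x, _t = armijo_step(A, x)
    costs.append(cost(A, x))
```

By construction the recorded costs never increase, which is the monotonic-descent property the line search is meant to enforce.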

4. Practical Implementations and Applications

Riemannian and projected GD frameworks have wide-ranging applications, from quantum information to low-rank matrix optimization.

5. Extensions: Inexactness, Adaptivity, and Accelerated Methods

Inexact and Stochastic RGD

In practical high-dimensional or derivative-free scenarios, only approximate or stochastic gradients are available. The IRGD algorithm replaces the exact gradient with an approximation satisfying either an absolute or a relative inexactness condition, with appropriate step-size control ensuring convergence to stationary points or, under KL, to a limit point (Zhou et al., 2024). RSAM and Riemannian extragradient methods are encompassed as special cases.
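A sketch of gradient descent with an inexact Riemannian gradient, under an assumed relative-error model $\|\tilde g - \operatorname{grad} f(x)\| \le \theta\,\|\operatorname{grad} f(x)\|$ with $\theta < 1$. The exact condition used by IRGD is not reproduced here; the sphere, the objective, the noise model, and the parameter `rel_err` are hypothetical choices for illustration.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def tangent_project(x, g):
    c = dot(x, g)
    return [gi - c * xi for gi, xi in zip(g, x)]

def retract(x, v):
    y = [xi + vi for xi, vi in zip(x, v)]
    norm = math.sqrt(dot(y, y))
    return [yi / norm for yi in y]

def inexact_gradient(x, g, rel_err, rng):
    # Add a tangent perturbation whose norm is exactly rel_err * ||g||.
    noise = tangent_project(x, [rng.gauss(0.0, 1.0) for _ in x])
    n_norm = math.sqrt(dot(noise, noise))
    if n_norm == 0.0:
        return list(g)
    scale = rel_err * math.sqrt(dot(g, g)) / n_norm
    return [gi + scale * ni for gi, ni in zip(g, noise)]

def inexact_rgd(A, x0, step=0.1, rel_err=0.3, iters=1000, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = tangent_project(x, [2.0 * gi for gi in matvec(A, x)])
        g_tilde = inexact_gradient(x, g, rel_err, rng)
        x = retract(x, [-step * gi for gi in g_tilde])
    return x

A = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
s = 1.0 / math.sqrt(3.0)
x_inexact = inexact_rgd(A, [s, s, s])
gap = dot(x_inexact, matvec(A, x_inexact)) - 0.5
```

Because the perturbation shrinks with the true gradient, the inexact direction remains a descent direction, and the run should still reach the minimizer for a sufficiently small step.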

Adaptive Step-Size

Adaptive RGD modulates the step-size $\eta_k$ based on local curvature and observed gradient variation, achieving competitive or superior per-iteration progress relative to Armijo backtracking, particularly on nonnegatively curved manifolds (Ansari-Önnestam et al., 23 Apr 2025).

Manifold analogues of Nesterov acceleration (e.g., NARG with orthographic retraction) achieve optimal local linear rates given knowledge of spectrum or via adaptive-restart schemes, matching or outperforming Euclidean first-order methods on low-rank matrix problems (Li et al., 2022).

6. Consensus, Minimax Problems, and Product Manifold Optimization

Alternating projected/Riemannian strategies are natural for nonconvex-concave minimax or saddle-point problems on product manifolds. Algorithms such as ARPGDA combine Riemannian descent in manifold variables (e.g., Stiefel) with Euclidean projected ascent in convex variables (e.g., simplex), achieving provable iteration-complexity bounds to stationarity, with rigorous potential-based convergence analysis (Xu et al., 2022). For geodesically nonconvex, strongly concave minimax problems, deterministic and stochastic Riemannian GD-ascent schemes come with stationarity sample-complexity guarantees, which are improved by STORM-style variance reduction (Huang et al., 2020).
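The alternating pattern can be sketched generically: a Riemannian descent step in a sphere variable `x`, then a projected ascent step in a simplex variable `y`. This is not the ARPGDA algorithm of the cited paper; the coupling $f(x, y) = \sum_j y_j\, x^\top A_j x$, the matrices, and the step sizes `alpha` and `beta` are assumptions, and no convergence claim is made for this toy setup. The sketch only shows how both iterates stay feasible.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def tangent_project(x, g):
    c = dot(x, g)
    return [gi - c * xi for gi, xi in zip(g, x)]

def retract(x, v):
    y = [xi + vi for xi, vi in zip(x, v)]
    norm = math.sqrt(dot(y, y))
    return [yi / norm for yi in y]

def project_simplex(v):
    # Euclidean projection onto the probability simplex (sort-based method).
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

def alternating_step(mats, x, y, alpha, beta):
    # Riemannian descent in x for fixed y.
    egrad = [0.0] * len(x)
    for yj, A in zip(y, mats):
        egrad = [e + 2.0 * yj * a for e, a in zip(egrad, matvec(A, x))]
    rg = tangent_project(x, egrad)
    x_new = retract(x, [-alpha * gi for gi in rg])
    # Projected ascent in y at the updated x; grad_y f has entries x^T A_j x.
    q = [dot(x_new, matvec(A, x_new)) for A in mats]
    y_new = project_simplex([yj + beta * qj for yj, qj in zip(y, q)])
    return x_new, y_new

mats = [[[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]],
        [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]]
s = 1.0 / math.sqrt(3.0)
x_mm, y_mm = [s, s, s], [0.5, 0.5]
for _ in range(100):
    x_mm, y_mm = alternating_step(mats, x_mm, y_mm, alpha=0.05, beta=0.05)
```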

7. Illustrative Examples

  • SPD matrices: On the manifold of Hermitian positive-definite matrices, the affine-invariant RGD follows the geodesic (exponential-map) update, with the Riemannian gradient coinciding with the matrix-congruence gradient (Duan et al., 2019).
  • Hyperbolic space: In the hyperboloid model of hyperbolic space, gradient descent is implemented by projection of the ambient Minkowski gradient, followed by an exact exponential-map update along geodesics, outperforming Poincaré-ball retraction methods (Wilson et al., 2018).
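The hyperboloid recipe can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation of the cited paper: the signature convention $\langle u, v\rangle_L = -u_0 v_0 + \sum_{i\ge 1} u_i v_i$, the objective $f(x) = -\langle x, p\rangle_L = \cosh d(x, p)$, the target point `p`, and the step size are all chosen for the example.

```python
import math

def minkowski(u, v):
    # Lorentzian inner product with signature (-, +, +, ...).
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))

def distance(x, p):
    # Hyperbolic distance; the clamp guards against rounding below 1.
    return math.acosh(max(1.0, -minkowski(x, p)))

def riemannian_grad(x, egrad):
    # Minkowski gradient: flip the sign of the time-like component,
    # then project onto the tangent space {v : <x, v>_L = 0}.
    h = [-egrad[0]] + list(egrad[1:])
    c = minkowski(x, h)
    return [hi + c * xi for hi, xi in zip(h, x)]

def exp_map(x, v):
    # Exact geodesic step: cosh(|v|) x + sinh(|v|) v / |v|.
    n = math.sqrt(max(minkowski(v, v), 0.0))
    if n == 0.0:
        return list(x)
    return [math.cosh(n) * xi + math.sinh(n) * vi / n for xi, vi in zip(x, v)]

def egrad_cost(x, p):
    # Euclidean gradient of f(x) = -<x, p>_L = x0*p0 - x1*p1 - ...
    return [p[0]] + [-pi for pi in p[1:]]

p = [1.5, 1.0, 0.5]        # on the hyperboloid: -1.5^2 + 1^2 + 0.5^2 = -1
x_hyp = [1.0, 0.0, 0.0]    # origin of the hyperboloid model
for _ in range(300):
    g = riemannian_grad(x_hyp, egrad_cost(x_hyp, p))
    x_hyp = exp_map(x_hyp, [-0.1 * gi for gi in g])

final_distance = distance(x_hyp, p)
constraint = minkowski(x_hyp, x_hyp)
```

The Riemannian gradient here points along the geodesic away from `p`, so each step moves the iterate along the geodesic toward `p` while the constraint $\langle x, x\rangle_L = -1$ is preserved up to rounding.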

