Riemannian Optimization Implementation

Updated 12 January 2026
  • Riemannian optimization is a framework that generalizes classical unconstrained optimization by incorporating the geometry of smooth manifolds through metrics, gradient projection, and retractions.
  • It leverages tangent space projections, vector transport, and adaptive step-size rules to efficiently handle constraints inherent in manifold settings.
  • Implementations span gradient descent, conjugate gradient, and variance-reduced methods, yielding practical performance gains in large-scale and matrix-manifold problems.

Riemannian optimization implementation generalizes unconstrained optimization algorithms to settings where constraint sets are smooth manifolds, leveraging the geometry of the feasible set via a chosen Riemannian metric and associated geometric tools. Techniques span a spectrum from basic gradient-based methods to stochastic, adaptive, and projection-free algorithms, with substantial algorithmic and implementation diversity emerging from choices of metric, retraction, tangent space projection, and computational architecture.

1. Fundamental Principles of Riemannian Optimization

Riemannian optimization considers problems of the form $\min_{x \in \mathcal{M}} f(x)$, where $\mathcal{M}$ is a smooth embedded manifold in $\mathbb{R}^n$ or $\mathbb{C}^n$, and $f$ is a differentiable cost function. Equipped with a Riemannian metric $g_x$, each tangent space $T_x\mathcal{M}$ inherits an inner product $\langle \cdot, \cdot \rangle_x$, which fundamentally shapes gradient and Hessian computations (Smith, 2014).

Core computational steps include:

  • Riemannian gradient projection: The Riemannian gradient $\operatorname{grad} f(x) \in T_x\mathcal{M}$ satisfies $\langle \operatorname{grad} f(x), \xi \rangle_x = Df(x)[\xi]$ for all $\xi \in T_x\mathcal{M}$.
  • Retraction: A mapping $R_x : T_x\mathcal{M} \to \mathcal{M}$ approximates the exponential map for small steps, ensuring iterates remain on $\mathcal{M}$.
  • Vector transport: Allows consistent transfer of directions between tangent spaces, critical for conjugate gradient, quasi-Newton, and stochastic variance reduction methods.

These ingredients enable direct extension of classical optimization schemes—gradient descent, conjugate gradient, Newton-type methods, Frank–Wolfe, and stochastic algorithms—with minor modifications at each geometric step.
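
To make these ingredients concrete, the following is a minimal NumPy sketch on the unit sphere $S^{n-1}$, where the tangent projection, Riemannian gradient, and retraction all have one-line closed forms; the helper names and the eigenvector example are illustrative choices, not taken from any of the cited works.

```python
import numpy as np

def proj_tangent(x, v):
    """Project an ambient vector v onto T_x S^{n-1} = {v : <x, v> = 0}."""
    return v - np.dot(x, v) * x

def riemannian_grad(x, egrad):
    """Riemannian gradient on the sphere (induced metric): project the Euclidean gradient."""
    return proj_tangent(x, egrad)

def retract(x, v):
    """Metric-projection retraction: step in the ambient space, then renormalize."""
    y = x + v
    return y / np.linalg.norm(y)

# Toy example: leading eigenvector of a symmetric A via min_x -x^T A x over the sphere.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 2
x = rng.standard_normal(5); x /= np.linalg.norm(x)
for _ in range(200):
    egrad = -2 * A @ x                                   # Euclidean gradient of f(x) = -x^T A x
    x = retract(x, -0.05 * riemannian_grad(x, egrad))    # fixed step size for illustration
```

On other manifolds the same three operations are simply swapped for manifold-specific formulas, as in the examples below.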

2. Metrics, Projections, and Retractions in Practice

The choice of Riemannian metric dramatically affects both theory and computational cost. On matrix manifolds, metrics are often induced by symmetric positive-definite (SPD) weights or tailored through preconditioning (Tiep et al., 19 Sep 2025, Shustin et al., 2019).

Indefinite Stiefel Manifold Example (Tiep et al., 19 Sep 2025):

  • Metric: $g_X(Z_1, Z_2) = \operatorname{tr}(Z_1^T M(X) Z_2)$, with $M(X)$ chosen to avoid expensive Lyapunov solves.
  • Tangent space: $T_X\,\mathrm{iSt} = \{Z : Z^T A X + X^T A Z = 0\}$.
  • Orthogonal projection: Explicit formula avoids matrix equations: $P_X(Y) = Y - X J \operatorname{sym}(X^T A Y)$.
  • Riemannian gradient: Direct, closed form:

$$\operatorname{grad} f(X) = P_X(M(X) \nabla f(X)).$$

Retractions: Vary by manifold:

  • Matrix manifolds: QR or polar decompositions.
  • Indefinite Stiefel: Quasi-geodesic based retraction leveraging matrix exponential structures.
  • Fixed-rank matrices: Projector-splitting or truncated SVD (Naram et al., 2021, Fonarev et al., 2017).

Practically, such tailored projections and retractions yield significant speedups, especially for large $k$ in Stiefel-type manifolds, by circumventing cubic-cost matrix equations repeatedly encountered in canonical metric settings (Tiep et al., 19 Sep 2025).
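
The indefinite-Stiefel formulas above are specific to the cited work. As a simpler illustrative analog, here is a sketch of the tangent projection and QR-based retraction on the standard Stiefel manifold $\mathrm{St}(n,k) = \{X : X^T X = I_k\}$ under the embedded metric; the helper names are my own, and the sign convention on the QR factor is one common choice.

```python
import numpy as np

def sym(M):
    return 0.5 * (M + M.T)

def proj_stiefel(X, Y):
    """Orthogonal projection of an ambient n x k matrix Y onto T_X St(n,k)
    under the embedded metric: P_X(Y) = Y - X sym(X^T Y)."""
    return Y - X @ sym(X.T @ Y)

def qr_retraction(X, V):
    """QR-based retraction: Q factor of X + V, with column signs fixed so the
    triangular factor has a positive diagonal (making the map well defined)."""
    Q, R = np.linalg.qr(X + V)
    return Q * np.sign(np.sign(np.diag(R)) + 0.5)   # flip columns where diag(R) < 0

# Sanity check: a retracted point keeps orthonormal columns.
rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.standard_normal((8, 3)))
V = proj_stiefel(X, rng.standard_normal((8, 3)))
Xn = qr_retraction(X, 0.1 * V)
assert np.allclose(Xn.T @ Xn, np.eye(3), atol=1e-10)
```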

3. Algorithmic Archetypes and Computational Complexity

Riemannian optimization implements gradient, Newton-type, Frank–Wolfe, variance-reduced, and adaptive schemes. The key distinctions from Euclidean algorithms arise in how manifold geometry shapes step computation (Smith, 2014, Weber et al., 2017).

Gradient Descent:

  • Iteration: $X_{j+1} = R_{X_j}(-\tau_j \operatorname{grad} f(X_j))$.
  • Step-size selection: Barzilai–Borwein or Armijo, using manifold-adapted inner products.
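
A generic sketch of this loop with Armijo backtracking, written against user-supplied `f`, `egrad`, `proj`, and `retract` callables (for example, the sphere helpers sketched in Section 1); the signature, defaults, and safeguard constants are illustrative assumptions rather than a reference implementation.

```python
def riemannian_gd(f, egrad, proj, retract, x0, tau0=1.0, beta=0.5, sigma=1e-4,
                  max_iter=500, tol=1e-8):
    """Riemannian gradient descent with Armijo backtracking.

    f, egrad   : cost function and its Euclidean gradient
    proj(x, v) : orthogonal projection of v onto T_x M (yields the Riemannian gradient)
    retract    : retraction mapping a tangent step back onto the manifold"""
    x = x0
    for _ in range(max_iter):
        g = proj(x, egrad(x))                  # Riemannian gradient
        gnorm2 = float((g * g).sum())          # squared norm in the embedded metric
        if gnorm2 ** 0.5 < tol:
            break
        tau, fx = tau0, f(x)
        # Armijo condition: sufficient decrease along the retracted steepest-descent curve
        while tau > 1e-12 and f(retract(x, -tau * g)) > fx - sigma * tau * gnorm2:
            tau *= beta
        x = retract(x, -tau * g)
    return x
```

With the sphere helpers above, `riemannian_gd(lambda x: -x @ A @ x, lambda x: -2 * A @ x, proj_tangent, retract, x)` reproduces the eigenvector example with an adaptive step size.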

Conjugate Gradient:

  • Update: $P_k = -\operatorname{grad} f(W_k) + \beta_k \mathcal{T}_{W_{k-1} \to W_k}(P_{k-1})$.
  • Line search and retraction as above, with vector transport via projection onto the new tangent space (Naram et al., 2021).
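
A corresponding sketch of Riemannian conjugate gradient, using a Fletcher–Reeves coefficient and vector transport implemented as projection onto the new tangent space; the crude backtracking loop stands in for a proper strong-Wolfe line search and is an assumption of this sketch.

```python
def riemannian_cg(f, egrad, proj, retract, x0, max_iter=300, tol=1e-8):
    """Riemannian conjugate gradient: the previous search direction is transported
    to the new iterate simply by projecting it onto the new tangent space."""
    x = x0
    g = proj(x, egrad(x))
    p = -g
    for _ in range(max_iter):
        gnorm2 = float((g * g).sum())
        if gnorm2 ** 0.5 < tol:
            break
        tau, fx = 1.0, f(x)
        while tau > 1e-12 and f(retract(x, tau * p)) > fx:   # crude backtracking
            tau *= 0.5
        x_new = retract(x, tau * p)
        g_new = proj(x_new, egrad(x_new))
        beta = float((g_new * g_new).sum()) / gnorm2          # Fletcher-Reeves coefficient
        p = -g_new + beta * proj(x_new, p)                    # transport P_{k-1} by projection
        x, g = x_new, g_new
    return x
```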

Frank–Wolfe on Manifolds:

  • Linear oracle: minimize linearized cost over feasible manifold constraints, explicit on SPD and SO(n) (Weber et al., 2017).
  • Geodesic update: $x_{k+1} = \operatorname{Exp}_{x_k}(\gamma_k \operatorname{Log}_{x_k}(z_k))$.
  • Complexity: Each iteration costs $O(n^3)$ on SPD.
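
A structural skeleton of this iteration; `linear_oracle`, `exp_map`, and `log_map` are placeholders for the manifold-specific routines (closed form on SPD and SO(n) according to the cited work), so this should be read as a sketch of the control flow rather than a complete implementation.

```python
def riemannian_frank_wolfe(grad_f, linear_oracle, exp_map, log_map, x0, max_iter=100):
    """Riemannian Frank-Wolfe skeleton.

    grad_f(x)           : Riemannian gradient of the cost at x
    linear_oracle(x, g) : feasible point z_k minimizing the linearized cost at x
    exp_map, log_map    : exponential and logarithm maps of the manifold"""
    x = x0
    for k in range(max_iter):
        g = grad_f(x)
        z = linear_oracle(x, g)                   # solve the linear subproblem
        gamma = 2.0 / (k + 2.0)                   # standard open-loop step size
        x = exp_map(x, gamma * log_map(x, z))     # geodesic step from x_k toward z_k
    return x
```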

Variance-Reduced/Adaptive Schemes:

  • Riemannian SVRG- and SARAH-type methods reduce gradient variance by combining stochastic gradients with vector-transported reference gradients, while adaptive schemes rescale steps within the tangent space (Zhang et al., 2016, Jiang et al., 2017).

Sample Complexity and Rates:

  • First-order Riemannian methods: $O(1/\epsilon^2)$ to reach $\epsilon$-stationarity; stochastic variants $O(1/\epsilon^4)$ (He et al., 5 Aug 2025).
  • Frank–Wolfe and SVRG: sublinear rates in general, and linear rates under convexity, matching Euclidean analogs but scaled by curvature and metric choice.

4. Manifold Learning, Iso-Riemannian Geometry, and Sample-Based Approaches

Extending Riemannian optimization to learned manifolds and to non-smooth or implicit settings necessitates new geometric machinery (Diepeveen et al., 23 Oct 2025, Shustin et al., 2022).

Iso-Riemannian Geometry:

  • For data manifolds learned via normalizing flows, the standard Levi-Civita connection yields distorting geodesics.
  • Iso-connection enforces constant-speed paths:

$$\nabla^{\mathrm{iso}}_V U = \|V\|^{-1}_{\mathbb{R}^n} \nabla_V(\|V\| U).$$

  • Optimization and convexity properties are redefined in terms of "iso-monotonicity" and "iso-Lipschitzness," yielding new descent algorithms with provable linear rates (Diepeveen et al., 23 Oct 2025).

Manifold-Free Riemannian Optimization:

  • When the manifold is available only through a set of samples, tangent spaces, projections, and gradients are constructed locally via moving least-squares polynomial regression (MMLS), yielding approximate geometry and provably convergent gradient and conjugate-gradient implementations (Shustin et al., 2022).
  • Complexity and convergence closely track fill-distance and polynomial degree in the sample set.
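
As a rough illustration of the sample-based setting, the sketch below estimates a tangent basis by local PCA over nearest neighbors and projects the Euclidean gradient onto it. This is a deliberately simplified stand-in for the MMLS construction of the cited work (which fits local polynomials rather than a flat PCA plane), and the helper names are hypothetical.

```python
import numpy as np

def local_tangent_basis(samples, x, k_neighbors=20, d=2):
    """Estimate an orthonormal basis of the tangent space at x from the k nearest
    sample points via local PCA (simplified stand-in for MMLS)."""
    dists = np.linalg.norm(samples - x, axis=1)
    nbrs = samples[np.argsort(dists)[:k_neighbors]]
    centered = nbrs - nbrs.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:d].T                               # (ambient_dim x d) orthonormal basis

def approx_riemannian_grad(samples, x, egrad_at_x, **kwargs):
    """Project the ambient (Euclidean) gradient at x onto the estimated tangent space."""
    B = local_tangent_basis(samples, x, **kwargs)
    return B @ (B.T @ egrad_at_x)
```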

5. Implementation Frameworks and Software Infrastructures

The evolution of Riemannian optimization has led to robust open-source libraries supporting a spectrum of algorithms:

  • Geoopt (PyTorch): Manifold-aware parameter classes, metrics, retractions, and optimizers (SGD, Adam, RMSProp, etc.), supporting popular models (sphere, Stiefel, SPD, hyperbolic geometries) (Kochurov et al., 2020).
  • Rieoptax (JAX): ManifoldArray structure, composable optimizers (SGD, SVRG, AdaGrad, Adam, SARAH), differential privacy mechanisms, extensive benchmarking against other packages (Utpala et al., 2022).
  • Manopt, Pymanopt: MATLAB and Python toolboxes with high-level manifold classes that automate metric, retraction, and gradient logic.

These frameworks provide automatic gradient projection, batch-compatible manifold operations, and transparent vector transport, and they capitalize on automatic differentiation and hardware acceleration.
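
A hedged usage sketch with Geoopt: a manifold-constrained parameter on the Stiefel manifold optimized with Riemannian Adam to recover a dominant invariant subspace. The class and optimizer names reflect recent Geoopt releases, but the exact API and defaults should be checked against the library documentation.

```python
import torch
import geoopt  # pip install geoopt

torch.manual_seed(0)
n, k = 20, 4
A = torch.randn(n, n); A = (A + A.T) / 2             # symmetric matrix for a toy subspace problem

manifold = geoopt.Stiefel()                           # St(n, k): matrices with orthonormal columns
Q, _ = torch.linalg.qr(torch.randn(n, k))             # feasible (orthonormal) initial point
X = geoopt.ManifoldParameter(Q, manifold=manifold)

opt = geoopt.optim.RiemannianAdam([X], lr=1e-2)       # projects gradients and retracts each step
for _ in range(300):
    opt.zero_grad()
    loss = -torch.trace(X.T @ A @ X)                  # maximize trace(X^T A X): dominant subspace
    loss.backward()
    opt.step()                                        # X stays (numerically) on the manifold
```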

6. Advanced Topics: Bregman and Accelerated Variational Integrators

Recent methodology has introduced advanced geometric and variational approaches:

  • Riemannian Bregman Gradient Methods: Update directions and step sizes incorporate Bregman distances derived from quartic or custom reference functions, permitting closed-form solutions and straightforward linesearches on spheres or Stiefel-type manifolds (He et al., 5 Aug 2025).
  • Accelerated Optimization via Variational Integrators: Time-adaptive symplectic integrators discretize the Bregman/Hamiltonian flows, maintaining geometric and energy invariants and enhancing stability and robustness, especially on constrained optimization problems (Duruisseaux et al., 2021).

7. Applications and Performance Metrics

Riemannian optimization underpins scientific computing tasks including electronic structure determination (Hartree–Fock), matrix factorization, low-rank regression, extreme classification, clustering on learned manifolds, and word embedding models (Silva, 2024, Naram et al., 2021, Fonarev et al., 2017, Jiang et al., 2017, Diepeveen et al., 23 Oct 2025).

Performance Table: Riemannian Gradient Descent on Indefinite Stiefel

Component                    Cost per Iteration      Notable Feature
Euclidean gradient           $O(n^2 k)$              Standard for general $f$
Riemannian gradient (new)    $O(n^2 k + n k^2)$      Avoids Lyapunov matrix equation
Quasi-geodesic retraction    $O(k^3)$                Cheap when $k \ll n$

For large-scale, high-dimensional problems, careful selection of the metric, algorithmic variant, and computational architecture is necessary to avoid bottlenecks.

8. Best Practices and Numerical Guidelines

  • Align metric choice with underlying data geometry to minimize condition number and iteration count (Shustin et al., 2019).
  • Maintain feasibility via retraction rather than explicit projection.
  • For large $k$, exploit metrics and retractions that avoid expensive matrix equations.
  • Use line search and variance-reduction techniques to control step size and ensure rapid convergence (Tiep et al., 19 Sep 2025, Jiang et al., 2017, He et al., 5 Aug 2025).
  • Monitor stationarity and iterate accuracy via the norm of the Riemannian gradient, as in the sketch below.
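
For the last point, a minimal stationarity monitor, assuming the same `proj`/`egrad` callables as in the earlier sketches; the optional `inner` argument is a hypothetical hook for a metric-specific inner product, with the embedded Euclidean one as the fallback.

```python
def stationarity(x, egrad, proj, inner=None):
    """Monitoring criterion: norm of the Riemannian gradient at x.
    inner(x, u, v) is an optional metric-specific inner product; by default the
    embedded Euclidean inner product of the projected gradient is used."""
    g = proj(x, egrad(x))
    if inner is None:
        return float((g * g).sum()) ** 0.5
    return float(inner(x, g, g)) ** 0.5
```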

References to the above methodologies, algorithms, and implementation recipes can be found in (Tiep et al., 19 Sep 2025, Diepeveen et al., 23 Oct 2025, Weber et al., 2017, Jiang et al., 2017, Smith, 2014, Naram et al., 2021, Shustin et al., 2019, Utpala et al., 2022, Kochurov et al., 2020, Shustin et al., 2022, He et al., 5 Aug 2025, Zhang et al., 2016, Novikov et al., 2021, Silva, 2024, Roychowdhury, 2017, Fonarev et al., 2017, Duruisseaux et al., 2021).
