
Optimization on Manifolds

Updated 27 November 2025
  • Optimization on manifolds is the study of numerical methods that solve smooth constrained problems on geometric spaces using retractions, vector transports, and Riemannian gradients.
  • Techniques such as Riemannian gradient descent, conjugate gradient, trust-region, and stochastic methods are adapted to respect the intrinsic non-Euclidean geometry of the manifold.
  • This framework is vital for applications in data science, physics, signal processing, and topology optimization, enabling robust solutions to complex constrained problems.

Optimization on manifolds is the study and development of numerical algorithms for solving smooth (and, increasingly, nonsmooth or black-box) optimization problems constrained to smooth manifolds. These geometric domains arise naturally in matrix analysis, data science, signal processing, machine learning, physics, topology optimization, and statistical estimation. The core challenge is to generalize classical unconstrained optimization techniques—gradient descent, Newton's method, quasi-Newton, trust-region, and stochastic optimization—while respecting the intrinsic nonlinear geometry imposed by manifold constraints.

1. Geometric Framework and Problem Formulation

Let $(\mathcal{M},g)$ denote a $d$-dimensional smooth (Riemannian or semi-Riemannian) manifold, with tangent space $T_x\mathcal{M}$ at $x\in\mathcal{M}$, equipped with a smoothly varying metric $g_x(\cdot,\cdot)$. The canonical problem is

$$\min_{x\in\mathcal{M}} f(x),$$

where $f:\mathcal{M}\to\mathbb{R}$ is typically assumed $C^1$ or $C^2$. Manifold constraints encode nontrivial structure: spheres ($\|x\|=1$), Stiefel ($X^TX=I_p$), Grassmann (subspaces), SPD matrices ($X\succ 0$), products, and more complex classes such as learned data manifolds or topological surfaces.

Key geometric tools include:

  • Retraction $R_x : T_x\mathcal{M}\to\mathcal{M}$: a smooth map with $R_x(0_x)=x$ and $DR_x(0_x)=\mathrm{id}$, serving as a computationally tractable surrogate for the exponential map.
  • Vector transport $\mathcal{T}_{x\rightarrow y} : T_x\mathcal{M}\to T_y\mathcal{M}$: linear maps for moving gradients or search directions between tangent spaces.
  • Riemannian gradient $\mathrm{grad}\,f(x)\in T_x\mathcal{M}$: uniquely characterized by $g_x(\mathrm{grad}\,f(x),\eta)=Df(x)[\eta]$ for all $\eta\in T_x\mathcal{M}$. If $f$ is defined in an ambient Euclidean space, this is typically realized by orthogonal projection of the Euclidean gradient onto $T_x\mathcal{M}$.
  • Riemannian Hessian $\mathrm{Hess}\,f(x)[\xi]=\nabla_\xi(\mathrm{grad}\,f)$: the covariant derivative of the gradient field.

This geometric structure enables the translation of unconstrained optimization motifs to manifold settings (Hu et al., 2019).
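As a concrete instance of these tools, consider the unit sphere $S^{n-1}\subset\mathbb{R}^n$, where the tangent-space projection, a projection retraction, and the Riemannian gradient all have closed forms. A minimal NumPy sketch (function names are illustrative, not taken from any particular toolbox):

```python
import numpy as np

def proj(x, v):
    """Orthogonally project an ambient vector v onto the tangent
    space T_x S^{n-1} = {u : x^T u = 0} of the unit sphere at x."""
    return v - (x @ v) * x

def retract(x, eta):
    """Metric-projection retraction on the sphere: normalize x + eta.
    Satisfies R_x(0) = x and DR_x(0) = id."""
    y = x + eta
    return y / np.linalg.norm(y)

def riemannian_grad(x, euclidean_grad):
    """Riemannian gradient as the projection of the Euclidean gradient."""
    return proj(x, euclidean_grad)

# Example: f(x) = x^T A x restricted to the sphere has Euclidean
# gradient 2 A x, so grad f(x) = 2 A x - 2 (x^T A x) x.
A = np.diag([3.0, 1.0, -2.0])
x = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
g = riemannian_grad(x, 2 * A @ x)
```

The resulting `g` is tangent at `x` (i.e., `x @ g` vanishes to machine precision), which is exactly the characterization of the Riemannian gradient above.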

2. Core Algorithmic Paradigms

2.1. First- and Second-Order Methods

  • Riemannian Gradient Descent (RGD): At iteration $k$, with iterate $x_k$,

$$\eta_k = -\mathrm{grad}\,f(x_k),\qquad x_{k+1}=R_{x_k}(\alpha_k\,\eta_k).$$

Step-size $\alpha_k$ is chosen via Armijo, fixed, or adaptive schemes. Convergence to stationary points is guaranteed under standard regularity (Hu et al., 2019, Boumal et al., 2013).

  • Riemannian Conjugate Gradient (RCG): Builds conjugate directions via

$$\zeta_k = -\mathrm{grad}\,f(x_k) + \beta_k\,\mathcal{T}_{x_{k-1}\rightarrow x_k}(\zeta_{k-1}),$$

with $\beta_k$ chosen by Polak–Ribière or Fletcher–Reeves formulae.

  • Riemannian Trust-Region (RTR): Forms a local quadratic model on $T_{x_k}\mathcal{M}$,

$$m_k(\xi) = f(x_k) + g_{x_k}(\mathrm{grad}\,f(x_k),\xi) + \tfrac{1}{2}\,g_{x_k}(\mathrm{Hess}\,f(x_k)[\xi],\xi),\qquad \|\xi\| \leq \Delta_k,$$

and solves the subproblem approximately (e.g., by truncated CG), accepting or rejecting the step by a ratio test (Hu et al., 2019).

  • Retraction-based Proximal and Accelerated Methods: Each outer step regularizes $f$ with a squared retraction distance, minimizing

$$h_\kappa(y; x_{k-1}) = f(y) + \frac{\kappa}{2}\, d_R^2(y, x_{k-1}),$$

where $d_R$ is the retraction-based distance. Adaptive convexification (doubling $\kappa$) ensures convexity of each subproblem, yielding $O(1/k^2)$ accelerated rates for strongly convex objectives (Lin et al., 2020).
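The RGD iteration above can be sketched end to end for the Rayleigh quotient $f(x)=x^TAx$ on the unit sphere, whose minimizers are the unit eigenvectors of the smallest eigenvalue. A hedged NumPy sketch with Armijo backtracking (the Armijo constant $10^{-4}$ and halving schedule are conventional choices, not taken from the cited papers):

```python
import numpy as np

def rgd_sphere(A, x0, max_iter=500, tol=1e-8):
    """Riemannian gradient descent for f(x) = x^T A x on the unit sphere,
    using the normalization retraction and Armijo backtracking."""
    x = x0 / np.linalg.norm(x0)
    f = lambda z: z @ A @ z
    for _ in range(max_iter):
        g = 2 * A @ x
        grad = g - (x @ g) * x              # project onto T_x S^{n-1}
        if np.linalg.norm(grad) < tol:
            break
        alpha = 1.0
        # Armijo: shrink alpha until sufficient decrease holds.
        while True:
            y = x - alpha * grad
            y /= np.linalg.norm(y)          # retraction R_x
            if f(y) <= f(x) - 1e-4 * alpha * (grad @ grad) or alpha < 1e-12:
                break
            alpha *= 0.5
        x = y
    return x

A = np.diag([3.0, 1.0, -2.0])
x = rgd_sphere(A, np.array([1.0, 1.0, 1.0]))
# x converges toward ±e_3, the eigenvector of the smallest eigenvalue.
```

Because the starting value $f(x_0)=2/3$ already lies below the saddle values $1$ and $3$, monotone descent forces convergence to the global minimizer here.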

2.2. Quasi-Newton and Stochastic Methods

  • Riemannian L-BFGS and Variants: Two-loop L-BFGS schemes generalized to manifolds require the correction pairs (differences of iterates and of gradients) to be mapped into a common tangent space, typically via vector transport. On SPD manifolds, vector-transport-free schemes that render the transport trivial and reduce the metric to the Frobenius inner product (via inverse-square-root or Cholesky mappings) significantly simplify and accelerate computation (Godaz et al., 2021).
  • Stochastic & Minibatch Optimization: For $f(x)=\frac{1}{N}\sum_{i=1}^N f_i(x)$, stochastic variance reduction (SVRG), stochastic L-BFGS, and minibatching protocols are adapted using Riemannian operations. Retraction-based updates and vector transport maintain coherence across tangent spaces. Convergence guarantees (e.g., linear for strongly convex $f$) and practical acceleration (e.g., Karcher means, leading eigenvalue computation) have been demonstrated (Roychowdhury, 2017).
  • Population-Based Optimization: Black-box, derivative-free population strategies (e.g., Extended Riemannian SDFO) operate intrinsically on $\mathcal{M}$ by constructing and evolving mixtures of probability measures on $\mathcal{M}$, deploying information-geometric updates to the weights/parameters. Such methods have been shown to globally optimize on highly nontrivial manifolds (e.g., Jacob's ladder) where no global coordinate chart or algebraic constraints are available (Fong et al., 2019).
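One of the benchmark problems mentioned above, the Karcher (Fréchet) mean, admits a compact Riemannian-SGD sketch on the sphere, where the exp and log maps are available in closed form; the step size and iteration count below are illustrative, not tuned values from the cited work:

```python
import numpy as np

def sphere_log(x, y):
    """Log map on the unit sphere: the tangent vector at x pointing to y,
    with length equal to the geodesic (great-circle) distance."""
    u = y - (x @ y) * x
    nu = np.linalg.norm(u)
    theta = np.arccos(np.clip(x @ y, -1.0, 1.0))
    return np.zeros_like(x) if nu < 1e-14 else (theta / nu) * u

def sphere_exp(x, v):
    """Exp map on the unit sphere (an exact retraction)."""
    nv = np.linalg.norm(v)
    return x if nv < 1e-14 else np.cos(nv) * x + np.sin(nv) * (v / nv)

def karcher_mean_sgd(points, steps=2000, lr=0.05, seed=0):
    """Riemannian SGD for f(x) = (1/2N) sum_i d(x, y_i)^2, whose
    Riemannian gradient is -(1/N) sum_i log_x(y_i).  Each step follows
    the log map toward one randomly sampled data point."""
    rng = np.random.default_rng(seed)
    x = points[0].copy()
    for _ in range(steps):
        y = points[rng.integers(len(points))]
        x = sphere_exp(x, lr * sphere_log(x, y))
    return x

pts = [np.array([np.sin(t), 0.0, np.cos(t)]) for t in (0.1, -0.1, 0.05, -0.05)]
mean = karcher_mean_sgd(pts)   # close to the north pole (0, 0, 1)
```

With a constant step size the iterate hovers near the Fréchet mean with fluctuations of order `lr` times the data spread; decaying steps would give exact convergence.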

2.3. Composite and Constrained Methods

  • SQP on Manifolds: Sequential quadratic programming admits full generalization. Via retractions and local stratifications/pullbacks, quadratic subproblems are solved on the tangent space $T_x\mathcal{M}$, then retracted back. Globalization is achieved by composite damped steps and cubic regularization. Local quadratic convergence is established under classical KKT and second-order conditions, with explicit control of linearization and curvature terms (Schiela et al., 2020).

3. Advanced Geometric Models and Acceleration

3.1. Symplectic and Hamiltonian Approaches

Accelerated and robust convergence on manifolds has been realized by geometric integration of conformal Hamiltonian flows. Introducing kinetic momentum and dissipation (on the cotangent bundle $T^*\mathcal{M}$) yields the continuous-time system $\dot{q} = \partial H/\partial p$, $\dot{p} = -\partial H/\partial q - \gamma p$, with $H(q,p) = \frac{1}{2} p^T M^{-1} p + f(q)$ for a mass matrix $M$. Discretizations, such as dissipative RATTLE or conformal-symplectic leapfrog, provably inherit accelerated convergence rates, outperforming Riemannian gradient descent both in iteration count and tolerated step size for a range of problems: the sphere, SO(n), Procrustes (França et al., 2021, Ghirardelli, 2023).
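A minimal sketch of a conformal-symplectic leapfrog discretization of this flow, in the Euclidean (unconstrained) case with unit mass matrix; the manifold-constrained variants in the cited works add RATTLE-style projection steps, which are omitted here:

```python
import numpy as np

def conformal_leapfrog(grad_f, q, p, h=0.1, gamma=1.0, steps=200):
    """Conformal-symplectic leapfrog for the dissipative Hamiltonian flow
        qdot = p,   pdot = -grad f(q) - gamma * p   (unit mass matrix).
    The friction is applied through exact exponential-damping half-steps
    wrapped around a standard velocity-Verlet (symplectic) step."""
    d = np.exp(-gamma * h / 2)
    for _ in range(steps):
        p = d * p                      # half-step of exact dissipation
        p = p - (h / 2) * grad_f(q)    # half kick
        q = q + h * p                  # drift
        p = p - (h / 2) * grad_f(q)    # half kick
        p = d * p                      # half-step of exact dissipation
    return q, p

# Quadratic test problem f(q) = 0.5 ||q||^2: the trajectory spirals
# into the minimizer q = 0 at the damped-oscillator rate.
q, p = conformal_leapfrog(lambda q: q, np.ones(3), np.zeros(3))
```

Splitting the exact dissipation from the symplectic core is what lets the discrete dynamics inherit the continuous-time decay rate.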

3.2. Iso-Riemannian Geometry for Learned Manifolds

In data-driven manifold settings ($\mathcal{M}$ realized as the image of a learned diffeomorphism $\varphi:\mathbb{R}^n\to\mathbb{R}^n$), classical Levi-Civita geodesics may have undesirable variable-speed parameterizations, breaking geodesic convexity and the identification of barycentres. Iso-Riemannian geometry replaces the connection, yielding constant-speed geodesics and iso-convexity. New notions of monotonicity and Lipschitz continuity (iso-monotonicity and iso-Lipschitzness) ensure linear convergence of iso-Riemannian descent on learned manifolds, with formal convergence theorems and applications to clustering and inverse problems on real data (Diepeveen et al., 23 Oct 2025).

3.3. Optimization on Unknown or Point Cloud Manifolds

For black-box or expensive objective functions where the manifold is specified only by a point cloud, efficient Bayesian optimization methods have been developed:

  • Graph Gaussian Process UCB (GGP-UCB): Constructs a weighted graph over the sample cloud, defines graph Laplacians approximating the Laplace–Beltrami operator, and builds Matérn or SE covariance surrogates. Queries are selected via an acquisition function respecting intrinsic geometry. Fast convergence with regret bounds is proven under spectral approximation conditions (Kim et al., 2022).
  • Extrinsic Bayesian Optimization (eBO): Embeds $\mathcal{M}$ into a higher-dimensional Euclidean space via equivariant embeddings, applies classical Euclidean GP surrogates, and optimizes acquisition functions on the embedded submanifold. This approach enables practical BO across spheres, Grassmannians, and SPD manifolds, with empirically superior sample efficiency versus Riemannian gradient descent or simplex methods (Fang et al., 2022).
  • Atlas-Graph Proxies: Learns overlapping coordinate charts (atlases) and transition maps from point clouds, enabling $O(nk)$ computation of exp, log, parallel transport, and retraction. This yields order-of-magnitude speedups for optimization in, e.g., Grassmann and image-manifold settings, and supports manifold SVMs for classification when geometric primitives are not available in closed form (Robinett et al., 22 Jan 2025).
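The graph-Laplacian construction at the heart of GGP-UCB can be sketched as follows: form Gaussian edge weights over the point cloud and take $L = D - W$, whose low eigenpairs approximate Laplace–Beltrami spectra under suitable bandwidth scaling (the fixed bandwidth here is purely illustrative):

```python
import numpy as np

def graph_laplacian(X, bandwidth=0.5):
    """Unnormalized graph Laplacian L = D - W for a point cloud X (n x d),
    with Gaussian edge weights w_ij = exp(-||x_i - x_j||^2 / bandwidth^2)
    and degree matrix D = diag(W 1)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / bandwidth**2)
    np.fill_diagonal(W, 0.0)          # no self-loops
    return np.diag(W.sum(axis=1)) - W
```

By construction $L$ is symmetric positive semidefinite with row sums zero (the constant vector spans its kernel on a connected graph), which is what makes Matérn-type surrogates $k \propto (L + \tau^2 I)^{-s}$ well defined.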

4. Parallel, Distributed, and Large-Scale Methods

In massive-data settings, communication-efficient distributed Riemannian optimization exploits surrogate objectives that combine local curvature information with global gradient averages, requiring only first-order inter-node communication. Linear or sublinear convergence is established by leveraging retraction-based geometry and careful surrogate construction (Saparbayeva et al., 2018). Accelerated, communication-efficient methods are applied at large scale to matrix completion (Grassmann) and Fréchet-mean estimation (sphere).

5. Extensions: Semi-Riemannian, Non-smooth, and Topology Optimization

  • Semi-Riemannian Optimization: Extends the machinery of Riemannian descent, conjugate gradient, and trust-region to manifolds with indefinite metric tensors. Descent is maintained by locally inducing a positive-definite auxiliary inner product, and stationarity and convergence proofs are shown to be metric-independent (Gao et al., 2018).
  • Manifold Topology Optimization: For PDE-governed shape or topology optimization on curved surfaces (e.g. spheres, tori, Möbius bands, Klein bottles), the design variable is a field on the surface, filtered/projection-regularized to ensure well-posedness. Finite element solvers for surface PDEs are coupled to gradient-based optimizers, with sensitivity analysis and adjoint equations respecting the geometry and topology of the manifold (Deng et al., 2019).
  • Doubly-Stochastic and Probability Simplex Manifolds: Specialized Riemannian geometry for optimization over the (open) Birkhoff polytope or multinomial manifolds provides explicit geodesics, projection operators, and Fisher-metric Hessians, which render trust-region and quasi-Newton methods both globally convergent and orders-of-magnitude faster than classical interior-point solvers in high dimensions (Douik et al., 2018).
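A basic computational primitive in the doubly stochastic setting, mapping a positive matrix toward the Birkhoff polytope, is Sinkhorn–Knopp row/column balancing. This is only an illustrative companion to the exact geodesics and Fisher-metric tools of the cited work, not the method itself:

```python
import numpy as np

def sinkhorn(A, iters=200):
    """Sinkhorn-Knopp balancing: alternately normalize rows and columns
    of an entrywise-positive matrix until it is (approximately) doubly
    stochastic, i.e., all row and column sums equal 1."""
    A = np.asarray(A, dtype=float).copy()
    for _ in range(iters):
        A /= A.sum(axis=1, keepdims=True)   # rows sum to 1
        A /= A.sum(axis=0, keepdims=True)   # columns sum to 1
    return A

B = sinkhorn(np.array([[1.0, 2.0], [3.0, 4.0]]))
```

For positive matrices the alternation converges geometrically, and the limit stays in the open polytope (all entries strictly positive), matching the open-manifold setting above.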

6. Practical Toolboxes and Computational Infrastructure

Several open-source toolboxes implement the above theory and algorithms:

  • Manopt (MATLAB): Modular architecture for a wide class of manifolds and algorithms; efficient retractions and vector transports (Boumal et al., 2013).
  • Pymanopt (Python): Automatic differentiation via autograd/Theano/TensorFlow; supports sphere, Stiefel, Grassmann, SPD, and fixed-rank matrix manifolds, Riemannian gradient/conjugate-gradient/trust-region (Townsend et al., 2016).
  • Additional toolboxes for high-performance parallel manifold optimization (MPI-based), stochastic/accelerated methods, and black-box settings further extend applicability.

7. Theoretical Guarantees and Complexity

The convergence theory of manifold optimization closely parallels classical results, with additional features due to curvature and topology:

  • Asymptotic convergence: For RGD, every accumulation point is stationary; $O(1/\varepsilon^2)$ iterations suffice to reach $\|\mathrm{grad}\,f(x)\|\leq\varepsilon$.
  • Accelerated rates: $O(1/k^2)$ rates are achievable via Nesterov-type extrapolation and regularization, under mild convexity or strong convexity assumptions.
  • Trust-region methods: $O(\varepsilon^{-1.5})$ iterations to reach $\varepsilon$-approximate second-order stationary points.
  • Symplectic and conformal-Hamiltonian methods: Provably inherit accelerated ODE rates, robust to ill-conditioning.
  • Quasi-Newton and stochastic methods: Linear convergence under strong convexity, with global guarantees for population-based and black-box algorithms in particular settings.

In summary, optimization on manifolds is now a broad, mature area with deep theoretical underpinnings, flexible algorithmic frameworks, and diverse practical applications—fully bridging geometry, analysis, and large-scale computation (Hu et al., 2019, Lin et al., 2020, França et al., 2021, Roychowdhury, 2017, Saparbayeva et al., 2018, Boumal et al., 2013, Townsend et al., 2016, Fang et al., 2022, Diepeveen et al., 23 Oct 2025, Robinett et al., 22 Jan 2025).
