Riemannian Constrained Optimization

Updated 6 May 2026

Riemannian Constrained Optimization is a framework that formulates constrained problems on smooth manifolds using geometric primitives like tangent spaces, metrics, retractions, and vector transports.
It integrates first- and second-order methods, including Riemannian gradient descent, Newton, and trust-region approaches, with adaptive metrics to enhance convergence in complex settings.
Applications in machine learning, computational physics, and quantum information underscore its practical impact by enabling robust and scalable solutions to high-dimensional, constrained problems.

Riemannian Constrained Optimization (RCO) is a framework for solving constrained optimization problems in which the feasible set is a smooth manifold, or is equipped with a manifold structure reflecting both equality and, when possible, inequality constraints. By leveraging geometric properties of manifolds—such as tangent spaces, Riemannian metrics, retractions, and vector transports—RCO generalizes classical optimization techniques to domains arising in machine learning, computational physics, control, signal processing, quantum information, and PDE-constrained design. This approach encompasses both general-purpose algorithms (gradient descent, conjugate gradient, Newton/trust-region, interior point, penalty/smoothing) and highly specialized solvers for large-scale structured problems, embedding constraint handling directly into the optimization dynamics.

1. Mathematical Foundations and Geometric Primitives

The core RCO problem is

$\min_{x\in\mathcal{M}}\, f(x),$

with $\mathcal{M}\subset\mathbb{R}^n$ a $d$-dimensional smooth manifold defined by constraints (often $c(x)=0$ for equalities, $h(x)\leq 0$ for inequalities) and $f$ smooth or, in some settings, nonsmooth. The essential geometric objects are:

Tangent space: $T_x\mathcal{M}$ at $x\in\mathcal{M}$, characterizing feasible directions. For embedded submanifolds, $T_x\mathcal{M} = \ker J_c(x)$ when $c(x)=0$ defines the constraints and $J_c(x)$ is the Jacobian of $c$.
Riemannian metric: $\langle\cdot,\cdot\rangle_x$, a smoothly varying inner product on $T_x\mathcal{M}$, enabling the definition of gradients and Hessians. The metric may be inherited from the ambient space or defined via preconditioning (e.g., as in generalized Stiefel/Grassmann manifolds (Shustin et al., 2019, Mishra et al., 2014, Mor et al., 2020)).
Retraction: $R_x: T_x\mathcal{M}\to\mathcal{M}$ maps a tangent vector back to the manifold, serving as an efficient surrogate for the exponential map.
Vector transport: $\mathcal{T}_{x\to y}: T_x\mathcal{M}\to T_y\mathcal{M}$ enables the movement of tangent vectors between different points for constructing higher-order methods.

The Riemannian gradient $\operatorname{grad} f(x)$ is the unique tangent vector solving $\langle \operatorname{grad} f(x), \xi\rangle_x = Df(x)[\xi]$ for all $\xi\in T_x\mathcal{M}$, typically formed by projecting the Euclidean gradient onto $T_x\mathcal{M}$. The Riemannian Hessian involves the covariant derivative via the Levi-Civita connection and captures second-order geometry.
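As a concrete illustration of these primitives, the following minimal sketch instantiates them on the unit sphere $S^{n-1}=\{x : x^\top x = 1\}$ with the metric inherited from $\mathbb{R}^n$. The choice of manifold, the function names, and the test cost $f(x)=x^\top A x$ are assumptions made for the example and are not taken from the cited works.

```python
import numpy as np

def project_tangent(x, v):
    """Orthogonal projection of an ambient vector v onto T_x S^{n-1} = {xi : x^T xi = 0}."""
    return v - np.dot(x, v) * x

def retract(x, xi):
    """Projective retraction: step along the tangent vector, then renormalize."""
    y = x + xi
    return y / np.linalg.norm(y)

def transport(y, xi):
    """Projection-based vector transport: re-project xi onto the tangent space at y."""
    return project_tangent(y, xi)

def riemannian_grad(x, egrad):
    """Riemannian gradient under the inherited metric: projected Euclidean gradient."""
    return project_tangent(x, egrad)

# Example cost f(x) = x^T A x, with Euclidean gradient 2 A x.
A = np.diag([1.0, 2.0, 3.0])
x = np.ones(3) / np.sqrt(3.0)
g = riemannian_grad(x, 2.0 * A @ x)   # tangent at x
y = retract(x, -0.1 * g)              # new point on the sphere
g_at_y = transport(y, g)              # old gradient moved to T_y
```

Here `g` is orthogonal to `x`, `y` has unit norm, and `g_at_y` is orthogonal to `y`, which are exactly the properties the abstract definitions require.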

When additional equality and inequality constraints are present, extensions handle them via barrier methods, augmented Lagrangian formulations, or exact penalty formulations, with retraction and projection steps maintaining feasibility (Liu et al., 2019, Lai et al., 2022, Xiao et al., 2022).

2. Algorithms and Methods for RCO

A wide spectrum of algorithms has been developed and analyzed:

First-order methods

Riemannian gradient descent (RGD): $x_{k+1} = R_{x_k}\bigl(-\alpha_k \operatorname{grad} f(x_k)\bigr)$; commonly combined with line search or Armijo backtracking (Smith, 2014).
Conjugate gradient and momentum variants: exploit vector transport and manifold-adapted formulas (Polak–Ribière or Fletcher–Reeves) for nonlinear conjugacy (Smith, 2014, Naram et al., 2021), supporting rapid local convergence.
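The RGD update with Armijo backtracking can be sketched as follows on the unit sphere, using a projected Euclidean gradient and a normalization retraction. The manifold, the parameter names (`alpha0`, `beta`, `sigma`), and the Rayleigh-quotient test problem are illustrative assumptions, not a reproduction of any cited implementation.

```python
import numpy as np

def rgd_armijo(f, egrad, x0, alpha0=1.0, beta=0.5, sigma=1e-4,
               tol=1e-6, max_iter=500):
    """Riemannian gradient descent on the unit sphere with Armijo backtracking."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(max_iter):
        g = egrad(x)
        g = g - np.dot(x, g) * x            # Riemannian gradient (tangent projection)
        gnorm2 = np.dot(g, g)
        if np.sqrt(gnorm2) < tol:
            break
        alpha = alpha0
        while True:
            y = x - alpha * g               # step in the tangent direction
            y /= np.linalg.norm(y)          # retraction back onto the sphere
            # Armijo sufficient-decrease test (with a floor on the step size)
            if f(y) <= f(x) - sigma * alpha * gnorm2 or alpha < 1e-16:
                break
            alpha *= beta
        x = y
    return x

# Smallest eigenpair of A as the minimizer of f(x) = x^T A x on the sphere.
A = np.diag([1.0, 2.0, 4.0, 8.0])
f = lambda x: x @ A @ x
egrad = lambda x: 2.0 * A @ x
x_star = rgd_armijo(f, egrad, np.ones(4))
```

For this test matrix the minimizer is the first coordinate vector up to sign, with minimal value equal to the smallest eigenvalue of `A`.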

Second-order methods

Riemannian Newton and trust-region methods: solve the Newton equation $\operatorname{Hess} f(x_k)[\xi_k] = -\operatorname{grad} f(x_k)$ in $T_{x_k}\mathcal{M}$, then update by retraction, guaranteeing quadratic or superlinear local convergence under nondegeneracy (Smith, 2014, Hu et al., 2017, Deng et al., 2023, Mor et al., 2020).
Adaptive regularized Newton/cubic-regularized Newton: regularizes the local model for global convergence or saddle-point escaping, with complexity $\mathcal{O}(\epsilon^{-3/2})$ in the strongly Riemannian setting (Hu et al., 2017, Deng et al., 2023).
Trust-region subproblems are naturally formulated as manifold-constrained quadratic minimizations, solved efficiently via Riemannian gradients and preconditioned metrics (Mor et al., 2020).
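To make the Newton equation concrete, the sketch below solves it in an orthonormal basis of the tangent space for the cost $f(x)=x^\top A x$ on the unit sphere, where the Riemannian Hessian is $\operatorname{Hess} f(x)[\xi] = 2P_x(A\xi) - 2(x^\top A x)\,\xi$. The test problem, the basis construction via a complete QR factorization, and the function names are assumptions chosen for illustration; no globalization (trust region or regularization) is included, so it should only be expected to work from a starting point close to a nondegenerate critical point.

```python
import numpy as np

def tangent_basis(x):
    """Orthonormal basis of T_x S^{n-1}: the last n-1 columns of a complete QR of x."""
    Q, _ = np.linalg.qr(x.reshape(-1, 1), mode="complete")
    return Q[:, 1:]

def riemannian_newton_rayleigh(A, x0, tol=1e-12, max_iter=20):
    """Pure Riemannian Newton iteration for f(x) = x^T A x on the unit sphere."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(max_iter):
        rho = x @ A @ x
        grad = 2.0 * (A @ x - rho * x)              # Riemannian gradient
        if np.linalg.norm(grad) < tol:
            break
        Q = tangent_basis(x)
        # Riemannian Hessian expressed in the tangent basis Q
        H = 2.0 * (Q.T @ A @ Q - rho * np.eye(Q.shape[1]))
        coeffs = np.linalg.solve(H, -Q.T @ grad)    # Newton equation in T_x M
        x = x + Q @ coeffs                          # tangent step
        x /= np.linalg.norm(x)                      # retraction
    return x

A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
x0 = np.array([1.1, 0.1, 0.1, 0.1, 0.1])            # close to the first eigenvector
x_newton = riemannian_newton_rayleigh(A, x0)
```

Working in a tangent basis keeps the linear system nonsingular; solving with the full $n\times n$ Hessian operator would fail because it annihilates the normal direction.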

Stochastic and projection-free methods

Stochastic Riemannian optimization incorporates unbiased stochastic gradients or subgradients and supports variance reduction (SVRG, SPIDER) (Weber et al., 2019, Aspman et al., 2023).
Frank-Wolfe algorithms avoid retraction/projection by using a tangent-space linear oracle; for geodesically convex subsets of manifolds, sublinear rates match the Euclidean theory (Weber et al., 2019).
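A minimal sketch of the stochastic variant is given below: plain Riemannian SGD on the unit sphere for the leading principal direction, using one sample per step as an unbiased gradient estimate. It does not implement variance reduction (SVRG, SPIDER) or Frank-Wolfe; the synthetic data, the step-size schedule `0.5 / (k + 50)`, and the function name are assumptions made for the example.

```python
import numpy as np

def riemannian_sgd_pca(Z, n_steps=2000, seed=0):
    """Riemannian SGD on the unit sphere for min_x -E[(z^T x)^2]."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)
    for k in range(n_steps):
        z = Z[rng.integers(n)]                 # draw one sample uniformly
        egrad = -2.0 * z * (z @ x)             # stochastic Euclidean gradient
        g = egrad - (x @ egrad) * x            # project onto the tangent space
        eta = 0.5 / (k + 50)                   # decaying step size (assumed schedule)
        x = x - eta * g
        x /= np.linalg.norm(x)                 # retraction
    return x

# Synthetic data with one dominant direction (variance 10 along the first axis).
rng = np.random.default_rng(1)
scales = np.sqrt(np.array([10.0, 1.0, 1.0, 1.0, 1.0]))
Z = rng.standard_normal((5000, 5)) * scales
x_hat = riemannian_sgd_pca(Z)
```

With this data the iterate is expected to align, up to sign, with the first coordinate axis; the decaying step size controls the residual noise around that direction.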

Penalty/interior/exact methods

Augmented Lagrangian and interior point extensions for manifolds leverage Lagrange multiplier updates, penalty terms, and central-path following, providing global and local (quasi-Newton) convergence (Lai et al., 2022, Liu et al., 2019).
Smoothing and constraint-dissolving (CDF) methods transform RCO into unconstrained minimization of auxiliary functions with carefully designed properties for equivalence of stationary points and Hessians, allowing direct application of off-the-shelf unconstrained solvers (Xiao et al., 2022).
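The augmented Lagrangian idea can be sketched on a toy problem: minimize $x^\top A x$ over the unit sphere subject to one extra linear constraint $a^\top x = 0$. The sphere constraint is handled geometrically (projection and retraction), while the extra constraint enters through the multiplier `lam` and penalty `rho`. The problem instance, the fixed inner step size, and the iteration counts are assumptions for illustration and are not taken from the cited papers.

```python
import numpy as np

def alm_sphere(A, a, x0, rho=10.0, outer=10, inner=300, step=0.05):
    """Augmented Lagrangian on the unit sphere for min x^T A x  s.t.  a^T x = 0.

    Inner loop: Riemannian gradient descent on
        L(x, lam) = x^T A x + lam * (a^T x) + (rho / 2) * (a^T x)^2.
    Outer loop: first-order multiplier update lam <- lam + rho * (a^T x).
    """
    x = x0 / np.linalg.norm(x0)
    lam = 0.0
    for _ in range(outer):
        for _ in range(inner):                  # warm-started inner solve
            c = a @ x
            egrad = 2.0 * A @ x + (lam + rho * c) * a
            g = egrad - (x @ egrad) * x         # tangent projection
            x = x - step * g
            x /= np.linalg.norm(x)              # retraction
        lam += rho * (a @ x)                    # multiplier update
    return x, lam

A = np.diag([1.0, 2.0, 3.0])
a = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
x_alm, lam = alm_sphere(A, a, np.array([1.0, -0.5, 0.5]))
```

For this instance the constrained minimizer is $(1,-1,0)/\sqrt{2}$ with objective value $1.5$ and multiplier $1$, which can be verified by hand from the stationarity condition $2Ax + \lambda a - 2\mu x = 0$.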

3. Riemannian Metrics, Preconditioning, and Acceleration

Selection of the Riemannian metric dramatically influences local convergence and robustness:

Preconditioning: The choice of metric, often derived from the Lagrangian or problem-specific curvature, acts as a Riemannian preconditioner, reducing the condition number of the Riemannian Hessian and improving both global and local rates (Mor et al., 2020, Mishra et al., 2014, Shustin et al., 2019). Variable or adaptive metrics can encode spectral properties of the problem; e.g., in trust-region problems, $\langle \xi,\eta\rangle_x = \xi^\top M \eta$ with $M \succ 0$ (Mor et al., 2020).
Acceleration: Variational and symplectic integrators enable acceleration mechanisms for RCO by discretizing Bregman/Hamiltonian flows, providing stability and faster asymptotic rates, including rate matching for accelerated mirror descent (Duruisseaux et al., 2021).
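The effect of changing the metric on the gradient itself can be checked numerically. The sketch below assumes the unit sphere and a constant symmetric positive-definite matrix `M` defining $\langle \xi,\eta\rangle = \xi^\top M \eta$; under that assumption the Riemannian gradient is $M^{-1}\nabla f - \frac{x^\top M^{-1}\nabla f}{x^\top M^{-1}x}\,M^{-1}x$. The function name and the random test data are hypothetical, and this is not the specific preconditioner construction of the cited works.

```python
import numpy as np

def grad_under_metric(x, egrad, M):
    """Riemannian gradient on the unit sphere under the metric <xi, eta> = xi^T M eta."""
    Minv_g = np.linalg.solve(M, egrad)
    Minv_x = np.linalg.solve(M, x)
    # Subtract the M^{-1}-weighted normal component so the result is tangent at x.
    return Minv_g - (x @ Minv_g) / (x @ Minv_x) * Minv_x

rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n))
M = B @ B.T + n * np.eye(n)                 # symmetric positive definite
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
egrad = rng.standard_normal(n)              # stand-in for a Euclidean gradient

g = grad_under_metric(x, egrad, M)
xi = rng.standard_normal(n)
xi -= (x @ xi) * x                          # arbitrary tangent vector at x
```

The result satisfies the two defining properties: `x @ g` vanishes (tangency), and `g @ M @ xi` equals `egrad @ xi` for every tangent `xi`. With `M` equal to the identity the formula reduces to the usual projected gradient.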

4. RCO in Large-Scale and Structured Problems

RCO has enabled advances in high-dimensional applications:

Randomized submanifold and factorization methods: Algorithms such as the Randomized Riemannian Submanifold (RRS) reduce per-iteration complexity by restricting updates to low-dimensional submanifolds, e.g., OLS problems or matrix-valued orthogonality constraints (Han et al., 18 May 2025). Factorization-based manifold representations (Stiefel/Grassmann, fixed-rank, PSD cones) are foundational in low-rank approximation, subspace tracking, and matrix completion (Shustin et al., 2019, Naram et al., 2021, Esposito et al., 31 Mar 2025).
Multiplicative updates: For problems with simplex or nonnegativity constraints, multiplicative Riemannian updates on oblique/rotation manifolds enforce the constraints implicitly and achieve efficient convergence, avoiding costly projections (Esposito et al., 31 Mar 2025).
Constraint manifold learning and "manifold-free" methods: When only samples from the manifold or black-box cost evaluations are available, approximation schemes based on Manifold-MLS (moving least squares) produce effective surrogates for tangent spaces, projectors, and retractions with provable convergence (Shustin et al., 2022).
Infinite-dimensional and PDE-constrained settings: In shape optimization, manifolds of diffeomorphisms with outer Sobolev-type metrics ($H^s$) regularize the deformation space, improving both convergence and discretization quality compared to inner or boundary-based metrics (Loayza-Romero et al., 28 Mar 2025).
Budget and discrete constraints: Recent work casts resource-bounded optimization as RCO on the softmax-expected-cost budget manifold, enabling exact constraint satisfaction with minimal overhead via efficient geometric primitives (tangent projection, binary search retraction) and integration with discrete DP/Gumbel sampling (Helcig et al., 1 May 2026).

5. Theoretical Guarantees and Complexity

RCO algorithms have been analyzed under both classical and nonsmooth settings:

Global convergence: Under Lipschitz, boundedness, and completeness assumptions, Riemannian gradient and Newton-type methods converge to critical points (or global optima in convex/PL settings) (Smith, 2014, Hu et al., 2017, Aspman et al., 2023). Augmented Lagrangian and interior point methods inherit global convergence properties from the Euclidean setting, with modifications to handle manifold geometry (Liu et al., 2019, Lai et al., 2022).
Rates: Local quadratic/superlinear rates for Newton/CG; sublinear $\mathcal{O}(1/\sqrt{k})$ for (stochastic) gradient descent; optimal complexity for regularized cubic Newton; matching projection-free Frank-Wolfe rates (Hu et al., 2017, Deng et al., 2023, Weber et al., 2019).
Nonsmooth and stratified objectives: Tame/Whitney-stratifiable functions—arising in deep learning and sparse modeling—admit stratification-based convergence analyses for stochastic subgradient RCO (Aspman et al., 2023).
Penalty/exact equivalence: Sufficiently large penalty parameters in constraint dissolving or smoothing methods ensure a one-to-one correspondence between the stationary points and local minimizers of the original RCO problem and those of its unconstrained reformulation (Xiao et al., 2022).

6. Applications Across Domains

Riemannian constrained optimization is central to:

Matrix and tensor decompositions: PCA on Grassmann and Stiefel manifolds; nonnegative and sparse factorization (Han et al., 18 May 2025, Naram et al., 2021, Esposito et al., 31 Mar 2025).
PDE-constrained optimal design and shape optimization: Sobolev-regularized diffeomorphism groups for robust interface evolution (Loayza-Romero et al., 28 Mar 2025).
Machine learning and neural networks: Orthogonal (O-FFN/O-ViT) and unitary networks, LLM quantization under budget constraints (Helcig et al., 1 May 2026).
Quantum information: Optimization over Stiefel, unitary, density matrix, and Choi manifolds for quantum gate decomposition and tomography (Luchnikov et al., 2020).
Extreme classification and clustering: Manifold embeddings for multi-label models and low-rank semi-definite kernel approximation (Naram et al., 2021).

Empirical results reported in these works indicate that exploiting the manifold geometry yields robust, scalable, and often faster solvers than Euclidean or projection-based analogs, particularly as problem size or nonconvexity increases.

7. Practical Implementations and Extensions

Broad adoption of RCO is facilitated by:

Software frameworks: Implementations in QGOpt (TensorFlow), Manopt, and specialized C++ or Python libraries support practitioner usage, including automatic differentiation for tangent/projection operations (Luchnikov et al., 2020, Han et al., 18 May 2025).
Algorithmic recommendations: Use QR/polar retractions and projection-based transports for matrix manifolds; employ preconditioning for ill-conditioned problems; select manifold-based approaches for constraints that are challenging for standard penalty or projection schemes (Mor et al., 2020, Mishra et al., 2014, Shustin et al., 2019).
Open directions: Further development includes scalable non-holonomic/vakonomic constraint handling, extension to stochastic settings with high variance, and tight complexity/rate analyses for large or data-driven manifolds (Loayza-Romero et al., 28 Mar 2025, Shustin et al., 2022).
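The recommended QR and polar retractions, together with the tangent projection that underlies projection-based transport, can be sketched for the Stiefel manifold $\{X\in\mathbb{R}^{n\times p} : X^\top X = I_p\}$ with the Euclidean metric. This is a plain NumPy illustration under those assumptions; the function names are hypothetical and do not correspond to the API of Manopt, QGOpt, or any other library.

```python
import numpy as np

def sym(B):
    """Symmetric part of a square matrix."""
    return 0.5 * (B + B.T)

def project_tangent_stiefel(X, Z):
    """Projection onto T_X St(n, p): P_X(Z) = Z - X sym(X^T Z)."""
    return Z - X @ sym(X.T @ Z)

def retract_qr(X, Xi):
    """QR retraction: Q factor of X + Xi, with signs fixed so that diag(R) > 0."""
    Q, R = np.linalg.qr(X + Xi)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs                      # flips columns of Q where needed

def retract_polar(X, Xi):
    """Polar retraction: orthogonal polar factor of X + Xi via a thin SVD."""
    U, _, Vt = np.linalg.svd(X + Xi, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
n, p = 8, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))                  # point on St(n, p)
Xi = 0.1 * project_tangent_stiefel(X, rng.standard_normal((n, p)))  # tangent vector

Y_qr = retract_qr(X, Xi)
Y_polar = retract_polar(X, Xi)
Xi_at_Y = project_tangent_stiefel(Y_qr, Xi)                       # projection-based transport
```

Both retractions return matrices with orthonormal columns and reduce to the identity at a zero tangent vector. The sign correction in `retract_qr` makes the QR factor unique, which is what makes the map a well-defined retraction.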