
Rank-One Riemannian Subspace Descent

Updated 29 January 2026
  • The paper introduces rank-one Riemannian subspace descent, an optimization method that iteratively moves along one-dimensional tangent directions on matrix manifolds to solve SPD-constrained problems.
  • It exploits the matrix structure through rank-one updates and precise step-size selection to drastically reduce computational complexity compared to full-gradient methods.
  • The algorithm guarantees global convergence with practical performance benefits in high-dimensional applications such as covariance estimation, Riccati equations, and matrix square-root computations.

A rank-one Riemannian subspace descent algorithm is an optimization method that advances iteratively along one-dimensional subspaces (rank-one tangent directions) on matrix manifolds, specifically tailored to problems with symmetric positive definite (SPD) or related matrix constraints. It generalizes classic coordinate descent to curved spaces, offering substantial computational savings in large-scale nonlinear matrix optimization. The algorithm features a disciplined blend of Riemannian geometry, matrix structure exploitation, and precise step-size selection, yielding global convergence with per-iteration costs significantly lower than traditional Riemannian gradient descent methods, especially for high-dimensional SPD-constrained problems (Darmwal et al., 2023, Darmwal et al., 21 Jan 2026, Han et al., 2024, Gutman et al., 2019, Mishra et al., 2013).

1. Geometric Framework and Problem Class

Rank-one Riemannian subspace descent is formulated on matrix manifolds such as the SPD cone $\mathcal{P}_n = \{X \in \mathbb{R}^{n \times n} : X = X^\top \succ 0\}$ or more general matrix geometries (e.g., Stiefel or Grassmann). The typical optimization problem is

$\min_{X \in \mathcal{M}} f(X)$

where $f$ is smooth and $\mathcal{M}$ denotes a Riemannian manifold, frequently $\mathcal{P}_n$, Stiefel, or a fixed-rank submanifold. In control, filtering, and machine learning, $f$ is often a least-squares penalty encoding a nonlinear matrix equation, such as the algebraic Riccati equation, covariance estimation, or matrix square root problems (Darmwal et al., 21 Jan 2026, Darmwal et al., 2023, Mishra et al., 2013, Han et al., 2024).
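As a concrete instance of such a least-squares penalty, the sketch below evaluates $f(X) = \|A^\top X + XA - XBB^\top X + C^\top C\|_F^2$, the squared Frobenius residual of a continuous-time algebraic Riccati equation (the exact penalty form is an assumption for illustration, not taken verbatim from the cited papers):

```python
import numpy as np

def care_residual(X, A, B, C):
    """Least-squares penalty f(X) = ||R(X)||_F^2 for the continuous-time
    algebraic Riccati equation R(X) = A^T X + X A - X B B^T X + C^T C = 0."""
    R = A.T @ X + X @ A - X @ B @ B.T @ X + C.T @ C
    return np.linalg.norm(R, "fro") ** 2

rng = np.random.default_rng(0)
n = 4
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))  # roughly stable dynamics
B = rng.standard_normal((n, 2))
C = rng.standard_normal((2, n))
X = np.eye(n)                      # an SPD trial iterate
print(care_residual(X, A, B, C))   # nonzero unless X solves the CARE
```

Driving this penalty to zero over $\mathcal{P}_n$ recovers the stabilizing SPD solution of the Riccati equation.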

The manifold is endowed with an affine-invariant Riemannian metric

$\langle \xi, \eta \rangle_X = \mathrm{tr}(X^{-1} \xi X^{-1} \eta)$

for tangent vectors $\xi, \eta \in T_X \mathcal{P}_n$, inducing geodesics and retractions compatible with matrix structure. The Riemannian gradient is related to the Euclidean gradient by $\mathrm{grad}^R f(X) = X \nabla f(X) X$.
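These two formulas can be checked numerically. The sketch below uses the covariance-type loss $f(X) = -\log\det X + \mathrm{tr}(SX)$ (an illustrative objective chosen here, with Euclidean gradient $S - X^{-1}$ and minimizer $X^* = S^{-1}$) and verifies the metric and gradient identities:

```python
import numpy as np

def ai_inner(X, xi, eta):
    """Affine-invariant inner product <xi, eta>_X = tr(X^-1 xi X^-1 eta)."""
    Xinv = np.linalg.inv(X)
    return np.trace(Xinv @ xi @ Xinv @ eta)

def riem_grad(X, egrad):
    """Riemannian gradient under the affine-invariant metric: X egrad X."""
    return X @ egrad @ X

rng = np.random.default_rng(1)
n = 3
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)           # SPD data matrix
X = np.eye(n)
egrad = S - np.linalg.inv(X)          # Euclidean gradient of f at X
G = riem_grad(X, egrad)               # Riemannian gradient at X
```

At the minimizer $X^* = S^{-1}$ the Euclidean gradient already vanishes, so the Riemannian gradient does too; away from $X^*$, the metric rescales descent directions to respect the SPD geometry.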

2. Algorithmic Structure and Rank-One Updates

Each iteration proceeds as follows:

  1. Subspace Selection: Choose a one-dimensional tangent direction—typically a rank-one symmetric matrix—either deterministically (e.g., cycling through canonical directions, selecting the direction of largest gradient component) or stochastically (randomized from an orthonormal basis).
  2. Gradient Projection: Project the Riemannian gradient onto the chosen subspace. In the SPD setting, basis directions are constructed from the Cholesky factor $B$ via $G_{ij}^R(X) = B E_{ij} B^\top$ with $E_{ij}$ low-rank symmetric matrices (Darmwal et al., 2023). In more general manifolds, the tangent direction mirrors block or coordinate directions (Gutman et al., 2019, Han et al., 2024).
  3. Update Rule: Apply a retraction (often the exponential map) to move along the rank-one direction, with the update taking the form

$X_{t+1} = \mathrm{Exp}_{X_t}(-\alpha\, \beta_{ij}\, G_{ij}^R(X_t))$

For many matrix manifolds, closed-form or efficiently computable updates exploit the sparsity and rank structure of the chosen direction, e.g.,

$X_{t+1} = B_t\, \exp(-\alpha \beta_{ij} E_{ij})\, B_t^\top$

or, for general nonlinear equations,

$X_{t+1} = X_t + (e^{-\beta \lambda} - 1)\, v v^\top$

where $v$ is an eigenvector of a transformed Riemannian gradient (Darmwal et al., 21 Jan 2026, Darmwal et al., 2023, Mishra et al., 2013).

  4. Step-Size Selection: Step sizes may be fixed, chosen by adaptive rules, or determined by exact line search along the geodesic. For quadratic or polynomial forms in the update parameter (as in Riccati or matrix square-root problems), closed-form or low-cost minimization is feasible (Mishra et al., 2013, Darmwal et al., 21 Jan 2026, Darmwal et al., 2023).
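The steps above can be sketched end to end on the SPD cone. The snippet below is a minimal illustration assuming the objective $f(X) = -\log\det X + \mathrm{tr}(SX)$ and only the diagonal basis directions $E_{ii}$ (the cited papers use the full basis); it cycles through canonical directions, computes the projected-gradient coefficient $\beta_{ii} = (B^\top S B)_{ii} - 1$, and applies the factored geodesic update with an exact line search:

```python
import numpy as np

def rank_one_spd_step(B, S, i):
    """One subspace-descent step for f(X) = -log det X + tr(S X), X = B B^T,
    along the diagonal direction E_ii. Along this geodesic f reduces to the
    univariate function g(c) = const - c + (e^c - 1) s with s = (B^T S B)_ii,
    whose exact minimizer is c* = -log(s)."""
    s = (B.T @ S @ B)[i, i]
    c = -np.log(s)                        # exact line search (step 4)
    Bn = B.copy()
    # X+ = B exp(c E_ii) B^T: exp(c e_i e_i^T) just scales column i of B.
    Bn[:, i] *= np.exp(0.5 * c)
    return Bn

def f(B, S):
    X = B @ B.T
    return -np.linalg.slogdet(X)[1] + np.trace(S @ X)

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)
B = np.eye(n)
vals = [f(B, S)]
for t in range(3 * n):                    # cycle through diagonal directions
    B = rank_one_spd_step(B, S, t % n)
    vals.append(f(B, S))
```

Each step touches a single column of $B$, the objective decreases monotonically, and after one full cycle every coefficient $\beta_{ii}$ has been driven to zero.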

3. Computational Complexity

The principal computational advantage is the per-iteration cost reduction relative to full-gradient methods. In classical Riemannian gradient descent, an iteration requires dense matrix-matrix operations at $O(n^3)$ complexity. In contrast, a rank-one Riemannian subspace descent step achieves:

  • $O(n)$ per update in specialized function classes for SPD problems, via updates of the Cholesky factor and sparse projections (Darmwal et al., 2023).
  • $O(n^2)$ to $O(n^2 \log n)$ per update for nonlinear matrix equations, dominated by matrix-vector products and rank-one updates to factorizations or inverses (Darmwal et al., 21 Jan 2026).
  • For coordinate descent on Stiefel or Grassmann manifolds, the update exploits projector structure to limit computation to a single column or row (Han et al., 2024).
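The factor-update costs quoted above rest on the classical rank-one Cholesky update, which refreshes a lower-triangular $L$ with $LL^\top = X$ to the factor of $X + vv^\top$ in $O(n^2)$ operations instead of an $O(n^3)$ refactorization. A textbook sketch (not the cited papers' implementation):

```python
import numpy as np

def chol_update(L, v):
    """Rank-one Cholesky update: given lower-triangular L with L L^T = X,
    return the factor of X + v v^T in O(n^2) (textbook 'cholupdate')."""
    L, v = L.copy(), v.copy()
    n = len(v)
    for k in range(n):
        r = np.hypot(L[k, k], v[k])          # new diagonal entry
        c, s = r / L[k, k], v[k] / L[k, k]   # Givens-like rotation
        L[k, k] = r
        L[k + 1:, k] = (L[k + 1:, k] + s * v[k + 1:]) / c
        v[k + 1:] = c * v[k + 1:] - s * L[k + 1:, k]
    return L

rng = np.random.default_rng(4)
n = 6
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)                  # SPD test matrix
L = np.linalg.cholesky(X)
v = rng.standard_normal(n)
L1 = chol_update(L, v)                       # factor of X + v v^T
```

Each loop iteration touches only one column, which is what keeps the total cost quadratic in $n$.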

Iterate counts scale as $O(n^2 \kappa \log(1/\epsilon))$ for geodesically strongly convex objectives on SPD manifolds ($\kappa = L/\mu$), where each iteration is much cheaper than its full-gradient counterpart. For general smooth but nonconvex functions, the minimal gradient norm over $K$ iterations decays as $O(1/\sqrt{K})$, matching known rates for coordinate descent in Euclidean space (Gutman et al., 2019, Darmwal et al., 2023).

4. Convergence Theory

Rigorous convergence guarantees follow from the geometric smoothness and choice of subspace selection rule:

  • For deterministic updates, a gap-ensuring property (each cycle sufficiently spans the tangent space) is required. For randomized rules, an average-norm (C-norm) condition ensures that every tangent direction receives nontrivial projected gradient magnitude (Gutman et al., 2019).
  • When $f$ is geodesically $\mu$-strongly convex and has an $L$-Lipschitz Riemannian gradient, expectation bounds of the form

$\mathbb{E}[f(X_{t+1}) - f(X^*)] \leq \left(1 - \frac{\mu}{4dL}\right) [f(X_t) - f(X^*)]$

hold, with $d$ the manifold's intrinsic dimension. Thus, global linear convergence is attainable under standard assumptions (Darmwal et al., 21 Jan 2026, Darmwal et al., 2023, Han et al., 2024).
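Unrolling this contraction makes the iteration count explicit (a routine consequence of the bound above, using $1 - x \leq e^{-x}$):

```latex
\mathbb{E}\,[f(X_T) - f(X^*)]
  \;\le\; \Big(1 - \frac{\mu}{4dL}\Big)^{T}\,[f(X_0) - f(X^*)]
  \;\le\; e^{-T\mu/(4dL)}\,[f(X_0) - f(X^*)]
```

so accuracy $\epsilon$ is reached once $T \geq \frac{4dL}{\mu} \log\frac{f(X_0) - f(X^*)}{\epsilon}$; with $d = n(n+1)/2$ on $\mathcal{P}_n$, this recovers the $O(n^2 \kappa \log(1/\epsilon))$ iterate count quoted in Section 3.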

  • For nonconvex but $L$-smooth objectives, the algorithm achieves asymptotic stationarity; the minimal expected gradient norm over $K$ iterations decays as $O(1/\sqrt{K})$ (Gutman et al., 2019).

5. Algorithmic Variants and Practical Implementation

Rank-one Riemannian subspace descent encompasses several design choices:

  • Greedy vs. Randomized Subspace Selection: Greedy selection (choosing the largest coefficient) improves per-iteration objective reduction but at increased computation, while random selection is cheap and easier to parallelize (Darmwal et al., 2023, Darmwal et al., 21 Jan 2026).
  • Retractions: For most matrix manifolds, the exponential map is computable in closed form along rank-one directions; approximations such as sphere or QR retractions can provide further savings (Han et al., 2024).
  • Line Search and Step-Size: For many objectives, the update along a rank-one direction reduces to minimizing a univariate function, often a low-degree polynomial or rational function. When not directly available, Armijo-type backtracking is sufficient and inexpensive (Mishra et al., 2013, Darmwal et al., 21 Jan 2026, Darmwal et al., 2023).
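To illustrate why a rank-one direction yields a cheap univariate subproblem, take the Euclidean toy objective $f(X) = \tfrac{1}{2}\|X - A\|_F^2$ (chosen here for transparency rather than drawn from the cited papers); along $X + t\,vv^\top$ it becomes a quadratic in $t$ with a closed-form minimizer:

```python
import numpy as np

def exact_step(X, A, v):
    """Closed-form line search for f(X) = 0.5 * ||X - A||_F^2 along the
    rank-one direction D = v v^T: phi(t) = f(X + t D) is quadratic in t,
    so its minimizer is t* = -<X - A, D> / ||D||_F^2."""
    num = v @ (X - A) @ v            # <X - A, v v^T>
    den = (v @ v) ** 2               # ||v v^T||_F^2
    return -num / den

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n)); A = 0.5 * (A + A.T)
X = np.eye(n)
v = rng.standard_normal(n)
t_star = exact_step(X, A, v)
phi = lambda t: 0.5 * np.linalg.norm(X + t * np.outer(v, v) - A) ** 2
```

For geodesic updates the univariate function is an exponential-polynomial rather than a quadratic, but the principle is identical: one scalar minimization per step.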

Practical implementations maintain efficient data structures for factorizations (Cholesky, QR), residuals, and intermediate matrix products, enabling updates via rank-one Cholesky or Sherman-Morrison formulas. Detailed implementations for covariance estimation, Riccati equations, and matrix square roots are available with open-source MATLAB code (Darmwal et al., 21 Jan 2026, Darmwal et al., 2023).
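The Sherman-Morrison formula mentioned above keeps a stored inverse consistent with a rank-one change at $O(n^2)$ cost; a minimal sketch:

```python
import numpy as np

def sm_update(Xinv, v, c=1.0):
    """Sherman-Morrison: inverse of X + c v v^T from a stored Xinv,
    (X + c v v^T)^{-1} = Xinv - c (Xinv v)(Xinv v)^T / (1 + c v^T Xinv v),
    using symmetry of Xinv to write v^T Xinv as (Xinv v)^T."""
    u = Xinv @ v
    return Xinv - c * np.outer(u, u) / (1.0 + c * (v @ u))

rng = np.random.default_rng(5)
n = 6
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)       # SPD, so 1 + c v^T X^{-1} v > 0 for c > 0
Xinv = np.linalg.inv(X)
v = rng.standard_normal(n)
Xinv_new = sm_update(Xinv, v)     # inverse of X + v v^T, no O(n^3) solve
```

For SPD iterates with a positive update coefficient the denominator never vanishes, so the update is always well defined.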

6. Applications and Empirical Results

Rank-one Riemannian subspace descent is applicable to:

  • Computation of low-rank or SPD solutions to algebraic Riccati equations (CARE, DARE), Lyapunov equations, and nonlinear filtering equations, where it finds solutions at lower ranks or for higher dimensions than standard solvers (Mishra et al., 2013, Darmwal et al., 21 Jan 2026).
  • Large-scale covariance estimation for Gaussian models, matrix square-root computations, kernel matrix learning, and parameter estimation in Gaussian mixture models (Darmwal et al., 2023).
  • Low-rank or manifold-constrained optimization in machine learning (Stiefel, Grassmann, hyperbolic, symplectic, etc.) (Han et al., 2024, Gutman et al., 2019).

Empirical studies demonstrate:

  • On benchmarks such as 1D Laplace, Toeplitz, and heat-equation Riccati problems, the method achieves residuals from $\|R(X_r)\|_F / \|C^\top C\|_F \leq 1.2 \times 10^{-1}$ for $r = 1$ down to $1.2 \times 10^{-3}$ for $r = 4$, at significantly lower ranks than truncated standard solvers (Mishra et al., 2013).
  • For problem sizes up to $n = 10\,000$, competitors such as MATLAB's `icare` or structure-preserving doubling run out of memory or are infeasible, while rank-one Riemannian subspace descent continues to deliver solutions in practical runtimes (Darmwal et al., 21 Jan 2026).
  • For orthogonal Procrustes and related problems on $O_n$, the algorithm closes most of the optimality gap in far fewer cycles than full Riemannian gradient descent (Gutman et al., 2019).

Rank-one Riemannian subspace descent generalizes to block-coordinate or multi-directional subspace updates, and adapts readily to numerous matrix manifold settings via appropriate construction of tangent directions, projections, and retractions (Darmwal et al., 2023, Han et al., 2024). Block updates can further accelerate convergence at the cost of increased per-step computation.

A major advantage is scalability to very large matrices, provided that the objective function and its derivatives admit low-cost evaluation along rank-one updates. However, for problems lacking such structure, or for objectives with dense gradient computations, the method may lose its computational edge.

In summary, rank-one Riemannian subspace descent provides a mathematically rigorous, computationally efficient, and broadly applicable scheme for high-dimensional manifold-constrained minimization, particularly excelling in large-scale matrix nonlinear equations, low-rank Riccati problems, and modern machine learning applications with manifold constraints (Darmwal et al., 21 Jan 2026, Darmwal et al., 2023, Mishra et al., 2013, Gutman et al., 2019, Han et al., 2024).
