Riemannian Preconditioning in Manifold Optimization

Updated 24 February 2026

Riemannian preconditioning is a metric design that incorporates local second-order information to achieve Newton-like convergence in manifold optimization.
It employs block-diagonal and structure-aware Hessian approximations to effectively scale gradient-based methods and reduce computational overhead.
This approach has proven practical in low-rank tensor and matrix recovery problems, delivering significant acceleration and robustness even with overparameterized models.

Riemannian preconditioning is the systematic design of a Riemannian metric in manifold optimization so that local second-order information, typically block-diagonal or structure-aware Hessian approximations, is embedded in the manifold geometry to accelerate convergence. Replacing the standard Euclidean or canonical metric with a task-adapted metric that mimics curvature gives gradient-based algorithms effective scaling and Newton-like behavior at significantly reduced computational cost. The approach has been developed for tensor completion, low-rank matrix and tensor recovery, optimization on product and quotient manifolds and on the Stiefel manifold, and ground-state computations in infinite-dimensional PDE-constrained systems.

1. Principles and Motivations: Preconditioning Through Metric Design

The role of Riemannian preconditioning is to improve optimization within a constraint manifold (often a product or quotient space) by selecting a metric that makes first-order methods behave more like Newton-type algorithms. The preconditioning metric is constructed using curvature information from the objective, such as block-diagonal Hessian approximations, Gram matrices, or left/right covariance factors. This connects the Riemannian gradient step to the solution of an associated Sequential Quadratic Programming (SQP) subproblem, so that steepest descent becomes a quasi-Newton method in the appropriate limit (Mishra et al., 2014). In practice, the metric is chosen to closely approximate the local Hessian's dominant blocks while remaining easily invertible to maintain computational efficiency (Dong et al., 2021).

Within equality-constrained or quotient frameworks, the metric can be defined as

$g_x(\xi,\eta) = \langle\xi, D^2\mathcal L(x, \lambda_x)[\eta]\rangle,$

where $\mathcal L$ is the Lagrangian and $\lambda_x$ the multiplier at $x$. In product-manifold settings, the metric takes block-diagonal form to capture the structure of factorized models (Gao et al., 2023).
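
A toy case makes the Newton connection concrete. The setup below is an illustrative assumption rather than an example from the cited works: an unconstrained, strictly convex quadratic, for which the Lagrangian reduces to the objective and $D^2\mathcal L = A$.

$f(x) = \tfrac{1}{2} x^\top A x - b^\top x, \qquad A \succ 0, \qquad g_x(\xi,\eta) = \xi^\top A \, \eta.$

The Riemannian gradient is defined by $g_x(\operatorname{grad} f(x), \eta) = Df(x)[\eta] = (Ax - b)^\top \eta$ for all $\eta$, which gives

$\operatorname{grad} f(x) = A^{-1}(Ax - b) = x - A^{-1} b.$

A unit steepest-descent step then lands on the minimizer, $x - \operatorname{grad} f(x) = A^{-1} b$, which coincides with the Newton step. With a cheaper approximation of $A$ in the metric (for instance its block diagonal), the same step is only Newton-like, which is the trade-off the preconditioned metric is meant to exploit.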

2. Metric Construction: Block-Diagonal and Structure-Aware Schemes

A central feature of Riemannian preconditioning is the design of the metric based on approximate Hessians. The commonly adopted strategy is to use a block-diagonal metric where each block corresponds to a natural variable (factor) or tangent space component.

For example, in low-rank tensor completion via polyadic decomposition (Dong et al., 2021), the product manifold is $\mathcal M = \mathbb{R}^{n_1 \times r} \times \cdots \times \mathbb{R}^{n_d \times r}$, and the metric is

$\langle \xi, \eta \rangle_X = \sum_{i=1}^d \operatorname{tr}[\xi_i \, G_i(X) \, \eta_i^\top]$

with

$G_i(X) = (X^{(-i)})^\top X^{(-i)} + \delta I_r,$

where $X^{(-i)}$ is the Khatri–Rao product of all factor matrices except $X_i$. This captures the curvature in the $X_i$-variable via Gram matrices. The small shift $\delta$ ensures positive definiteness, robustifying the method against rank-deficient blocks.
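
The Gram block can be sketched in a few lines of NumPy. The sketch below relies on the standard identity that the Gram matrix of a Khatri–Rao product equals the Hadamard (entrywise) product of the individual Gram matrices, so $X^{(-i)}$ never has to be formed. The function names `khatri_rao` and `gram_block`, the sizes, and the values of $\delta$ are illustrative assumptions, not taken from the cited work. The last lines zero out one column of a factor to show what the shift does in a rank-deficient case.

```python
import numpy as np

def khatri_rao(mats):
    """Column-wise Kronecker product of matrices sharing the column count r."""
    r = mats[0].shape[1]
    out = mats[0]
    for m in mats[1:]:
        # entry [p, q, k] = out[p, k] * m[q, k]; flatten the two row indices
        out = (out[:, None, :] * m[None, :, :]).reshape(-1, r)
    return out

def gram_block(factors, i, delta):
    """G_i = (X^{(-i)})^T X^{(-i)} + delta * I_r, without forming X^{(-i)}."""
    r = factors[0].shape[1]
    G = np.ones((r, r))
    for j, X in enumerate(factors):
        if j != i:
            G *= X.T @ X  # Hadamard product of the r x r Gram matrices
    return G + delta * np.eye(r)

rng = np.random.default_rng(0)
factors = [rng.standard_normal((n, 3)) for n in (4, 5, 6)]  # hypothetical sizes
i, delta = 1, 1e-8

# Reference computation that builds the Khatri-Rao product explicitly.
K = khatri_rao([X for j, X in enumerate(factors) if j != i])
G_explicit = K.T @ K + delta * np.eye(3)
G_fast = gram_block(factors, i, delta)

# Rank-deficient case: one zero column makes the unshifted block singular.
deficient = [X.copy() for X in factors]
deficient[0][:, 2] = 0.0
min_eig_unshifted = np.linalg.eigvalsh(gram_block(deficient, i, 0.0))[0]
min_eig_shifted = np.linalg.eigvalsh(gram_block(deficient, i, 1e-6))[0]
```

In this sketch the explicit route builds a matrix with $\prod_{j \neq i} n_j$ rows, while the Hadamard route only touches $r \times r$ matrices, which is why the latter is the natural choice when the mode sizes are large.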

More generally, for a product manifold $M = M_1 \times \cdots \times M_K$, the preconditioned metric takes the form

$g_x(\xi, \eta) = \sum_{k=1}^K \langle \xi_k, P_k(x)[\eta_k] \rangle,$

where $P_k(x)$ is a symmetric positive definite operator designed to approximate the $k$-th diagonal block of the Hessian (Gao et al., 2023).
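
Under a metric of this form, the gradient with respect to the metric is obtained blockwise by applying $P_k(x)^{-1}$ to the Euclidean partial derivative. The sketch below checks that defining identity numerically for the special case where $P_k$ acts by right-multiplication with a symmetric positive definite $r \times r$ matrix, with the block inner product written as $\operatorname{tr}(\xi \, G \, \eta^\top)$. The helper names and the random stand-ins for $G$ and the Euclidean derivative are assumptions made for illustration.

```python
import numpy as np

def metric_block(xi, eta, G):
    """Block inner product tr(xi G eta^T) for n x r tangent vectors."""
    return np.trace(xi @ G @ eta.T)

def riemannian_grad_block(egrad, G):
    """Return egrad @ inv(G) via a linear solve (G symmetric positive definite)."""
    # Solving G Z^T = egrad^T gives Z = egrad G^{-1} because G is symmetric.
    return np.linalg.solve(G, egrad.T).T

rng = np.random.default_rng(1)
n, r = 6, 3
B = rng.standard_normal((10, r))
G = B.T @ B + 1e-8 * np.eye(r)       # stand-in for a Gram-based block
egrad = rng.standard_normal((n, r))  # stand-in for the Euclidean partial derivative
eta = rng.standard_normal((n, r))    # arbitrary tangent direction

rgrad = riemannian_grad_block(egrad, G)
lhs = metric_block(rgrad, eta, G)    # g(grad f, eta)
rhs = np.sum(egrad * eta)            # Euclidean pairing <egrad, eta>
```

Using `np.linalg.solve` rather than an explicit inverse is a design choice of this sketch: for small $r$ the cost is negligible either way, but the solve is the numerically safer default when a block is close to singular.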

3. Algorithmic Realizations: Gradient, Conjugate Gradient, and Trust-Region

The incorporation of the preconditioning metric affects all geometric algorithmic primitives:

Riemannian gradient: For block-diagonal metrics, the Riemannian gradient direction for each block $i$ is given by

$(\operatorname{grad} f(X))_i = [\partial_{X_i} f] \, G_i(X)^{-1},$

ensuring that each component is scaled inversely to its local curvature (Dong et al., 2021).

Retraction: When the manifold is a vector space, the retraction is trivial (the identity map); otherwise, standard matrix factorizations (e.g., QR or polar decompositions) are used to ensure feasibility after each step (Shustin et al., 2019).
Vector transport: For most block-metric or ambient-vector-space settings, the vector transport simplifies to the identity map, reducing per-iteration overhead in first-order methods.
Line-search and stepsizes: Exact line-search or structure-aware Barzilai–Borwein rules can be employed, leveraging closed-form polynomials in least-squares settings (Dong et al., 2021).
Conjugate gradient and higher-order methods: Preconditioned metrics also enhance Riemannian nonlinear conjugate gradient and trust-region schemes, retaining Newton-like convergence rates under suitable regularity (Kasai et al., 2015, Shustin et al., 2019).
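
The primitives above can be assembled into a short preconditioned gradient descent loop. The sketch below uses the two-factor case $f(L, R) = \tfrac{1}{2}\|L R^\top - M\|_F^2$, where the Khatri–Rao product of a single factor is the factor itself, so the Gram blocks are $R^\top R + \delta I$ and $L^\top L + \delta I$. It uses Armijo backtracking instead of an exact line search, and the identity retraction because the factors live in a vector space. The function `precond_gd`, its parameters, and the test matrix are hypothetical choices for illustration, not the algorithm of any cited paper.

```python
import numpy as np

def loss(L, R, M):
    return 0.5 * np.linalg.norm(L @ R.T - M) ** 2

def precond_gd(M, r, delta=1e-10, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((M.shape[0], r))
    R = rng.standard_normal((M.shape[1], r))
    history = [loss(L, R, M)]
    for _ in range(iters):
        E = L @ R.T - M
        gL, gR = E @ R, E.T @ L               # Euclidean partial derivatives
        GL = R.T @ R + delta * np.eye(r)      # Gram block for L
        GR = L.T @ L + delta * np.eye(r)      # Gram block for R
        dL = np.linalg.solve(GL, gL.T).T      # gL @ inv(GL)
        dR = np.linalg.solve(GR, gR.T).T      # gR @ inv(GR)
        # Squared norm of the preconditioned gradient in the block metric.
        slope = np.sum(gL * dL) + np.sum(gR * dR)
        step, f0 = 1.0, history[-1]
        # Armijo backtracking; a step is accepted only on sufficient decrease.
        while step > 1e-12:
            L_new, R_new = L - step * dL, R - step * dR
            if loss(L_new, R_new, M) <= f0 - 1e-4 * step * slope:
                L, R = L_new, R_new
                break
            step *= 0.5
        history.append(loss(L, R, M))
    return L, R, history

rng = np.random.default_rng(42)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))  # rank-2 target
L, R, history = precond_gd(M, r=2)
```

With a unit step, each block update in this sketch reduces to the least-squares solution for that factor with the other held fixed, which is one way to see why the preconditioned direction is well scaled. Both factors are updated simultaneously here, so the backtracking loop is what guarantees monotone decrease.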

4. Theoretical Guarantees: Conditioning and Convergence Rates

Riemannian preconditioning aims to reduce the condition number of the Riemannian Hessian at stationary points, which governs the local convergence rate of first-order algorithms. Concretely:

Local linear convergence: Let $\kappa_g$ denote the condition number of the Riemannian Hessian under metric $g$ at the minimizer $x^*$. Then Riemannian gradient descent (RGD) with Armijo or exact line-search satisfies

$f(x^{(t)}) - f(x^*) \leq \left(1 - \frac{c}{\kappa_g}\right) (f(x^{(t-1)}) - f(x^*))$

for some constant $c > 0$ (Gao et al., 2023).

Łojasiewicz-type global convergence: In analytic settings, the combination of sufficient decrease, Lipschitz continuous gradients, and a Łojasiewicz gradient inequality yields full convergence to a stationary point with sublinear or linear rates depending on the Łojasiewicz exponent (Dong et al., 2021).
Preconditioner robustness: The $\delta$-shift keeps each Gram-based block positive definite even when factors are rank-deficient, so the method can still converge when the local block Hessians become nearly singular (Dong et al., 2021).
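
The effect of the metric on $\kappa_g$ in the linear-rate bound above can be seen on a small quadratic. The sketch below assumes $f(x) = \tfrac{1}{2} x^\top A x$ with a constant metric $g(\xi, \eta) = \xi^\top P \eta$, for which the Hessian with respect to the metric is $P^{-1} A$. The matrix $A$ is a hand-built example with two badly scaled blocks, and `cond_under_metric` is a hypothetical helper, so the numbers illustrate the mechanism rather than any reported result.

```python
import numpy as np

def cond_under_metric(A, P):
    """Condition number of P^{-1} A via the symmetric form C^{-1} A C^{-T}, P = C C^T."""
    C = np.linalg.cholesky(P)
    W = np.linalg.solve(C, np.linalg.solve(C, A).T)  # C^{-1} A C^{-T}
    W = 0.5 * (W + W.T)                              # remove rounding asymmetry
    eigs = np.linalg.eigvalsh(W)
    return eigs[-1] / eigs[0]

I2, Z2 = np.eye(2), np.zeros((2, 2))
# Two coupled blocks whose scales differ by a factor of 1e4.
A = np.block([[I2, 50.0 * I2], [50.0 * I2, 1e4 * I2]])
# Block-diagonal metric built from the diagonal blocks of A.
P = np.block([[A[:2, :2], Z2], [Z2, A[2:, 2:]]])

kappa_euclid = cond_under_metric(A, np.eye(4))  # about 1.3e4
kappa_precond = cond_under_metric(A, P)         # 3, up to rounding
```

In this example the contraction factor $1 - c/\kappa_g$ moves from very close to 1 under the Euclidean metric to a value bounded well away from 1 under the block-diagonal metric. The residual condition number of 3 comes from the off-diagonal coupling, which a block-diagonal metric cannot remove.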

5. Practical Applications: Empirical Acceleration and Robustness

The practical efficacy of Riemannian preconditioning is well-documented:

Low-rank tensor and matrix completion: Empirical studies show 4×–15× acceleration (in time and memory) for low-rank tensor completion problems compared to unpreconditioned Riemannian GD/CG and state-of-the-art methods such as geomCG (Dong et al., 2021). The metric construction also yields robust performance under overestimated rank, with recovery quality stable even for $r \gg r_{\text{true}}$.
Overparameterization tolerance: Algorithms with structure-aware preconditioning metrics display flat recovery error with respect to overspecified rank parameters, while unpreconditioned schemes often stagnate or slow down (Dong et al., 2021).
Tensor decompositions: Similar metric prescriptions for Tucker and tensor ring decompositions yield order-of-magnitude speedups in convergence and lower sample complexity in phase transitions (Kasai et al., 2016, Gao et al., 2023).
Product manifolds and matrix manifolds: The block-wise metric strategy extends to canonical correlation analysis, SVD, and generalized Stiefel/Grassmann settings, reducing Hessian condition numbers by one to two orders of magnitude and yielding 5–20× reduction in iteration counts (Shustin et al., 2019, Gao et al., 2023).

6. Generalization and Extension to Other Manifold Structures

The design principles underlying Riemannian preconditioning generalize to a wide class of constrained manifold optimization settings:

Recipe: Identify natural coordinate blocks (e.g., factor matrices or core tensors), approximate or compute block Hessians (typically via Gram matrices), apply positive definiteness shifts, and define the metric block-diagonally. This structural approach is applicable to tensor trains, hierarchical Tucker, bi-factored bilinear models, and matrix manifolds (e.g., fixed-rank or orthogonality-constrained cases) (Dong et al., 2021, Gao et al., 2023, Mishra et al., 2014).
Theoretical guarantees: Under standard regularity (smoothness and positive definiteness of block Hessians on tangent spaces), the Riemannian geometric ingredients and the first-order convergence theory carry over to these generalized settings.
Adaptivity: The choice of metric adapts automatically to local curvature and factor degeneracy (through Gram matrices and rank adaptation), making the preconditioners robust to practical pathologies in large-scale optimization.

7. Summary Table: Key Elements of Riemannian Preconditioning

Component	Construction/Role	Source Papers
Metric definition	Block-diagonal from Hessian blocks, Gram matrices, or structure-aware weights	(Dong et al., 2021, Gao et al., 2023, Mishra et al., 2014)
Riemannian gradient	Blockwise scaling of Euclidean gradient by local Gram or Hessian inverses	(Dong et al., 2021)
Robustness shift	Positive definite shift δI for degenerate/ill-posed blocks	(Dong et al., 2021, Kasai et al., 2016)
Local convergence rate	Condition-number controlled, linear to superlinear	(Gao et al., 2023)
Product/quotient manifold scope	Extensible to general product/quotient settings	(Gao et al., 2023, Mishra et al., 2014)
Empirical speedup and tolerance	4×–15× acceleration, stability under overparameterization	(Dong et al., 2021)

Riemannian preconditioning thereby provides a principled, structure-exploiting, and computationally scalable approach to manifold optimization, effectively bridging first-order and second-order methods and enabling robust solution of challenging problems in tensor and matrix recovery, signal processing, and beyond (Dong et al., 2021, Gao et al., 2023, Kasai et al., 2016).