Iterative Cayley Retraction Overview

Updated 17 June 2026

Iterative Cayley retraction is a computational method for enforcing orthonormality in Riemannian optimization by leveraging fixed-point iterations to avoid high-cost matrix inversions.
It recasts retraction as a fixed-point update, reducing complexity from O(n^3) to O(np^2) and proving highly effective in large-scale, non-Euclidean optimization tasks.
Adaptive and generalized Cayley parametrizations further enhance numerical stability and allow integration with standard Euclidean solvers in deep learning and matrix analysis.

The iterative Cayley retraction is a computational technique for Riemannian optimization over the Stiefel manifold that enables efficient enforcement of orthonormality constraints on matrix parameters. Leveraging the Cayley transform, this approach provides a numerically effective alternative to classical retraction methods such as QR or polar decompositions, with significant advantages in computational scaling, storage, and practical implementation. Iterative Cayley retractions have been further generalized and localized using adaptive and chart-based parametrizations to enhance both robustness and efficiency in large-scale and non-Euclidean optimization tasks.

1. Mathematical Foundations: The Stiefel Manifold and Retractions

The real Stiefel manifold $\operatorname{St}(n, p)$ is defined as the set of $n \times p$ matrices with orthonormal columns:

$\operatorname{St}(n, p) = \left\{ X \in \mathbb{R}^{n \times p} : X^\top X = I_p \right\}, \quad n \geq p.$

Tangent vectors $Z$ at $X \in \operatorname{St}(n,p)$ satisfy $X^\top Z + Z^\top X = 0$, and every such vector can be written as $Z = W X$ for a skew-symmetric $W \in \mathbb{R}^{n \times n}$. A retraction $R_X : T_X\operatorname{St}(n,p) \to \operatorname{St}(n,p)$ is a smooth map that agrees with the exponential map to first order but is typically cheaper to compute (2002.01113).
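The definitions above can be checked numerically. The following is a minimal NumPy sketch, not taken from the cited papers: the helper names (`random_stiefel`, `project_to_tangent`, `skew_generator`) are hypothetical, and the particular skew-symmetric $W$ built here is only one valid choice among many.

```python
import numpy as np

def random_stiefel(n, p, rng):
    """Draw a point on St(n, p) by orthonormalizing a Gaussian matrix with QR."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
    return Q  # n x p with orthonormal columns

def project_to_tangent(X, G):
    """Orthogonal projection of an arbitrary n x p matrix G onto T_X St(n, p)."""
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2

def skew_generator(X, Z):
    """One skew-symmetric W (illustrative choice) with W @ X == Z for tangent Z."""
    P = np.eye(X.shape[0]) - 0.5 * X @ X.T
    return P @ Z @ X.T - X @ Z.T @ P

rng = np.random.default_rng(0)
X = random_stiefel(8, 3, rng)
Z = project_to_tangent(X, rng.standard_normal((8, 3)))
W = skew_generator(X, Z)

print(np.allclose(X.T @ X, np.eye(3)))     # orthonormal columns
print(np.allclose(X.T @ Z + Z.T @ X, 0))   # tangency condition
print(np.allclose(W, -W.T), np.allclose(W @ X, Z))
```

Forming the $n \times n$ matrix $W$ explicitly is done here only for clarity; it is not needed in practice when $p \ll n$.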

2. Classical and Iterative Cayley Retraction

Given a tangent vector $\eta \in T_X\operatorname{St}(n,p)$, a canonical skew-symmetric generator is

$A = \eta X^\top - X \eta^\top,$

with $A \in \mathbb{R}^{n \times n}$ and $A^\top = -A$. The Cayley retraction then writes, for step size $\alpha > 0$ and identity matrix $I \in \mathbb{R}^{n \times n}$,

$Y(\alpha) = \left(I - \frac{\alpha}{2} A\right)^{-1} \left(I + \frac{\alpha}{2} A\right) X,$

guaranteeing $Y(\alpha)^\top Y(\alpha) = I_p$ and $Y(0) = X$. This closed form, however, involves the inversion of an $n \times n$ matrix, imposing prohibitive $O(n^3)$ costs for large $n$ (2002.01113).

The iterative Cayley retraction circumvents high-cost inversion by recasting the update as a fixed-point equation:

$Y(\alpha) = X + \frac{\alpha}{2} A \left(X + Y(\alpha)\right),$

which is solved for $Y(\alpha)$ via a small number $s$ of inner iterations. This yields an $O(np^2)$-cost update by exploiting the low-rank structure of $A$, making the method highly competitive for $p \ll n$ against QR and polar/SVD retractions, and far cheaper than the closed-form Cayley retraction ($O(n^3)$). Empirically, two inner fixed-point steps ($s = 2$) suffice for high accuracy (2002.01113).
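To make the comparison concrete, here is a hedged NumPy sketch that computes the closed-form Cayley update $(I - \frac{\alpha}{2}A)^{-1}(I + \frac{\alpha}{2}A)X$ and approximates it with a fixed-point loop on $Y = X + \frac{\alpha}{2}A(X + Y)$, which is a rearrangement of the closed form. The function names, the first-order initial guess, and the step-size choice are assumptions made for illustration.

```python
import numpy as np

def generator(X, eta):
    """Skew-symmetric generator A = eta X^T - X eta^T (n x n, rank <= 2p)."""
    return eta @ X.T - X @ eta.T

def cayley_closed_form(X, A, alpha):
    """Exact Cayley update; solves an n x n linear system."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I - 0.5 * alpha * A, (I + 0.5 * alpha * A) @ X)

def cayley_iterative(X, A, alpha, s=2):
    """Approximate the same update with s fixed-point steps."""
    AX = A @ X
    Y = X + alpha * AX                      # first-order initial guess (assumed)
    for _ in range(s):
        Y = X + 0.5 * alpha * (AX + A @ Y)  # Y <- X + (alpha/2) A (X + Y)
    return Y

rng = np.random.default_rng(0)
n, p = 50, 5
X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # point on St(n, p)
G = rng.standard_normal((n, p))
eta = G - X @ (X.T @ G + G.T @ X) / 2               # tangent vector at X
A = generator(X, eta)
alpha = 0.1 / np.linalg.norm(A, 2)                  # well inside alpha < 2 / ||A||

Y_exact = cayley_closed_form(X, A, alpha)
Y_iter = cayley_iterative(X, A, alpha, s=2)
gap = np.linalg.norm(Y_iter - Y_exact)              # small for small alpha
defect = np.linalg.norm(Y_exact.T @ Y_exact - np.eye(p))
print(gap, defect)
```

The dense $A$ is formed here only so that both variants share one input; an efficient implementation would apply $A$ through its $n \times p$ factors.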

3. Generalized and Adaptive Cayley Parametrizations

Recent research extends the Cayley retraction to generalized and adaptive schemes suitable for broader subclasses of Stiefel-type optimization problems. The generalized Cayley map uses a center point $S \in O(n)$ to parameterize open dense subsets of the Stiefel manifold:

$\Phi_S(X) = \begin{bmatrix} A_S(X) & -B_S(X)^\top \\ B_S(X) & 0 \end{bmatrix},$

where the blocks $A_S(X)$ and $B_S(X)$ are defined in terms of $S$ and $X$. The inverse $\Phi_S^{-1}$ provides a diffeomorphism from a vector space $Q_{n,p}(S)$ onto an open dense subset of the manifold, acting as a retraction (Kume et al., 2023, Kume et al., 2023).

This strategy allows any Euclidean optimization algorithm to be applied in $Q_{n,p}(S)$. When iterates approach the singular-point set (where $\Phi_S$ is undefined), an adaptive scheme "re-centers" at a new center $S'$ chosen (e.g., via SVD of the new iterate) to maintain numerical stability and efficiency. Such adaptivity can eliminate the slow convergence associated with a poor center choice in the naive Cayley parametrization (Kume et al., 2023).
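The mechanism of a centered Cayley parametrization can be illustrated with a small NumPy sketch. It assumes a map of the form $V \mapsto S\,(I - V)(I + V)^{-1} I_{n \times p}$ with $V$ skew-symmetric and a zero lower-right block; the exact maps in the cited papers may differ in detail, and the names `chart_to_manifold` and `random_parameter` are hypothetical.

```python
import numpy as np

def random_parameter(n, p, rng, scale=0.1):
    """Skew-symmetric V = [[A, -B^T], [B, 0]] with A (p x p) skew and B ((n-p) x p)."""
    A = rng.standard_normal((p, p))
    A = scale * (A - A.T) / 2
    B = scale * rng.standard_normal((n - p, p))
    V = np.zeros((n, n))
    V[:p, :p] = A
    V[p:, :p] = B
    V[:p, p:] = -B.T
    return V

def chart_to_manifold(S, V, p):
    """Map a skew-symmetric parameter V to St(n, p) around the center S in O(n)."""
    I = np.eye(S.shape[0])
    # (I + V)^{-1} (I - V) equals (I - V)(I + V)^{-1} because both factors
    # commute, and the result is orthogonal for any skew-symmetric V.
    Q = np.linalg.solve(I + V, I - V)
    return (S @ Q)[:, :p]

rng = np.random.default_rng(0)
n, p = 10, 3
S, _ = np.linalg.qr(rng.standard_normal((n, n)))   # center point in O(n)
V = random_parameter(n, p, rng)
X = chart_to_manifold(S, V, p)

print(np.allclose(X.T @ X, np.eye(p)))             # X lies on St(n, p)
```

Because $V$ lives in a plain vector space, a Euclidean optimizer can update it directly; in this sketch $V = 0$ maps to the first $p$ columns of $S$, which is why re-centering resets the parameter to the origin of a new chart.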

4. Algorithmic Schemes and Computational Complexity

For iterative Cayley retraction, each update involves:

Momentum calculation: $M_{k+1} = \beta M_k - G(X_k)$
Tangent-space projection: compute $A_k$ from the skew-part of projected $M_{k+1}$
Step-size selection: $\alpha = \min\{l,\ 2q / (\|A_k\| + \epsilon)\}$ ensuring contraction
$s$ fixed-point iterations: initialize $Y^0 = X_k + \alpha M_{k+1}$, update $Y^{i} = X_k + \frac{\alpha}{2} A_k (X_k + Y^{i-1})$, set $X_{k+1} = Y^{s}$

Here $G(X_k)$ is the (stochastic) Euclidean gradient, $\beta$ the momentum coefficient, $l$ the learning rate, $q \in (0,1)$ a contraction factor, and $\epsilon > 0$ a small constant.
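The four steps above can be sketched in NumPy as follows. This is a sketch under stated assumptions rather than the reference implementation: the heavy-ball momentum rule, the specific projection formula, the step-size cap, and all parameter names (`lr`, `beta`, `q`, `s`, `eps`) are illustrative choices.

```python
import numpy as np

def cayley_sgd_step(X, M, grad, lr=0.1, beta=0.9, q=0.5, s=2, eps=1e-8):
    """One illustrative Cayley-SGD-with-momentum step on St(n, p)."""
    # 1) Momentum update (heavy-ball form, assumed).
    M = beta * M - grad
    # 2) Tangent-space projection through a skew-symmetric generator A,
    #    chosen so that A @ X is the projection of M onto T_X St(n, p).
    MX = M @ X.T
    A_hat = MX - 0.5 * X @ (X.T @ MX)
    A = A_hat - A_hat.T
    M = A @ X
    # 3) Step size capped so that alpha * ||A|| / 2 <= q < 1 (contraction).
    #    The Frobenius norm upper-bounds the spectral norm, so this is conservative.
    alpha = min(lr, 2.0 * q / (np.linalg.norm(A) + eps))
    # 4) s fixed-point iterations approximating the Cayley update.
    Y = X + alpha * M
    for _ in range(s):
        Y = X + 0.5 * alpha * A @ (X + Y)
    return Y, M

rng = np.random.default_rng(0)
n, p = 20, 4
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
G = rng.standard_normal((n, p))        # stand-in for a stochastic gradient
M = np.zeros((n, p))
X_next, M_next = cayley_sgd_step(X, M, G, lr=0.01)
defect = np.linalg.norm(X_next.T @ X_next - np.eye(p))
print(defect)
```

The returned momentum is already tangent at `X`, so no separate vector-transport step appears in the loop. The dense `A` is kept for readability; at scale it would be applied through its low-rank factors.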

For the generalized Cayley parametrization, the descent is performed in the vector space of the current chart, with Armijo line search, and re-centering when parameter norms indicate approach to chart singularities (Kume et al., 2023, Kume et al., 2023).

Per-iteration cost is $O(np^2)$, matching or improving upon QR and polar retractions, and avoiding the explicit tangent-vector transports required by classical Riemannian CG and quasi-Newton methods. The Cayley-parametrization strategy stays entirely in one vector space between re-centerings, simplifying the use of advanced Euclidean solvers such as accelerated gradient methods, conjugate gradient (CG), and BFGS without additional vector transport (Kume et al., 2023).
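The cost advantage of the iterative retraction comes from never forming the $n \times n$ generator. A minimal NumPy sketch, assuming the generator $A = \eta X^\top - X \eta^\top$ from Section 2 and hypothetical function names, compares the dense product with the factored one:

```python
import numpy as np

def apply_generator_dense(X, eta, Y):
    """Form A = eta X^T - X eta^T explicitly (n x n), then multiply: O(n^2 p)."""
    A = eta @ X.T - X @ eta.T
    return A @ Y

def apply_generator_lowrank(X, eta, Y):
    """Same product using only n x p and p x p factors: O(n p^2)."""
    return eta @ (X.T @ Y) - X @ (eta.T @ Y)

rng = np.random.default_rng(0)
n, p = 200, 5
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
eta = rng.standard_normal((n, p))
Y = rng.standard_normal((n, p))

dense = apply_generator_dense(X, eta, Y)
lowrank = apply_generator_lowrank(X, eta, Y)
print(np.allclose(dense, lowrank))
```

Only the parenthesization differs between the two functions, but the factored form keeps every intermediate at size $n \times p$ or $p \times p$.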

5. Convergence Properties and Theoretical Guarantees

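For the iterative Cayley retraction, the fixed-point iteration is a contraction if the step size satisfies $\alpha < 2/\|A\|$, and the error shrinks by an extra factor of $\alpha$ with every inner iteration: $\|Y^i - Y(\alpha)\| = O(\alpha^{2+i})$, where $Y(\alpha)$ is the exact Cayley update and $Y^i$ the $i$-th inner iterate started from the first-order guess $Y^0 = X + \alpha A X$. Under a standard $L$-Lipschitz gradient assumption, the Cayley SGD algorithm achieves a sublinear convergence rate on the Stiefel manifold (2002.01113).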
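The contraction condition follows from a short calculation. The derivation below is a sketch in assumed notation: $Y^{\star}$ denotes the exact fixed point (the closed-form Cayley update), $Y^{i}$ the $i$-th inner iterate, and $\|\cdot\|$ the spectral norm.

```latex
\begin{aligned}
Y^{i+1} &= X + \tfrac{\alpha}{2} A\,(X + Y^{i}), \qquad
Y^{\star} = X + \tfrac{\alpha}{2} A\,(X + Y^{\star}), \\
Y^{i+1} - Y^{\star} &= \tfrac{\alpha}{2} A\,(Y^{i} - Y^{\star}), \\
\|Y^{i} - Y^{\star}\| &\le \left(\tfrac{\alpha \|A\|}{2}\right)^{i} \|Y^{0} - Y^{\star}\|.
\end{aligned}
```

The factor $\alpha \|A\| / 2$ is below one exactly when $\alpha < 2/\|A\|$. If the iteration starts from $Y^{0} = X + \alpha A X$, the initial error is already $O(\alpha^{2})$, so each inner step contributes one further power of $\alpha$.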

The adaptive and localized Cayley approaches extend these guarantees: under smoothness of the objective $f$, Lipschitz gradients, and bounded step sizes, every limit point $X^\star$ of the iterates is stationary on the Stiefel manifold, i.e., $\operatorname{grad} f(X^\star) = 0$. This is a standard "$\liminf$ of the gradient norm is zero" stationarity result (Kume et al., 2023, Kume et al., 2023). The equivalence of stationarity conditions between the chart space and the manifold is formalized via gradient-chart correspondence theorems (Kume et al., 2023).

6. Empirical Performance and Applications

In practical deep learning and matrix optimization tasks, the iterative Cayley retraction offers competitive or superior empirical performance. For convolutional neural networks (CNNs) trained on CIFAR10/CIFAR100 with Wide ResNet-28-10, Cayley SGD and Cayley ADAM were evaluated at a per-epoch cost considerably lower than that of QR, polar, or closed-form Cayley retractions. For unitary RNNs, the iterative scheme reduced the per-iteration training time relative to the closed-form Cayley retraction while maintaining comparable test accuracy. The exact error rates and timings are reported in (2002.01113).

Generalized and adaptive Cayley schemes have demonstrated efficient optimization in eigen-basis extraction and other problems, with CPU time to convergence being roughly half that of QR/polar methods and 2--3× faster than Cayley-retraction in classical implementations. The adaptive recentering scheme effectively mitigates the slowdowns induced by chart singularities (Kume et al., 2023, Kume et al., 2023).

7. Connections, Extensions, and Implementation Considerations

The iterative Cayley and generalized Cayley parametrization frameworks provide a foundation for embedding momentum dynamics and vector transport directly into the retraction step. In particular, implicit vector transport is achieved by projecting the momentum update into the tangent space and applying the Cayley retraction, obviating the need for separate, explicit vector transport operations (2002.01113).

Further, the flexibility of these approaches enables "local trivialization" of the Stiefel manifold, allowing standard Euclidean optimizers to be used transparently. Adaptive chart strategies can be implemented with negligible additional computational cost by leveraging SVD-based center selection. The avoidance of $O(n^3)$ operations, together with reduced per-iteration flops and storage, recommends the iterative Cayley retraction and its generalizations for large-scale learning tasks with strict orthogonality constraints (Kume et al., 2023, Kume et al., 2023).


References

Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform (2020), arXiv:2002.01113

Adaptive Localized Cayley Parametrization for Optimization over Stiefel Manifold (2023)

Generalized Left-Localized Cayley Parametrization for Optimization with Orthogonality Constraints (2023)
