Iterative Cayley Retraction Overview
- Iterative Cayley retraction is a computational method for enforcing orthonormality in Riemannian optimization by leveraging fixed-point iterations to avoid high-cost matrix inversions.
- It recasts retraction as a fixed-point update, reducing complexity from O(n^3) to O(np^2) and proving highly effective in large-scale, non-Euclidean optimization tasks.
- Adaptive and generalized Cayley parametrizations further enhance numerical stability and allow integration with standard Euclidean solvers in deep learning and matrix analysis.
The iterative Cayley retraction is a computational technique for Riemannian optimization over the Stiefel manifold that enables efficient enforcement of orthonormality constraints on matrix parameters. Leveraging the Cayley transform, this approach provides a numerically effective alternative to classical retraction methods such as QR or polar decompositions, with significant advantages in computational scaling, storage, and practical implementation. Iterative Cayley retractions have been further generalized and localized using adaptive and chart-based parametrizations to enhance both robustness and efficiency in large-scale and non-Euclidean optimization tasks.
1. Mathematical Foundations: The Stiefel Manifold and Retractions
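The real Stiefel manifold is defined as the set of $n \times p$ matrices with orthonormal columns:
$$\mathrm{St}(n,p) = \{ X \in \mathbb{R}^{n \times p} : X^\top X = I_p \}.$$
Tangent vectors $Z$ at $X$ satisfy $Z^\top X + X^\top Z = 0$, and tangents can be written as $Z = WX$ for a skew-symmetric $W \in \mathbb{R}^{n \times n}$. A retraction is a smooth map from the tangent bundle to the manifold agreeing with the exponential map to first order but with lower computational complexity (2002.01113).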
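As a quick numerical illustration of these definitions, the following NumPy sketch draws a point on the Stiefel manifold and checks the tangent condition for a vector of the form $Z = WX$ with skew-symmetric $W$. The helper names (`random_stiefel`, `skew`) and the sizes are illustrative choices, not taken from the cited paper.

```python
import numpy as np

def random_stiefel(n, p, rng):
    """Draw a point on St(n, p) by orthonormalizing a Gaussian matrix with QR."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, p)))  # reduced QR: Q is n x p
    return Q

def skew(A):
    """Skew-symmetric part of a square matrix."""
    return 0.5 * (A - A.T)

rng = np.random.default_rng(0)
n, p = 8, 3
X = random_stiefel(n, p, rng)
W = skew(rng.standard_normal((n, n)))  # arbitrary skew-symmetric generator
Z = W @ X                              # candidate tangent vector at X

orth_err = np.linalg.norm(X.T @ X - np.eye(p))      # ||X^T X - I||
tangent_err = np.linalg.norm(Z.T @ X + X.T @ Z)     # ||Z^T X + X^T Z||
print(orth_err, tangent_err)
```

Both residuals should be at the level of floating-point round-off, since $X^\top W X$ is itself skew-symmetric.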
2. Classical and Iterative Cayley Retraction
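Given a tangent vector $Z \in T_X \mathrm{St}(n,p)$, a canonical skew-symmetric generator is
$$W = \hat{W} - \hat{W}^\top, \qquad \hat{W} = Z X^\top - \tfrac{1}{2} X (X^\top Z X^\top),$$
with $W^\top = -W$ and $WX = Z$. The Cayley retraction then writes, for step size $\alpha \ge 0$ and base point $X$,
$$Y(\alpha) = \left(I - \tfrac{\alpha}{2} W\right)^{-1} \left(I + \tfrac{\alpha}{2} W\right) X,$$
guaranteeing $Y(\alpha)^\top Y(\alpha) = I_p$ and $Y'(0) = WX = Z$. This closed form, however, involves the inversion of an $n \times n$ matrix, imposing prohibitive $O(n^3)$ costs for large $n$ (2002.01113).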
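The closed-form retraction can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the generator $W = \hat{W} - \hat{W}^\top$ with $\hat{W} = Z X^\top - \tfrac{1}{2} X (X^\top Z X^\top)$ and the Cayley map $Y(\alpha) = (I - \tfrac{\alpha}{2} W)^{-1}(I + \tfrac{\alpha}{2} W) X$; the function names are hypothetical, and the dense $n \times n$ solve is exactly the cost the iterative variant is meant to avoid.

```python
import numpy as np

def skew_generator(X, Z):
    """Assumed generator: W = W_hat - W_hat^T, W_hat = Z X^T - 0.5 X (X^T Z X^T)."""
    W_hat = Z @ X.T - 0.5 * X @ (X.T @ Z @ X.T)
    return W_hat - W_hat.T

def cayley_closed_form(X, W, alpha):
    """Y(alpha) = (I - alpha/2 W)^{-1} (I + alpha/2 W) X via a dense n x n solve."""
    I = np.eye(X.shape[0])
    return np.linalg.solve(I - 0.5 * alpha * W, (I + 0.5 * alpha * W) @ X)

rng = np.random.default_rng(0)
n, p = 10, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # point on St(n, p)
G = rng.standard_normal((n, p))                    # arbitrary ambient direction
Z = G - X @ (0.5 * (X.T @ G + G.T @ X))            # project G onto the tangent space at X

W = skew_generator(X, Z)
Y = cayley_closed_form(X, W, alpha=0.5)

gen_err = np.linalg.norm(W @ X - Z)                # W X should reproduce Z
orth_err = np.linalg.norm(Y.T @ Y - np.eye(p))     # Y should stay on the manifold
print(gen_err, orth_err)
```

Because $I - \tfrac{\alpha}{2} W$ is always invertible for skew-symmetric $W$, the solve is well defined for any step size; only its cost is the problem.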
The iterative Cayley retraction circumvents high-cost inversion by recasting the update as a fixed-point equation:
$$Y(\alpha) = X + \tfrac{\alpha}{2} W \left(X + Y(\alpha)\right),$$
which is solved for $Y(\alpha)$ via a small number $s$ of inner iterations. This yields an $O(np^2)$-cost update by exploiting the low-rank structure of $W$, making the method highly competitive for $p \ll n$ compared to QR, polar/SVD, or closed-form Cayley ($O(n^3)$) retractions. Empirically, two inner fixed-point steps ($s = 2$) suffice for high accuracy (2002.01113).
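The sketch below illustrates the fixed-point idea under stated assumptions: the generator is taken to be $W = \hat{W} - \hat{W}^\top$ with $\hat{W} = Z X^\top - \tfrac{1}{2} X (X^\top Z X^\top)$, the fixed-point map is $Y \mapsto X + \tfrac{\alpha}{2} W (X + Y)$, and the initial guess is $Y^0 = X + \alpha W X$. The helper `apply_W` (a hypothetical name) multiplies by $W$ using only $n \times p$ and $p \times p$ products, which is where the $O(np^2)$ cost comes from. The dense $W$ is built only to check the result.

```python
import numpy as np

def apply_W(X, Z, Y):
    """Compute W @ Y in O(n p^2) without forming the n x n matrix W."""
    XtY, ZtY, XtZ = X.T @ Y, Z.T @ Y, X.T @ Z          # all p x p
    W_hat_Y = Z @ XtY - 0.5 * X @ (XtZ @ XtY)          # W_hat @ Y
    W_hat_T_Y = X @ ZtY - 0.5 * X @ (XtZ.T @ XtY)      # W_hat^T @ Y
    return W_hat_Y - W_hat_T_Y

def cayley_iterative(X, Z, alpha, s=2):
    """Approximate Y(alpha) with s fixed-point steps (assumed initial guess: X + alpha W X)."""
    Y = X + alpha * apply_W(X, Z, X)
    for _ in range(s):
        Y = X + 0.5 * alpha * apply_W(X, Z, X + Y)
    return Y

rng = np.random.default_rng(0)
n, p = 50, 4
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
G = rng.standard_normal((n, p))
Z = G - X @ (0.5 * (X.T @ G + G.T @ X))                # tangent vector at X

# Dense reference, used only for verification.
W_hat = Z @ X.T - 0.5 * X @ (X.T @ Z @ X.T)
W = W_hat - W_hat.T
alpha = 0.1 / np.linalg.norm(W, 2)                     # small step relative to 1 / ||W||_2
I = np.eye(n)
Y_exact = np.linalg.solve(I - 0.5 * alpha * W, (I + 0.5 * alpha * W) @ X)

apply_err = np.linalg.norm(apply_W(X, Z, X) - W @ X)
errors = [np.linalg.norm(cayley_iterative(X, Z, alpha, s) - Y_exact) for s in range(4)]
print(apply_err, errors)
```

With this step size each inner iteration should shrink the gap to the closed-form result by roughly a factor of $\tfrac{\alpha}{2}\|W\|$, so the printed errors fall quickly as $s$ increases.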
3. Generalized and Adaptive Cayley Parametrizations
Recent research extends the Cayley retraction to generalized and adaptive schemes suitable for broader subclasses of Stiefel-type optimization problems. The generalized Cayley map $\Phi_S$ uses a center point $S$ to parameterize an open dense subset of the Stiefel manifold by a vector space, with the blocks of $\Phi_S(U)$ defined in terms of $S$ and the point $U$ being parameterized. The inverse $\Phi_S^{-1}$ provides a diffeomorphism from that vector space back to the manifold, acting as a retraction (Kume et al., 2023, Kume et al., 2023).
This strategy allows any Euclidean optimization algorithm to be applied in the chart's vector space. When iterates approach the singular-point set of the current center (where $\Phi_S$ is undefined), an adaptive scheme "re-centers" at a new $S$ chosen (e.g., via SVD of the new iterate) to maintain numerical stability and efficiency. Such adaptivity can eliminate slow convergence associated with poor center choice in naive Cayley parametrization (Kume et al., 2023).
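To make the idea of a centered chart and re-centering concrete, here is a minimal sketch restricted to the square case $p = n$ (the orthogonal group), where a centered chart can be written with the classical Cayley transform. This special case, the sign convention, and the function names are assumptions of the sketch; the general $n \times p$ formulas of the cited papers are not reproduced here.

```python
import numpy as np

def cayley_chart(S, U):
    """Skew-symmetric coordinates of orthogonal U around center S.

    V = (I + S^T U)^{-1} (I - S^T U); undefined when det(I + S^T U) = 0.
    """
    C = S.T @ U
    I = np.eye(S.shape[0])
    return np.linalg.solve(I + C, I - C)

def cayley_chart_inverse(S, V):
    """Map skew-symmetric V back to the orthogonal group: U = S (I + V)^{-1} (I - V)."""
    I = np.eye(S.shape[0])
    return S @ np.linalg.solve(I + V, I - V)

rng = np.random.default_rng(0)
n = 6
S, _ = np.linalg.qr(rng.standard_normal((n, n)))   # center point (orthogonal)
A = rng.standard_normal((n, n))
V = 0.3 * (A - A.T)                                # coordinates in the vector space

U = cayley_chart_inverse(S, V)                     # point on the manifold
V_back = cayley_chart(S, U)                        # round trip through the chart
V_recentered = cayley_chart(U, U)                  # re-centering at U puts U at the origin

orth_err = np.linalg.norm(U.T @ U - np.eye(n))
roundtrip_err = np.linalg.norm(V_back - V)
print(orth_err, roundtrip_err, np.linalg.norm(V_recentered))
```

In this toy setting, a Euclidean optimizer would update `V` freely, and re-centering amounts to replacing `S` by the current iterate so that the coordinates return to the origin, far from the chart's singular set.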
4. Algorithmic Schemes and Computational Complexity
For iterative Cayley retraction, each update involves:
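- Momentum calculation: $M_{k+1} \leftarrow \beta M_k - G(X_k)$, with momentum coefficient $\beta$ and (stochastic) gradient $G(X_k)$
- Tangent-space projection: compute $W_k$ from the skew-part of projected $M_{k+1}$, then set $M_{k+1} \leftarrow W_k X_k$
- Step-size selection: $\alpha \leftarrow \min\{l,\ 2q / (\|W_k\| + \epsilon)\}$, with learning rate $l$ and $q \in (0, 1)$, ensuring contraction
- $s$ fixed-point iterations: initialize $Y^0 = X_k + \alpha M_{k+1}$, update $Y^i = X_k + \tfrac{\alpha}{2} W_k (X_k + Y^{i-1})$, set $X_{k+1} = Y^s$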
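The steps above can be sketched as a single update function. This is a rough illustration, not the reference implementation: the generator formula $\hat{W} = M X^\top - \tfrac{1}{2} X (X^\top M X^\top)$, the step-size cap `min(lr, 2q / (||W|| + eps))`, and the defaults (`q = 0.5`, `s = 2`) are assumptions of the sketch, and the dense $n \times n$ matrix `W` is formed only for readability. The test problem (a quadratic with a diagonal matrix `A`) is likewise hypothetical.

```python
import numpy as np

def cayley_sgd_step(X, M, G, lr=0.01, beta=0.9, q=0.5, s=2, eps=1e-8):
    """One momentum step with an iterative Cayley retraction (illustrative sketch)."""
    M = beta * M - G                                      # momentum calculation
    W_hat = M @ X.T - 0.5 * X @ (X.T @ M @ X.T)
    W = W_hat - W_hat.T                                   # skew-symmetric generator
    M = W @ X                                             # momentum projected onto the tangent space
    alpha = min(lr, 2.0 * q / (np.linalg.norm(W) + eps))  # step size capped for contraction
    Y = X + alpha * M                                     # initial guess Y^0
    for _ in range(s):                                    # s fixed-point iterations
        Y = X + 0.5 * alpha * (W @ (X + Y))
    return Y, M

# Hypothetical test problem: minimize f(X) = -0.5 * trace(X^T A X) over St(n, p).
rng = np.random.default_rng(0)
n, p = 20, 3
A = np.diag(np.linspace(0.0, 1.0, n))

def f(X):
    return -0.5 * np.trace(X.T @ A @ X)

X0, _ = np.linalg.qr(rng.standard_normal((n, p)))
M0 = np.zeros((n, p))
X1, M1 = cayley_sgd_step(X0, M0, G=-A @ X0)               # Euclidean gradient of f is -A X

orth_err = np.linalg.norm(X1.T @ X1 - np.eye(p))          # drift from the manifold
tangent_err = np.linalg.norm(M1.T @ X0 + X0.T @ M1)       # projected momentum is tangent at X0
decrease = f(X0) - f(X1)
print(orth_err, tangent_err, decrease)
```

Because the fixed-point solve is truncated, the new iterate is only approximately orthonormal; for small steps the drift is far below the step size itself. A practical implementation would apply `W` through its low-rank factors rather than forming it.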
For the generalized Cayley parametrization, the descent is performed in the chart's vector space, with Armijo line search, and recentering when parameter norms indicate approach to chart singularities (Kume et al., 2023, Kume et al., 2023).
Per-iteration cost is $O(np^2)$, matching or improving upon QR and polar retractions, and avoiding explicit tangent-vector transports required by classical Riemannian CG/Quasi-Newton methods. The Cayley-parametrization strategy stays entirely in one vector space between re-centering steps, simplifying the use of advanced Euclidean solvers such as accelerated gradients, conjugate gradient (CG), and BFGS without additional vector transport (Kume et al., 2023).
5. Convergence Properties and Theoretical Guarantees
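For the iterative Cayley retraction, the fixed-point iteration is a contraction when the step size is small relative to $1/\|W\|$ (the update map has Lipschitz constant $\tfrac{\alpha}{2}\|W\|$, so $\alpha < 2/\|W\|$ suffices), and the error shrinks by a factor of order $\alpha$ with each inner step: $\|Y^i - Y(\alpha)\| = O(\alpha^{2+i})$. Under a standard $L$-Lipschitz gradient assumption, the Cayley SGD algorithm achieves a sublinear convergence rate on the Stiefel manifold (2002.01113).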
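The contraction and error behaviour can be checked with a short derivation. It assumes the fixed-point map $T(Y) = X + \tfrac{\alpha}{2} W (X + Y)$ and the initial guess $Y^0 = X + \alpha W X$; the notation $T$ is introduced here for illustration only.

```latex
% Contraction: T is affine in Y with linear part (alpha/2) W.
T(Y) - T(Y') = \tfrac{\alpha}{2} W (Y - Y')
\;\Longrightarrow\;
\|T(Y) - T(Y')\| \le \tfrac{\alpha}{2}\,\|W\|\,\|Y - Y'\| .

% Error recursion: Y(alpha) is the fixed point of T, so with Y^i = T(Y^{i-1})
Y^{i} - Y(\alpha) = \tfrac{\alpha}{2} W \bigl(Y^{i-1} - Y(\alpha)\bigr)
                  = \bigl(\tfrac{\alpha}{2} W\bigr)^{i} \bigl(Y^{0} - Y(\alpha)\bigr).

% Initial error: expand the closed form in powers of alpha.
Y(\alpha) = \bigl(I - \tfrac{\alpha}{2} W\bigr)^{-1}\bigl(I + \tfrac{\alpha}{2} W\bigr) X
          = X + \alpha W X + \tfrac{\alpha^{2}}{2} W^{2} X + O(\alpha^{3}),
\qquad
Y^{0} - Y(\alpha) = O(\alpha^{2}).

% Combining the two:
\|Y^{i} - Y(\alpha)\| = O(\alpha^{2+i}).
```

So the iteration contracts whenever $\tfrac{\alpha}{2}\|W\| < 1$, and each additional inner step gains one order in $\alpha$.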
The adaptive and localized Cayley approaches extend these guarantees: under smoothness of the objective $f$, Lipschitz gradients, and bounded step sizes, every limit point $X^\star$ of the iterates is stationary on the Stiefel manifold, i.e., $\operatorname{grad} f(X^\star) = 0$. This is a standard "$\liminf$ of the gradient norm is $0$" stationarity result (Kume et al., 2023, Kume et al., 2023). The equivalence of stationarity conditions between the chart space and the manifold is formalized via gradient-chart correspondence theorems (Kume et al., 2023).
6. Empirical Performance and Applications
In practical deep learning and matrix optimization tasks, the iterative Cayley retraction offers competitive or superior empirical performance. For convolutional neural networks (CNNs) on CIFAR10/CIFAR100 using Wide ResNet-28-10, Cayley SGD and Cayley ADAM reached competitive error rates at a per-epoch cost considerably lower than that of QR, polar, or closed-form Cayley retractions. For unitary RNNs, iterative Cayley reduced the per-iteration training time relative to closed-form Cayley while maintaining comparable test accuracy (2002.01113).
Generalized and adaptive Cayley schemes have demonstrated efficient optimization in eigen-basis extraction and other problems, with CPU time to convergence being roughly half that of QR/polar methods and 2--3× faster than Cayley-retraction in classical implementations. The adaptive recentering scheme effectively mitigates the slowdowns induced by chart singularities (Kume et al., 2023, Kume et al., 2023).
7. Connections, Extensions, and Implementation Considerations
The iterative Cayley and generalized Cayley parametrization frameworks provide a foundation for embedding momentum dynamics and vector transport directly into the retraction step. In particular, implicit vector transport is achieved by projecting the momentum update into the tangent space and applying the Cayley retraction, obviating the need for separate, explicit vector transport operations (2002.01113).
Further, the flexibility of these approaches enables "local trivialization" of the Stiefel manifold, allowing use of standard Euclidean optimizers transparently. Adaptive chart strategies can be implemented with negligible additional computational cost by leveraging SVD-based center selection. The avoidance of $O(n^3)$ operations and reduced per-iteration flops and storage recommend the iterative Cayley retraction and its generalizations for large-scale learning tasks with strict orthogonality constraints (Kume et al., 2023, Kume et al., 2023).