
Scaled Cayley Transform

Updated 27 May 2026
  • Scaled Cayley Transform is a family of rational maps that parameterize Lie groups by scaling the classical Cayley transform to avoid singularities at -1 eigenvalues.
  • It enables globally well-defined, computationally efficient, and numerically stable constructions on orthogonal, unitary, and Stiefel manifolds, crucial for optimization and neural network architecture.
  • Its versatile applications in Riemannian optimization, recurrent neural networks, and numerical PDE solvers underpin both theoretical advancements and practical, high-performance computational algorithms.

The scaled Cayley transform is a family of matrix- or operator-valued rational maps that give globally well-defined, computationally efficient, and numerically stable parameterizations of Lie groups and related matrix manifolds, such as the orthogonal group, the unitary group, and the Stiefel manifold, by “scaling” or “pivoting” the classical Cayley transform. Unlike the unscaled Cayley chart, which fails at matrices with −1 eigenvalues, the scaled Cayley transform multiplies or twists by an auxiliary diagonal or phase factor to avoid chart singularities. This technique underpins algorithms in Riemannian optimization, recurrent neural networks, operator theory, and numerical PDEs, where it preserves group or manifold constraints exactly while keeping computation tractable and optimization dynamics robust.

1. Motivation and Classical Limitations of the Cayley Transform

The classical Cayley transform provides a bijective, rational parameterization between real skew-symmetric matrices $A$ and those real orthogonal matrices $W = (I+A)^{-1}(I-A)$ that do not have $-1$ as an eigenvalue. The correspondence fails when the orthogonal matrix $W$ possesses the eigenvalue $-1$, which renders $I+W$ singular, so the inverse map $A = (I-W)(I+W)^{-1}$ is undefined. As $O(n)$ and related manifolds contain elements arbitrarily close to such singularities, relying solely on the unscaled Cayley transform leaves gaps in coverage and causes severe ill-conditioning near the $-1$-eigenvalue regime (Helfrich et al., 2017, Biborski, 22 Jan 2026). In practical optimization and learning algorithms, this leads to instability, unbounded gradients, and difficulties in constructing globally valid manifold retractions.
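The failure mode described above can be seen numerically. The following is a minimal NumPy sketch, not taken from the cited papers; the function names `cayley` and `inverse_cayley` are illustrative. It checks that the transform of a skew-symmetric matrix is orthogonal, and that the generator blows up for a planar rotation whose angle approaches $\pi$ (eigenvalues approaching $-1$).

```python
import numpy as np

def cayley(A):
    """Classical Cayley transform W = (I + A)^{-1} (I - A) of a skew-symmetric A."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I + A, I - A)

def inverse_cayley(W):
    """Inverse map A = (I - W)(I + W)^{-1}; requires I + W to be invertible."""
    I = np.eye(W.shape[0])
    # Solve A (I + W) = (I - W) through the transposed linear system.
    return np.linalg.solve((I + W).T, (I - W).T).T

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B - B.T                                   # skew-symmetric generator
W = cayley(A)
orth_err = np.linalg.norm(W.T @ W - np.eye(4))
roundtrip_err = np.linalg.norm(inverse_cayley(W) - A)

# A rotation by an angle close to pi has eigenvalues close to -1.
theta = np.pi - 1e-6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
generator_norm = np.linalg.norm(inverse_cayley(R))   # grows like tan(theta / 2)
```

For the rotation, the generator is a multiple of the $2 \times 2$ rotation generator with magnitude $\tan(\theta/2)$, which diverges as $\theta \to \pi$.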

The scaled Cayley transform overcomes these deficiencies by introducing a suitably chosen “scaling” factor that remaps the domain of the transform and regularizes its spectrum, producing an everywhere-well-defined, uniform parameterization for the target group or manifold.

2. Scaled Cayley Transform on $O(n)$: Construction and Algorithmic Properties

Let $U \in O(n)$ be a real orthogonal matrix. The scaled (“pivoted”) Cayley transform first multiplies $U$ by a diagonal signature matrix $D = \mathrm{diag}(d_1, \dots, d_n)$, $d_i \in \{-1, +1\}$, to produce $DU$ with the property that $-1$ is not an eigenvalue. The Cayley transform is then defined as

$$DU = (I+A)^{-1}(I-A)$$

for a skew-symmetric generator $A$ (Biborski, 22 Jan 2026, Helfrich et al., 2017). Existence and uniqueness of $A$ for any $U$ are ensured through a constructive $O(n^3)$ Gaussian elimination procedure that, by appropriate choice of each $d_i$, keeps the pivots bounded away from zero at every step, avoiding near-singularity.

Quantitative bounds are established: the spectral radius of the generator $A$ is bounded uniformly. The method preserves determinant properties along the Gaussian elimination chain and yields a well-conditioned linear solve for $A$.

This pivoted Cayley representation is readily incorporated in optimization workflows: for gradient steps parametrized in the Lie algebra, one transforms to and from the skew-symmetric generator via the scaled Cayley chart, with pivoting as needed to preserve stability under line-search or trust-region schemes (Biborski, 22 Jan 2026).
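One way to realize such a sign choice is sketched below. This is an illustrative greedy scheme, not necessarily the exact procedure of the cited work, and the names `choose_signature` and `scaled_cayley_generator` are hypothetical. It runs Gaussian elimination without row exchanges on $U + D$ and picks each sign $d_k$ to match the sign of the current diagonal entry, so every pivot has magnitude at least 1. Since $I + DU = D(U + D)$, this guarantees that $I + DU$ is invertible.

```python
import numpy as np

def choose_signature(U):
    """Greedy signs d in {-1, +1}^n so that all pivots of U + diag(d) have magnitude >= 1."""
    n = U.shape[0]
    M = np.array(U, dtype=float)
    d = np.ones(n)
    for k in range(n):
        # M[k, k] is the Schur-complement entry before d[k] has been added.
        d[k] = 1.0 if M[k, k] >= 0.0 else -1.0
        M[k, k] += d[k]                  # pivot magnitude is now |old entry| + 1
        # Standard elimination update of the trailing block.
        M[k + 1:, k + 1:] -= np.outer(M[k + 1:, k], M[k, k + 1:]) / M[k, k]
    return d

def scaled_cayley_generator(U):
    """Return (d, A) with diag(d) @ U = (I + A)^{-1} (I - A) and A skew-symmetric."""
    d = choose_signature(U)
    I = np.eye(U.shape[0])
    V = d[:, None] * U                   # V = D U
    # A = (I - V)(I + V)^{-1}, solved through the transposed system.
    A = np.linalg.solve((I + V).T, (I - V).T).T
    return d, A

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # random orthogonal test matrix
d, A = scaled_cayley_generator(Q)
```

For $U = -I$, where the classical chart is undefined, this sketch selects $D = -I$ and returns $A = 0$.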

3. Scaled Cayley Transform in Riemannian Optimization on the Stiefel Manifold

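On the Stiefel manifold $\mathrm{St}(n,p) = \{X \in \mathbb{R}^{n \times p} : X^\top X = I_p\}$, imposing orthonormality constraints in optimization requires projecting updates onto the tangent space and mapping the resulting steps back to the manifold. Here, the scaled Cayley transform enables an efficient first-order retraction: $Y(\alpha) = (I - \tfrac{\alpha}{2} W)^{-1}(I + \tfrac{\alpha}{2} W) X$, where $W$ here denotes a skew-symmetric matrix built from the gradient (not the orthogonal matrix of Section 1) and $\alpha > 0$ is the step size (2002.01113). Rather than inverting a full matrix at each step (cost $O(n^3)$), one may implement the retraction by a fixed-point iteration: $Y^{(i+1)} = X + \tfrac{\alpha}{2} W (X + Y^{(i)})$, where typically only a few iterations suffice for practical accuracy. The map is a contraction for $\alpha \|W\| < 2$, and after $s$ steps the distance to $Y(\alpha)$ has shrunk by a factor of at most $(\alpha \|W\| / 2)^s$ (2002.01113).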
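The relationship between the closed-form retraction and its fixed-point approximation can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the skew-symmetric matrix `W` is random rather than derived from a gradient, the initial guess `X + alpha * W @ X` is one natural choice, and the function names are hypothetical.

```python
import numpy as np

def cayley_retraction(X, W, alpha):
    """Closed form: Y = (I - alpha/2 W)^{-1} (I + alpha/2 W) X, via one linear solve."""
    I = np.eye(X.shape[0])
    return np.linalg.solve(I - 0.5 * alpha * W, (I + 0.5 * alpha * W) @ X)

def cayley_retraction_iterative(X, W, alpha, steps=2):
    """Fixed-point iteration Y <- X + alpha/2 W (X + Y); no matrix inverse needed."""
    Y = X + alpha * (W @ X)              # first-order initial guess (assumption)
    for _ in range(steps):
        Y = X + 0.5 * alpha * (W @ (X + Y))
    return Y

rng = np.random.default_rng(0)
n, p = 8, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # point with orthonormal columns
B = rng.standard_normal((n, n))
W = B - B.T                                        # skew-symmetric direction
W /= np.linalg.norm(W, 2)                          # spectral norm 1
alpha = 0.1                                        # so alpha * ||W|| / 2 = 0.05 < 1

Y_exact = cayley_retraction(X, W, alpha)
Y_iter = cayley_retraction_iterative(X, W, alpha, steps=2)
constraint_err = np.linalg.norm(Y_exact.T @ Y_exact - np.eye(p))
iter_err = np.linalg.norm(Y_iter - Y_exact)
```

The closed form keeps the columns orthonormal up to rounding, while the iterative variant trades a small, geometrically shrinking error for avoiding the linear solve.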

Embedding this retraction in stochastic gradient methods (Cayley-SGD, Cayley-ADAM) yields provable convergence guarantees: for objectives with Lipschitz-continuous gradients and suitably chosen step sizes, an explicit convergence bound is obtained (2002.01113).

Crucially, the retraction acts as its own (implicit) vector transport: momentum or auxiliary vectors are projected to the tangent space, then advanced by the same Cayley step, obviating explicit parallel transport (2002.01113).
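The project-then-step pattern described above can be sketched as a single momentum update. This is a loose illustration rather than the algorithm of the cited paper: the function name, the parameters `lr`, `beta`, `q`, and `inner_steps`, and the step-size clipping rule are all assumptions made for the example. The skew-symmetric matrix `W` is built so that `W @ X` equals the tangent-space projection of the momentum at `X`.

```python
import numpy as np

def cayley_momentum_step(X, M, G, lr=0.1, beta=0.9, q=0.5, inner_steps=2, eps=1e-8):
    """One momentum step on the Stiefel manifold using an iterative Cayley retraction."""
    M = beta * M - G                                   # ambient momentum update
    W_hat = M @ X.T - 0.5 * X @ (X.T @ M @ X.T)
    W = W_hat - W_hat.T                                # skew-symmetric
    M = W @ X                                          # momentum projected onto the tangent space at X
    # Clip the step so that alpha * ||W|| / 2 <= q < 1 (keeps the iteration contractive).
    alpha = min(lr, 2.0 * q / (np.linalg.norm(W, 2) + eps))
    Y = X + alpha * M
    for _ in range(inner_steps):
        Y = X + 0.5 * alpha * (W @ (X + Y))
    return Y, M

rng = np.random.default_rng(0)
n, p = 6, 2
S = rng.standard_normal((n, n))
S = S @ S.T                                            # symmetric test matrix (assumption)
X0, _ = np.linalg.qr(rng.standard_normal((n, p)))
X, M = X0.copy(), np.zeros((n, p))
for _ in range(20):
    G = -S @ X                                         # Euclidean gradient of -0.5 * trace(X^T S X)
    X, M = cayley_momentum_step(X, M, G)
```

The same matrix `W` drives both the projection of the momentum and the retraction, which is why no separate parallel-transport routine appears in the sketch.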

4. Scaled Cayley Parameterizations in Neural Network Architectures

Constrained recurrent neural networks (RNNs), such as scoRNN and scuRNN, employ the scaled Cayley transform to parameterize orthogonal or unitary recurrent matrices. For the real case,

$$W = (I+A)^{-1}(I-A)D$$

where $A$ is skew-symmetric and $D$ is a diagonal matrix with entries $\pm 1$ (Helfrich et al., 2017, Maduranga et al., 2018). This parametrization covers all of $O(n)$, maintaining exact orthogonality while sidestepping the eigenvalue $-1$ obstruction inherent to the classical Cayley transform.
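A minimal NumPy sketch of the real parameterization follows, assuming the form $W = (I+A)^{-1}(I-A)D$ with $A$ skew-symmetric and $D$ a fixed $\pm 1$ diagonal. The function name and the variable `rho` (the number of $-1$ entries, treated as a fixed hyperparameter) are illustrative choices.

```python
import numpy as np

def scaled_cayley_orthogonal(A, d):
    """W = (I + A)^{-1} (I - A) D with D = diag(d), d in {-1, +1}^n."""
    I = np.eye(A.shape[0])
    # Multiplying by d[None, :] scales the columns, i.e. right-multiplies by D.
    return np.linalg.solve(I + A, I - A) * d[None, :]

n, rho = 6, 2
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
A = B - B.T                                        # free skew-symmetric parameter
d = np.concatenate([np.ones(n - rho), -np.ones(rho)])
W = scaled_cayley_orthogonal(A, d)
orth_err = np.linalg.norm(W.T @ W - np.eye(n))

# W = -I is unreachable by the classical transform but is obtained here with A = 0, D = -I.
W_neg = scaled_cayley_orthogonal(np.zeros((n, n)), -np.ones(n))
```

In a training loop only the entries of $A$ would be updated, so orthogonality holds by construction rather than by projection.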

For complex unitary RNNs,

$$W = (I+A)^{-1}(I-A)D$$

with $A$ skew-Hermitian and $D = \mathrm{diag}(e^{i\theta_1}, \dots, e^{i\theta_n})$ diagonal unitary. Unlike in the real case, the phases $\theta_j$ can be optimized through gradient descent, removing the need for discrete hyperparameters or manual signature selection. This approach delivers exact unitarity, robust gradient flow, and state-of-the-art empirical performance for long-range-sequence modeling (Maduranga et al., 2018).
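The complex case can be sketched the same way, assuming $W = (I+A)^{-1}(I-A)D$ with $A$ skew-Hermitian and $D$ a diagonal matrix of unit-modulus phases. The function name and the random test data are illustrative; here the phase vector `theta` is a continuous parameter.

```python
import numpy as np

def scaled_cayley_unitary(A, theta):
    """W = (I + A)^{-1} (I - A) D with D = diag(exp(i * theta))."""
    I = np.eye(A.shape[0])
    phases = np.exp(1j * theta)
    # Column scaling by the phases is right-multiplication by D.
    return np.linalg.solve(I + A, I - A) * phases[None, :]

n = 5
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = B - B.conj().T                                 # skew-Hermitian parameter
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)      # continuous phases
W = scaled_cayley_unitary(A, theta)
unit_err = np.linalg.norm(W.conj().T @ W - np.eye(n))
```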

Summary of key properties in these architectures:

| Model | Scaling | Parameter Space | Coverage |
|---|---|---|---|
| Classical Cayley | None | $A$ skew-symmetric | Orthogonal matrices without eigenvalue $-1$ |
| scoRNN | $\pm 1$ diag $D$ | $A$ skew-symmetric | $O(n)$ |
| scuRNN | unitary diag $D$ | $A$ skew-Hermitian | $U(n)$ |

5. Scaled Cayley Transforms in Operator Theory and Numerical Analysis

In operator-theoretic formulations of wave problems, particularly the Convergent Born Series (CBS) for Helmholtz-type equations, the scaled Cayley transform serves as a contractive rational map for self-adjoint background operators. For a self-adjoint background operator, the scaled (real-shift) Cayley transform is a rational function of that operator, and it maps the operator to a strict contraction. The invertibility and contractivity of the transformed operator facilitate robust convergence conditions for preconditioned fixed-point iterations, e.g., in Lippmann–Schwinger solvers (Jakobsen, 22 Apr 2026). A resolvent identity ties convergence rates and smoothing directly to the spectrum of the scaled Cayley map.

The presence of complex-valued absorbing layers further increases the spectral gap and strengthens contraction without sacrificing the self-adjoint structure of the background operator. Numerical experiments confirm geometric convergence and robustness to heterogeneous strong-contrast media (Jakobsen, 22 Apr 2026).

6. Scaled Cayley Maps for Unitary and Special Unitary Groups

For compact Lie groups beyond $O(n)$, scaled Cayley maps are constructed to preserve group-specific invariants. In $SU(3)$ dynamics (Schäfers et al., 2024), the modified Cayley map contains a parameter that is chosen to ensure that the result has unit determinant and that is defined via an explicit algebraic relation. This map is a local diffeomorphism, preserves volume, and is time-reversible, guaranteeing suitability for geometric integrators and Hamiltonian Monte Carlo algorithms. The construction generalizes to $SU(n)$ by solving the appropriate phase constraint (Schäfers et al., 2024).
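To illustrate why a determinant correction is needed at all, the sketch below applies the plain Cayley transform to a traceless anti-Hermitian matrix and then rescales by a global phase so that the determinant becomes 1. This naive normalization is a stand-in chosen for illustration, not the algebraic relation of the cited work, and it has a branch cut in the phase. The function name is hypothetical.

```python
import numpy as np

def cayley_with_unit_determinant(A):
    """Cayley transform of an anti-Hermitian A, rescaled by a global phase to det = 1."""
    n = A.shape[0]
    I = np.eye(n)
    W = np.linalg.solve(I + A, I - A)              # unitary, but det(W) is only a phase
    phase = np.angle(np.linalg.det(W))
    return np.exp(-1j * phase / n) * W             # det becomes exp(-i*phase) * det(W) = 1

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = 0.5 * (B - B.conj().T)                         # anti-Hermitian
A -= (np.trace(A) / 3) * np.eye(3)                 # traceless, i.e. an su(3) element
V = cayley_with_unit_determinant(A)
det_err = abs(np.linalg.det(V) - 1.0)
unit_err = np.linalg.norm(V.conj().T @ V - np.eye(3))
```

Even for a traceless generator, the unscaled Cayley transform generally lands in $U(3)$ rather than $SU(3)$, which is the gap the modified map is designed to close.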

7. Numerical, Algorithmic, and Theoretical Impact

The scaled Cayley transform yields:

  • Uniformly bounded, well-conditioned parametrizations avoiding chart singularities near $-1$ eigenvalues (Biborski, 22 Jan 2026, Helfrich et al., 2017).
  • Numerically stable $O(n^3)$ algorithms for inversion and update steps, leveraging pivoting to guarantee robust Schur complements and determinant preservation (Biborski, 22 Jan 2026).
  • Simple and efficient retractions for Riemannian optimization on matrix manifolds, directly enabling fast, convergent manifold-adapted SGD and Adam variants (2002.01113).
  • Full coverage of $O(n)$ or $U(n)$ in neural network parameterizations, exact preservation of group constraints, stable gradients, and improved empirical performance on sequence tasks (Helfrich et al., 2017, Maduranga et al., 2018).
  • Enhanced contractivity and convergence criteria for iterative solvers in high-frequency and heterogeneous operator regimes (Jakobsen, 22 Apr 2026).
  • Exact volume-preservation and symmetry properties in structure-preserving numerical integration for gauge theory simulations (Schäfers et al., 2024).

A plausible implication is that the scaled Cayley transform will continue to serve as a foundational tool for geometry-aware numerical algorithms, facilitating both theoretical advances and large-scale machine learning and scientific computing deployments.
