Scaled Cayley Transform

Updated 27 May 2026

The scaled Cayley transform is a family of rational maps that parameterize Lie groups by scaling the classical Cayley transform to avoid its singularity at eigenvalue $-1$.
It enables globally well-defined, computationally efficient, and numerically stable constructions on orthogonal, unitary, and Stiefel manifolds, which matter for optimization and neural network architectures.
It is applied in Riemannian optimization, recurrent neural networks, and numerical PDE solvers.

The scaled Cayley transform is a family of matrix- or operator-valued rational maps that give globally well-defined, computationally efficient, and numerically stable parameterizations of Lie groups and related manifolds (such as the orthogonal group, the unitary group, and the Stiefel manifold) by “scaling” or “pivoting” the classical Cayley transform. Unlike the unscaled Cayley chart, which fails at matrices with eigenvalue $-1$, the scaled Cayley transform multiplies or twists by an auxiliary diagonal or phase factor to avoid chart singularities. The technique is used in algorithms for Riemannian optimization, recurrent neural networks, operator theory, and numerical PDEs, where it preserves group or manifold constraints exactly while keeping computation tractable and optimization dynamics stable.

1. Motivation and Classical Limitations of the Cayley Transform

The classical Cayley transform provides a bijective, rational parameterization between real skew-symmetric matrices $A$ and those real orthogonal matrices $W = (I+A)^{-1}(I-A)$ that have no eigenvalue $-1$. The correspondence fails when the orthogonal matrix $W$ has eigenvalue $-1$, because $I+W$ is then singular and the inverse map $A = (I-W)(I+W)^{-1}$ is undefined. Since $O(n)$ and related manifolds contain elements arbitrarily close to such singularities, relying solely on the unscaled Cayley transform leaves gaps in coverage and causes severe ill-conditioning near the eigenvalue $-1$ regime (Helfrich et al., 2017, Biborski, 22 Jan 2026). In practical optimization and learning algorithms, this leads to instability, unbounded gradients, and difficulty in constructing globally valid manifold retractions.
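
The breakdown near eigenvalue $-1$ can be observed numerically. The following is a minimal NumPy sketch, not taken from the cited works: the helper names `cayley`, `inverse_cayley`, and `rotation` are illustrative. It applies the inverse map to a planar rotation by angle $\theta$, for which the generator entry equals $\tan(\theta/2)$ and so grows without bound as $\theta \to \pi$.

```python
import numpy as np

def cayley(A):
    """Map a skew-symmetric A to the orthogonal W = (I + A)^{-1} (I - A)."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I + A, I - A)

def inverse_cayley(W):
    """Recover A = (I + W)^{-1} (I - W); needs -1 not to be an eigenvalue of W.

    The two factors commute, so this equals (I - W)(I + W)^{-1}.
    """
    I = np.eye(W.shape[0])
    return np.linalg.solve(I + W, I - W)

def rotation(theta):
    """2x2 rotation matrix; its eigenvalues approach -1 as theta -> pi."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

for theta in (1.0, 3.0, 3.14, 3.14159):
    W = rotation(theta)
    A = inverse_cayley(W)
    # det(I + W) = 2 + 2 cos(theta) shrinks to 0 while the generator blows up.
    print(f"theta={theta:<8} det(I+W)={np.linalg.det(np.eye(2) + W):.3e} "
          f"|A[0,1]|={abs(A[0, 1]):.3e}")
```

The printed generator magnitude grows rapidly as the rotation angle approaches $\pi$, which is the ill-conditioning that the scaled transform is designed to avoid.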

The scaled Cayley transform overcomes these deficiencies by introducing a suitably chosen “scaling” factor that remaps the domain of the transform and regularizes its spectrum, producing an everywhere-well-defined, uniform parameterization for the target group or manifold.

2. Scaled Cayley Transform on $O(n)$: Construction and Algorithmic Properties

Let $U \in O(n)$ be a real orthogonal matrix. The scaled (“pivoted”) Cayley transform first multiplies $U$ by a diagonal signature matrix $D$, $D = \mathrm{diag}(d_1, \dots, d_n)$ with $d_i \in \{+1, -1\}$, to produce $UD$ with the property that $-1$ is not an eigenvalue. The Cayley transform is then defined as

$UD = (I+A)^{-1}(I-A)$

for a skew-symmetric generator $A$ (Biborski, 22 Jan 2026, Helfrich et al., 2017). Existence and uniqueness of $A$ (for the chosen $D$) for any $U \in O(n)$ are ensured through a constructive $O(n^3)$ Gaussian elimination procedure that, by appropriate choice of each $d_i$, keeps the pivots bounded away from zero at every step, avoiding near-singularity.

Quantitative bounds are established: the spectral radius of the generator $A$ is uniformly bounded. The method preserves determinant properties along the Gaussian elimination chain and yields a well-conditioned linear solve for $A$.

This pivoted Cayley representation is readily incorporated in optimization workflows: for gradient steps parametrized in the Lie algebra, one transforms to and from the skew-symmetric generator via the scaled Cayley chart, with pivoting as needed to preserve stability under line-search or trust-region schemes (Biborski, 22 Jan 2026).
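
One way to realize such a sign selection is sketched below. This is an illustrative greedy scheme and not necessarily the procedure of the cited paper. It assumes the signature acts on columns (the product $UD$), and for readability it recomputes determinants of leading blocks, which costs more than an $O(n^3)$ elimination. The function names are hypothetical. The idea is that the determinant of the $k$-th leading block of $I + UD$ is affine in the sign $d_k$, so one of the two signs never decreases its magnitude and every pivot has magnitude at least 1.

```python
import numpy as np

def choose_signature(U):
    """Greedily pick signs d_k in {+1, -1} so that I + U*diag(d) stays far from singular."""
    n = U.shape[0]
    d = np.ones(n)
    for k in range(n):
        best_sign, best_abs = 1.0, -1.0
        for s in (1.0, -1.0):
            d[k] = s
            # Leading (k+1) x (k+1) block of I + U D (D scales the columns of U).
            block = np.eye(k + 1) + U[:k + 1, :k + 1] * d[:k + 1]
            val = abs(np.linalg.det(block))
            if val > best_abs:
                best_sign, best_abs = s, val
        d[k] = best_sign
    return d

def pivoted_cayley(U):
    """Return (A, d) with A skew-symmetric and U = (I + A)^{-1} (I - A) diag(d)."""
    I = np.eye(U.shape[0])
    d = choose_signature(U)
    Q = U * d                      # Q = U D has no eigenvalue -1
    A = np.linalg.solve(I + Q, I - Q)
    return A, d

def from_pivoted_cayley(A, d):
    """Inverse map: rebuild the orthogonal matrix from the generator and the signs."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I + A, I - A) * d

# A Householder reflection has eigenvalue -1, so the unscaled chart fails on it.
v = np.array([1.0, 2.0, 3.0, 4.0])
v /= np.linalg.norm(v)
U = np.eye(4) - 2.0 * np.outer(v, v)
A, d = pivoted_cayley(U)
print("signs:", d)
print("skew-symmetry error:", np.linalg.norm(A + A.T))
print("reconstruction error:", np.linalg.norm(from_pivoted_cayley(A, d) - U))
```

In an optimization loop, the signs would typically be held fixed while the generator is updated, and recomputed only when $I + UD$ drifts toward singularity.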

3. Scaled Cayley Transform in Riemannian Optimization on the Stiefel Manifold

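On the Stiefel manifold $\mathrm{St}(n,p) = \{X \in \mathbb{R}^{n \times p} : X^\top X = I_p\}$, imposing orthonormality constraints in optimization requires projecting updates onto the tangent space and mapping the resulting steps back to the manifold. Here, the scaled Cayley transform enables an efficient first-order retraction: $Y(\alpha) = (I - \tfrac{\alpha}{2} A)^{-1}(I + \tfrac{\alpha}{2} A)X$, where $A = \hat{A} - \hat{A}^\top$ and $\hat{A} = G X^\top - \tfrac{1}{2} X (X^\top G X^\top)$ for a search direction $G$, for example the negative Euclidean gradient or a momentum vector (2002.01113). Rather than inverting a full matrix for each step (cost $O(n^3)$), one may implement the retraction by a fixed-point iteration: $Y^{(i+1)} = X + \tfrac{\alpha}{2} A (X + Y^{(i)})$, where typically only a few iterations suffice for practical accuracy. The map $Y \mapsto X + \tfrac{\alpha}{2} A (X + Y)$ is a contraction for $\tfrac{\alpha}{2}\|A\| < 1$, and after $i$ steps started from $Y^{(0)} = X + \alpha A X$ the error satisfies $\|Y^{(i)} - Y(\alpha)\| = O(\alpha^{2+i})$ (2002.01113).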
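
The relation between the closed-form Cayley retraction and its fixed-point approximation can be checked with a short NumPy sketch. The function names, the symbol `A` for the skew-symmetric generator, and the first-order initialization $Y^{(0)} = X + \alpha A X$ are assumptions made for illustration; the sketch is not the reference implementation of the cited work.

```python
import numpy as np

def skew_from_direction(X, G):
    """Build the skew-symmetric generator A such that A @ X is the tangent projection of G."""
    A_hat = G @ X.T - 0.5 * X @ (X.T @ G @ X.T)
    return A_hat - A_hat.T

def cayley_retraction(X, A, alpha):
    """Closed form: Y = (I - alpha/2 A)^{-1} (I + alpha/2 A) X, one n x n linear solve."""
    I = np.eye(X.shape[0])
    return np.linalg.solve(I - 0.5 * alpha * A, (I + 0.5 * alpha * A) @ X)

def cayley_retraction_iterative(X, A, alpha, s=2):
    """Fixed-point approximation Y <- X + alpha/2 A (X + Y), with no linear solve."""
    Y = X + alpha * (A @ X)        # first-order initial guess
    for _ in range(s):
        Y = X + 0.5 * alpha * (A @ (X + Y))
    return Y

rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((6, 3)))    # a point on St(6, 3)
A = skew_from_direction(X, rng.standard_normal((6, 3)))
alpha = 1.0 / np.linalg.norm(A, 2)                  # contraction factor alpha/2 * ||A|| = 0.5

Y = cayley_retraction(X, A, alpha)
print("orthonormality error (closed form):", np.linalg.norm(Y.T @ Y - np.eye(3)))
for s in (1, 2, 4, 8):
    err = np.linalg.norm(cayley_retraction_iterative(X, A, alpha, s) - Y)
    print(f"s={s}: distance to closed form = {err:.2e}")
```

Each pass of the loop multiplies the error by at most $\tfrac{\alpha}{2}\|A\|$, so smaller step sizes need fewer iterations to reach a given accuracy.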

Embedding this retraction in stochastic gradient methods (Cayley-SGD, Cayley-ADAM) yields provable convergence guarantees for objectives with $L$-Lipschitz gradients under suitable step-size conditions (2002.01113).

Crucially, the retraction acts as its own (implicit) vector transport: momentum or auxiliary vectors are projected to the tangent space, then advanced by the same Cayley step, obviating explicit parallel transport (2002.01113).
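
A single momentum step of this kind might look as follows. This is a hedged sketch under stated assumptions: the function name, the default values of `lr`, `beta`, `q`, `s`, and `eps`, and the use of the spectral norm in the step-size safeguard are illustrative choices, not confirmed details of Cayley-SGD. The returned momentum is tangent at the old iterate and is simply reused at the next one, where building the generator projects it again.

```python
import numpy as np

def cayley_sgd_momentum_step(X, M, G, lr=0.01, beta=0.9, q=0.5, s=2, eps=1e-8):
    """One illustrative Cayley-type SGD step with momentum on the Stiefel manifold.

    X: current point (n x p, orthonormal columns); M: momentum; G: Euclidean gradient.
    """
    # 1. Accumulate momentum in the ambient space.
    M = beta * M - G
    # 2. Build the skew-symmetric generator from the momentum.
    A_hat = M @ X.T - 0.5 * X @ (X.T @ M @ X.T)
    A = A_hat - A_hat.T
    # 3. Project the momentum onto the tangent space at X.
    M = A @ X
    # 4. Cap the step so that the fixed-point map stays a contraction.
    alpha = min(lr, 2.0 * q / (np.linalg.norm(A, 2) + eps))
    # 5. Approximate the Cayley retraction by a few fixed-point iterations.
    Y = X + alpha * M
    for _ in range(s):
        Y = X + 0.5 * alpha * (A @ (X + Y))
    return Y, M

rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.standard_normal((6, 3)))
M = np.zeros((6, 3))
for _ in range(5):
    G = 0.1 * rng.standard_normal((6, 3))   # stand-in for a stochastic gradient
    X, M = cayley_sgd_momentum_step(X, M, G)
print("orthonormality drift after 5 steps:", np.linalg.norm(X.T @ X - np.eye(3)))
```

Because the retraction is only approximated, the iterate drifts slightly from the manifold; the drift shrinks with smaller `lr` or larger `s`.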

4. Scaled Cayley Parameterizations in Neural Network Architectures

Constrained recurrent neural networks (RNNs), such as scoRNN and scuRNN, employ the scaled Cayley transform to parameterize orthogonal or unitary recurrent matrices. For the real case,

$W = (I+A)^{-1}(I-A)D$

where $A$ is skew-symmetric and $D$ is a diagonal matrix with entries $\pm 1$ (Helfrich et al., 2017, Maduranga et al., 2018). This parametrization covers all $W \in O(n)$, maintaining exact orthogonality while sidestepping the eigenvalue $-1$ obstruction inherent to the classical Cayley transform.
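
As a minimal sketch of the real parameterization (the function name and the convention of placing the negative entries first on the diagonal are assumptions for illustration), the recurrent matrix can be assembled from an unconstrained matrix `B` via $A = B - B^\top$ and a count `rho` of negative signs:

```python
import numpy as np

def scaled_cayley_orthogonal(A, rho):
    """W = (I + A)^{-1} (I - A) D, with D = diag(-1, ..., -1, +1, ..., +1) and rho entries -1."""
    n = A.shape[0]
    I = np.eye(n)
    d = np.ones(n)
    d[:rho] = -1.0
    # Right-multiplication by the diagonal D scales the columns.
    return np.linalg.solve(I + A, I - A) * d

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
A = B - B.T                                  # skew-symmetric generator
W = scaled_cayley_orthogonal(A, rho=3)
print("orthogonality error:", np.linalg.norm(W.T @ W - np.eye(5)))
print("det(W):", np.linalg.det(W))           # equals (-1)**rho up to rounding
```

Since the unscaled factor always has determinant $+1$, the sign of $\det W$ is fixed by the number of negative entries in $D$, which is why the signature behaves as a discrete hyperparameter in the real case.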

For complex unitary RNNs,

$W = (I+A)^{-1}(I-A)D$

with $A$ skew-Hermitian ($A^* = -A$) and $D = \mathrm{diag}(e^{i\theta_1}, \dots, e^{i\theta_n})$ diagonal unitary. Unlike in the real case, the phases $\theta_j$ can be optimized through gradient descent, removing the need for discrete hyperparameters or manual signature selection. This approach delivers exact unitarity, robust gradient flow, and state-of-the-art empirical performance for long-range-sequence modeling (Maduranga et al., 2018).
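
The complex case can be sketched the same way. The function name below is hypothetical and the snippet only builds the matrix; it does not reproduce the training procedure of the cited work. The phases enter as a continuous vector `theta`, so eigenvalue $-1$ is reached smoothly, for instance with a zero generator and all phases equal to $\pi$.

```python
import numpy as np

def scaled_cayley_unitary(A, theta):
    """W = (I + A)^{-1} (I - A) diag(exp(i * theta)) for skew-Hermitian A and real phases theta."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I + A, I - A) * np.exp(1j * theta)

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B - B.conj().T                           # skew-Hermitian generator
theta = rng.uniform(0.0, 2.0 * np.pi, size=4)
W = scaled_cayley_unitary(A, theta)
print("unitarity error:", np.linalg.norm(W.conj().T @ W - np.eye(4)))

# With A = 0 and theta = pi the map returns -I, which the unscaled chart cannot reach.
print(np.round(scaled_cayley_unitary(np.zeros((2, 2)), np.pi * np.ones(2)), 12))
```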

Summary of key properties in these architectures:

Model	Scaling	Parameter Space	Coverage
Classical Cayley	None	$A$ skew-symmetric	$W \in O(n)$ with no eigenvalue $-1$
scoRNN	$\pm 1$ diag $D$	$A$ skew-symmetric	$O(n)$
scuRNN	$e^{i\theta_j}$ unitary diag $D$	$A$ skew-Hermitian	$U(n)$

5. Scaled Cayley Transforms in Operator Theory and Numerical Analysis

In operator-theoretic formulations of wave problems, particularly the Convergent Born Series (CBS) for Helmholtz-type equations, the scaled Cayley transform serves as a contractive rational map for self-adjoint background operators. For a self-adjoint background operator, the scaled (real-shift) Cayley transform is a rational function of that operator, and its image is a strict contraction. The invertibility and contractivity of this map facilitate robust convergence conditions for preconditioned fixed-point iterations, e.g., in Lippmann–Schwinger solvers (Jakobsen, 22 Apr 2026). A resolvent identity ties convergence rates and smoothing directly to the spectrum of the scaled Cayley map.

The presence of complex-valued absorbing layers further increases the spectral gap and strengthens contraction without sacrificing the self-adjoint structure of the background operator. Numerical experiments confirm geometric convergence and robustness to heterogeneous strong-contrast media (Jakobsen, 22 Apr 2026).

6. Scaled Cayley Maps for Unitary and Special Unitary Groups

For compact Lie groups beyond $O(n)$, scaled Cayley maps are constructed to preserve group-specific invariants. In $SU(3)$ dynamics (Schäfers et al., 2024), the modified Cayley map is a Cayley-type rational map with an additional parameter chosen to ensure unit determinant, defined via an explicit algebraic relation involving the generator. This map is a local diffeomorphism, preserves volume, and is time-reversible, guaranteeing suitability for geometric integrators and Hamiltonian Monte Carlo algorithms. The construction generalizes to $SU(N)$ by solving the appropriate phase constraint (Schäfers et al., 2024).
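
The need for a modification can be illustrated with a small check, which is only a motivating sketch and does not implement the construction of the cited work. It uses the Cayley convention $(I+A)^{-1}(I-A)$ from Section 1 and a hand-picked traceless skew-Hermitian $3 \times 3$ generator: the unmodified Cayley map returns a unitary matrix whose determinant is a phase different from 1, so the result lies in $U(3)$ but not in the special unitary group.

```python
import numpy as np

def cayley(A):
    """Unmodified Cayley map (I + A)^{-1} (I - A)."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I + A, I - A)

# Traceless and skew-Hermitian, hence an element of the Lie algebra su(3).
A = 1j * np.diag([1.0, 1.0, -2.0])
U = cayley(A)
det_U = np.linalg.det(U)

print("unitarity error:", np.linalg.norm(U.conj().T @ U - np.eye(3)))
print("|det U|:", abs(det_U))
print("det U:", det_U)      # a unit-modulus phase, but not equal to 1
```

For a diagonal generator the determinant is the product of the scalar Cayley factors $(1-\lambda_j)/(1+\lambda_j)$, each a phase; a zero trace does not force these phases to cancel, which is the defect the phase constraint has to remove.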

7. Numerical, Algorithmic, and Theoretical Impact

The scaled Cayley transform yields:

Uniformly bounded, well-conditioned parametrizations avoiding chart singularities near $-1$ eigenvalues (Biborski, 22 Jan 2026, Helfrich et al., 2017).
Numerically stable $O(n^3)$ algorithms for inversion and update steps, leveraging pivoting to guarantee robust Schur complements and determinant preservation (Biborski, 22 Jan 2026).
Simple and efficient retractions for Riemannian optimization on matrix manifolds, directly enabling fast, convergent manifold-adapted SGD and Adam variants (2002.01113).
Full coverage of $O(n)$ or $U(n)$ in neural network parameterizations, exact preservation of group constraints, stable gradients, and improved empirical performance on sequence tasks (Helfrich et al., 2017, Maduranga et al., 2018).
Enhanced contractivity and convergence criteria for iterative solvers in high-frequency and heterogeneous operator regimes (Jakobsen, 22 Apr 2026).
Exact volume-preservation and symmetry properties in structure-preserving numerical integration for gauge theory simulations (Schäfers et al., 2024).

A plausible implication is that the scaled Cayley transform will continue to serve as a foundational tool for geometry-aware numerical algorithms, facilitating both theoretical advances and large-scale machine learning and scientific computing deployments.