Cayley Transform in Matrix Lie Groups
- The Cayley transform is a rational map sending skew-symmetric (or skew-Hermitian) matrices into the orthogonal (or unitary) group, preserving essential algebraic properties.
- It enables efficient retractions on matrix manifolds, facilitating optimization in deep learning, numerical analysis, and operator theory applications.
- Generalized variants, like the scaled Cayley transform, overcome domain issues and extend applications to representation theory, differential equations, and quantum computations.
The Cayley transform is a rational map between certain matrix Lie algebras and Lie groups, birational onto its image and a local diffeomorphism, classically mapping skew-symmetric (or skew-Hermitian) matrices to orthogonal (or unitary) matrices. Its algebraic, analytic, and computational properties allow for efficient parameterizations, optimizations, and structure-preserving numerical methods across mathematical physics, representation theory, optimization on matrix manifolds, deep learning, and operator theory.
1. Definition and Foundational Properties
Let $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$. For a skew-symmetric ($A^\top = -A$, real case) or skew-Hermitian ($A^* = -A$, complex case) matrix $A$, the Cayley transform is defined as
$$\operatorname{cay}(A) = (I - A)^{-1}(I + A),$$
provided $I - A$ is invertible (automatic here, since the eigenvalues of $A$ are purely imaginary). The transform maps skew-generators into the orthogonal (if real) or unitary (if complex) group:
- $A^\top = -A \implies \operatorname{cay}(A)^\top \operatorname{cay}(A) = I$, so $\operatorname{cay}(A) \in \mathrm{O}(n)$,
- $A^* = -A \implies \operatorname{cay}(A)^* \operatorname{cay}(A) = I$, so $\operatorname{cay}(A) \in \mathrm{U}(n)$.
The inverse Cayley transform recovers $A$ (when $I + Q$ is invertible) via
$$A = (Q - I)(Q + I)^{-1},$$
where $Q = \operatorname{cay}(A)$. The Cayley transform is a local diffeomorphism around $A = 0$ and $Q = I$, inducing a birational isomorphism between a Zariski open neighborhood of $0$ in the Lie algebra and one of $I$ in the Lie group (Lu et al., 2024).
The domain of definition of the inverse excludes points where $I + Q$ is singular, i.e., when $Q$ has $-1$ as an eigenvalue. This domain issue is fundamental when representing arbitrary orthogonal or unitary matrices.
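The forward and inverse transforms above admit a minimal NumPy sketch (the helper names `cay` and `cay_inv` are illustrative, not taken from any cited work):

```python
import numpy as np

def cay(A):
    """Cayley transform: skew-symmetric A -> orthogonal Q = (I - A)^{-1}(I + A)."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I - A, I + A)

def cay_inv(Q):
    """Inverse transform A = (Q - I)(Q + I)^{-1}; defined when -1 is not an eigenvalue of Q."""
    I = np.eye(Q.shape[0])
    # Solve A(Q + I) = Q - I by transposing: (Q + I)^T A^T = (Q - I)^T.
    return np.linalg.solve((Q + I).T, (Q - I).T).T

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
A = X - X.T                              # skew-symmetric generator
Q = cay(A)
assert np.allclose(Q.T @ Q, np.eye(5))   # Q is orthogonal
assert np.allclose(cay_inv(Q), A)        # the round trip recovers A
```

Using `np.linalg.solve` rather than forming $(I - A)^{-1}$ explicitly is the standard numerically stable choice.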
2. Generalizations and Domain Extensions
Scaled and Pivoted Cayley Transforms
The classical Cayley transform cannot represent matrices with eigenvalue $-1$, as $I + Q$ then becomes singular. The scaled Cayley transform introduces a diagonal signature matrix $D$ with entries $\pm 1$ and reparametrizes as
$$W = (I - A)^{-1}(I + A)\,D$$
for skew-symmetric $A$ and a suitable choice of $D$. By altering $D$, every $W \in \mathrm{O}(n)$ can be represented, even those with eigenvalue $-1$ (Helfrich et al., 2017).
A recent constructive approach (Biborski, 22 Jan 2026) gives an algorithm to select the signature matrix $D$ ensuring $I + QD$ is invertible for any $Q \in \mathrm{O}(n)$, yielding a bounded skew-symmetric $A$ satisfying
$$Q = (I - A)^{-1}(I + A)\,D,$$
exactly representing all real orthogonal matrices. The approach provides quantitative spectral bounds for $A$.
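The scaled parameterization can be sketched numerically; the names `scaled_cay` and `scaled_cay_inv` are illustrative, and the example uses the simplest orthogonal matrix with eigenvalue $-1$, which the plain Cayley transform cannot reach:

```python
import numpy as np

def scaled_cay(A, D):
    """Scaled Cayley parameterization W = (I - A)^{-1}(I + A) D."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I - A, I + A) @ D

def scaled_cay_inv(W, D):
    """Recover A = (M - I)(M + I)^{-1} with M = W D; needs -1 not an eigenvalue of M."""
    I = np.eye(W.shape[0])
    M = W @ D                      # D has +/-1 entries, so D^{-1} = D
    return np.linalg.solve((M + I).T, (M - I).T).T

# An orthogonal matrix with eigenvalue -1, unreachable by the unscaled transform:
W = np.diag([-1.0, 1.0, 1.0])
D = np.diag([-1.0, 1.0, 1.0])      # signature chosen so -1 is not an eigenvalue of WD
A = scaled_cay_inv(W, D)
assert np.allclose(A, 0)                  # here WD = I, so A = 0
assert np.allclose(scaled_cay(A, D), W)   # exact representation of W
```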
The scaled Cayley idea also generalizes to the complex case, with $D = \operatorname{diag}(e^{i\theta_1}, \dots, e^{i\theta_n})$ for phases $\theta_1, \dots, \theta_n$, leading to a fully parameterized unitary group with all phases learnable in machine learning contexts (Maduranga et al., 2018).
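A quick check of the complex scaled variant, assuming the natural choice of a diagonal phase matrix $D$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = X - X.conj().T                           # skew-Hermitian generator
theta = rng.uniform(0, 2 * np.pi, n)
D = np.diag(np.exp(1j * theta))              # diagonal phase (scaling) matrix
I = np.eye(n)
W = np.linalg.solve(I - A, I + A) @ D        # complex scaled Cayley transform
assert np.allclose(W.conj().T @ W, I)        # W is unitary
```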
Further Extensions
- Cayley on Quadratic Groups: For broader quadratic Lie groups $G_P = \{Q : Q^\top P Q = P\}$ defined by a fixed invertible matrix $P$, the Cayley map preserves the quadratic relation, mapping the Lie algebra $\mathfrak{g}_P = \{A : PA + A^\top P = 0\}$ into $G_P$ (Maslovskaya et al., 2024, Schäfers et al., 2024).
- Representation-Theoretic Generalization: The Cayley transform applies to representations provided that the representation is closed under the "power-span" property and associated symmetries (Lu et al., 2024).
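The quadratic-group property can be verified numerically for the symplectic form $P = J$. The construction of a Hamiltonian generator $A = JS$ with $S$ symmetric is standard, and the $0.1$ scaling merely keeps $I - A$ well conditioned:

```python
import numpy as np

n = 3
rng = np.random.default_rng(2)
# Standard symplectic form P = J on R^{2n}
J = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.eye(n), np.zeros((n, n))]])
S = rng.standard_normal((2 * n, 2 * n))
S = S + S.T                                # symmetric
A = 0.1 * (J @ S)                          # Hamiltonian generator: J A + A^T J = 0
assert np.allclose(J @ A + A.T @ J, 0)
I = np.eye(2 * n)
Q = np.linalg.solve(I - A, I + A)          # Cayley transform of A
assert np.allclose(Q.T @ J @ Q, J)         # Q preserves the quadratic form: Q is symplectic
```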
3. Applications in Optimization and Geometry
Matrix Manifolds
The Cayley transform implements computationally efficient retractions and coordinate charts for orthogonality- or unitarity-constrained optimization:
- Stiefel Manifold: The generalized Cayley parameterization provides an (almost) global chart of the Stiefel manifold, transferring the optimization problem to an unconstrained Euclidean domain (Kume et al., 2023, Jauch et al., 2018).
- ALCP Scheme: Adaptive locational re-centering mitigates slow convergence near chart singularities, outperforming classical QR- or retraction-based manifold optimizers with negligible overhead (Kume et al., 2023).
- Iterative Cayley Retraction: An efficient iterative solver enables Riemannian optimization (SGD, Adam) enforcing exact orthogonality, with global convergence and significant empirical speedup and stability for deep CNN/RNN training (arXiv:2002.01113).
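The Cayley retraction used by such optimizers can be sketched as a feasible-descent loop on the Stiefel manifold (a Wen-Yin-style update; the step size and objective below are illustrative choices, not taken from the cited works):

```python
import numpy as np

def cayley_retraction_step(W, G, tau):
    """One feasible step on {W : W^T W = I} via the Cayley retraction:
    A = G W^T - W G^T is skew-symmetric, and the update multiplies W
    by the orthogonal matrix (I + tau/2 A)^{-1}(I - tau/2 A)."""
    n = W.shape[0]
    A = G @ W.T - W @ G.T
    I = np.eye(n)
    return np.linalg.solve(I + (tau / 2) * A, (I - (tau / 2) * A) @ W)

# Example: minimize f(W) = -tr(W^T M W) over the Stiefel manifold
rng = np.random.default_rng(3)
n, p = 8, 3
M = rng.standard_normal((n, n))
M = M + M.T
W, _ = np.linalg.qr(rng.standard_normal((n, p)))
for _ in range(200):
    G = -2 * M @ W                             # Euclidean gradient of f
    W = cayley_retraction_step(W, G, tau=0.05)
assert np.allclose(W.T @ W, np.eye(p))         # orthonormality preserved at every step
```

Exact feasibility holds for any step size, which is the point of retraction-based schemes: step-size selection affects progress, not the constraint.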
Inverse Eigenvalue Problems
The Cayley transform is employed as a retraction in globally convergent Newton-like schemes for symmetric inverse eigenvalue problems, providing exact structure preservation (orthogonality) and avoiding the cost of explicit QR or SVD reorthogonalizations (Ling et al., 2013).
4. Function Theory, Probability, and Operator Theory
- Random Matrix Theory: The Cayley transform parametrizes the Stiefel and Grassmann manifolds, enabling explicit Jacobians for change-of-variables in MCMC and providing asymptotically independent normal approximations for uniform measures via the Euclidean parameters (Jauch et al., 2018).
- Quaternionic and Graded Operator Theory: The Cayley transform extends to quaternionic Hilbert spaces, tying the Cayley spectrum to the S-spectrum and underpinning deficiency index theory akin to the complex setting (Muraleetharan et al., 2017).
- $K$-theory: The Cayley transform yields explicit isomorphisms between van Daele $K$-theory and $KK$-theory, compatible with Kasparov products, index pairings, and real/grading structures, providing cycle-level representatives essential for analytical and physical applications (Bourne et al., 2019).
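The Euclidean parameterization behind the random-matrix bullet can be sketched as follows: an $(n-p) \times p$ parameter block is embedded in a skew-symmetric matrix whose Cayley transform yields a Stiefel point. This is a bare-bones version of the chart; the cited works additionally track the Jacobian of the map for change-of-variables:

```python
import numpy as np

def stiefel_from_euclidean(B):
    """Map an (n-p) x p Euclidean block B to a point on the Stiefel manifold:
    embed B in a skew-symmetric A and take the first p columns of cay(A)."""
    m, p = B.shape
    n = m + p
    A = np.zeros((n, n))
    A[p:, :p] = B
    A[:p, p:] = -B.T                        # A is skew-symmetric by construction
    I = np.eye(n)
    Q = np.linalg.solve(I - A, I + A)       # orthogonal
    return Q[:, :p]                         # orthonormal columns

B = np.random.default_rng(4).standard_normal((5, 2))
X = stiefel_from_euclidean(B)
assert np.allclose(X.T @ X, np.eye(2))
```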
5. Structure-Preserving Numerical Analysis and Differential Equations
The Cayley transform forms the backbone of commutator-free Lie group integrators for non-autonomous differential equations on quadratic matrix groups, preserving physical and geometrical invariants:
- Commutator-Free Integrators: High-order time-propagators composed of Cayley transforms achieve fourth-order global accuracy while guaranteeing exact group structure preservation, circumventing the cost and complexity of matrix exponentials and nested commutators (Maslovskaya et al., 2024).
- Modified Cayley for $\mathrm{SU}(N)$: Implementing rational update steps that locally parameterize $\mathrm{SU}(N)$, the modified Cayley transform ensures reversibility and volume preservation in molecular dynamics and lattice QCD, and achieves superior empirical efficiency in second-order splitting integrators, though it is only first-order per link update (Schäfers et al., 2024).
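The structure-preservation mechanism behind these integrators can be illustrated with a single second-order Cayley (midpoint) step for $Q' = A(t)\,Q$ with skew-symmetric $A(t)$; the generator below is an arbitrary illustrative choice:

```python
import numpy as np

def cay(B):
    I = np.eye(B.shape[0])
    return np.linalg.solve(I - B, I + B)

def cayley_midpoint_step(Q, A_of_t, t, h):
    """Second-order Cayley integrator step for Q' = A(t) Q:
    Q_{n+1} = cay(h/2 * A(t + h/2)) Q_n.  Each factor is exactly orthogonal;
    the cited commutator-free schemes compose such factors to reach order four."""
    return cay((h / 2) * A_of_t(t + h / 2)) @ Q

def A_of_t(t):
    # Non-autonomous skew-symmetric generator (illustrative)
    return np.array([[0.0, 1.0 + np.sin(t), 0.3],
                     [-(1.0 + np.sin(t)), 0.0, -0.7 * t],
                     [-0.3, 0.7 * t, 0.0]])

Q = np.eye(3)
t, h = 0.0, 0.01
for _ in range(1000):
    Q = cayley_midpoint_step(Q, A_of_t, t, h)
    t += h
assert np.allclose(Q.T @ Q, np.eye(3))   # group structure preserved over the whole trajectory
```

Unlike a generic Runge-Kutta step, which drifts off the group, every Cayley factor is orthogonal by construction, so the invariant holds to machine precision regardless of step size.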
6. Deep Learning and Neural Networks
- Unitary and Orthogonal RNNs: The Cayley transform (and its scaled and complex-scaled variants) parameterizes orthogonal/unitary recurrent weight matrices, preserving gradient norms exactly (avoiding vanishing/exploding gradients), with direct closed-form gradient updates tied to unconstrained skew-symmetric/Hermitian parameters. The discrete structure of "scaling matrices" is handled as a hyperparameter or optimized in the complex domain (Helfrich et al., 2017, Maduranga et al., 2018).
- Orthogonal Convolutional Layers: Exact orthogonality is imposed on convolutional layers by implementing the Cayley transform in the Fourier domain, ensuring Lipschitz-$1$ operator norm, enhancing adversarial robustness, and enabling networks to scale to large architectures with guaranteed gradient norm preservation (Trockman et al., 2021).
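The Fourier-domain construction can be sketched in 1-D (a simplified, illustrative version of the idea in Trockman et al., 2021, which operates on 2-D feature maps): FFT the multi-channel kernel, skew-Hermitianize each per-frequency channel-mixing matrix, apply its Cayley transform to the input spectrum, and transform back.

```python
import numpy as np

def cayley_conv_1d(w, x):
    """Orthogonal circular 'convolution' via per-frequency Cayley transforms.
    w: (c, c, n) kernel, x: (c, n) input with c channels of length n."""
    c, _, n = w.shape
    W = np.fft.fft(w, axis=-1)                 # channel-mixing matrix per frequency
    X = np.fft.fft(x, axis=-1)
    Y = np.empty_like(X)
    I = np.eye(c)
    for f in range(n):
        Wf = W[:, :, f]
        Af = Wf - Wf.conj().T                  # skew-Hermitian per frequency
        Qf = np.linalg.solve(I - Af, I + Af)   # unitary per frequency
        Y[:, f] = Qf @ X[:, f]
    # Conjugate symmetry of the real FFT is preserved, so the result is real.
    return np.fft.ifft(Y, axis=-1).real

rng = np.random.default_rng(5)
c, n = 3, 16
w = rng.standard_normal((c, c, n)) / n
x = rng.standard_normal((c, n))
y = cayley_conv_1d(w, x)
# The layer is an isometry: input norms (and hence gradient norms) are preserved.
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))
```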
7. Algebraic Interactions, Representation Theory, and Tensor Structure
- Kronecker Product: The Cayley transform of a Kronecker product $A \otimes B$ being itself a Kronecker product reflects deep algebraic structure and only occurs under special conditions on the spectra of $A$ and $B$. Complete criteria exist for when $\operatorname{cay}(A \otimes B)$ factors as a Kronecker product (Hardy et al., 2013).
- Representation-Theoretic Classifications: The Cayley transform applies to certain representations of Lie groups, precisely those whose highest weights lie in a single Weyl group orbit (possibly plus the origin), leading to new characterizations for classical and exceptional simple Lie groups, notably identifying triality in $\mathrm{Spin}(8)$ (Lu et al., 2024).
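A quick numerical check consistent with the Kronecker bullet: the naive factorization of the Cayley transform over a Kronecker product fails generically. Note in particular that $A \otimes B$ is symmetric when both factors are skew-symmetric, so its Cayley transform is not even orthogonal:

```python
import numpy as np

def cay(M):
    I = np.eye(M.shape[0])
    return np.linalg.solve(I - M, I + M)

rng = np.random.default_rng(6)
X = rng.standard_normal((2, 2))
A = 0.3 * (X - X.T)                # small skew-symmetric factors keep
Y = rng.standard_normal((3, 3))    # I - A (x) B safely invertible
B = 0.3 * (Y - Y.T)
lhs = cay(np.kron(A, B))           # symmetric input: result is symmetric, not orthogonal
rhs = np.kron(cay(A), cay(B))      # Kronecker product of two orthogonal matrices
assert not np.allclose(lhs, rhs)   # factorization only holds under special spectral conditions
```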
The modern landscape of the Cayley transform thus encompasses parametrization of matrix groups and manifolds, efficient algorithms for optimization and sampling, structure-preserving numerical methods in differential equations, deep learning architectures, spectral operator theory, $K$-theory, and representation theory. Its computational variants, domain extensions (via scaling or pivoting), and analytic properties serve as a central toolkit in contexts where preservation of group structure, stability, and geometric fidelity is necessary.