Cayley–Neumann Parameterization
- Cayley–Neumann parameterization is a mathematical framework that uses the Cayley transform with truncated Neumann series to efficiently parameterize orthogonal and unitary matrices.
- It replaces exact matrix inverses with truncated series approximations, avoiding costly inversions while improving speed and numerical stability in large-scale applications.
- The approach extends to generalized settings like the Stiefel manifold and stability-constrained systems, enabling robust optimization in machine learning, control, and physics.
The Cayley–Neumann parameterization is a family of mathematical constructions that enable efficient, structured, and often unconstrained parameterizations of matrix objects—most notably, orthogonal matrices and related transformation groups—by leveraging the Cayley transform and Neumann series expansions. Developed and refined across areas such as algebraic geometry, numerical optimization, machine learning, and computational physics, Cayley–Neumann parameterizations facilitate explicit rational or polynomial mappings, efficient computation for large-scale systems, and regularity properties that underpin both theoretical analysis and practical applications.
1. Fundamental Principles and Mathematical Framework
The classical Cayley transform provides a rational parameterization of the set of orthogonal (or unitary) matrices, mapping skew-symmetric (or skew-Hermitian) matrices to the group via

$$W = (I + Q)(I - Q)^{-1},$$

where $Q$ is skew-symmetric ($Q^\top = -Q$) and the condition $\|Q\| < 1$ ensures convergence of Neumann series approximations. The Cayley–Neumann parameterization extends this approach by jointly using the Cayley transform and approximating the matrix inverse by a truncated Neumann series,

$$(I - Q)^{-1} \approx \sum_{i=0}^{k} Q^{i},$$

thereby yielding

$$W \approx (I + Q) \sum_{i=0}^{k} Q^{i},$$

with truncation error of order $O(\|Q\|^{k+1})$.
This truncation avoids the computational burden and instability of direct inverses, enabling efficient orthogonal updates even at very large scales (Qiu et al., 24 Jun 2025, Mucllari et al., 2022).
Further generalizations apply the core Cayley principle to spaces beyond orthogonal groups, such as the Stiefel manifold (matrices with orthonormal columns), via so-called generalized left-localized Cayley transforms (Kume et al., 2023), and even to small-gain or stability-constrained systems via reparameterizations that map unit-ball (norm-constrained) sets to the full space using a Cayley-type mapping (Kon et al., 18 Jan 2024).
2. Explicit Construction and Parameterization Procedures
Cayley–Neumann parameterizations are implemented by the following procedures, with adaptations based on context:
- Orthogonal/Unitary Matrices (Classical Case):
- Parameterize a skew-symmetric matrix $Q$ ($Q^\top = -Q$) with trainable entries.
- Compute the orthogonal transform with a truncated Neumann series inverse:
```python
import numpy as np

def cayley_neumann(Q, k=2):
    """Cayley transform (I + Q)(I - Q)^{-1}, with the inverse replaced
    by a truncated Neumann series of order k."""
    # Neumann series: (I - Q)^{-1} ≈ I + Q + Q^2 + ... + Q^k
    neumann = sum(np.linalg.matrix_power(Q, i) for i in range(k + 1))
    return (np.eye(Q.shape[0]) + Q) @ neumann
```
- For large-scale applications, only low-order truncations (e.g., $k = 2$ or $3$) are typically needed, especially when $Q$ is initialized near zero (Mucllari et al., 2022, Qiu et al., 24 Jun 2025); a usage sketch follows this list.
- Stiefel Manifold (Generalized Case):
- Given a center point on the manifold (a fixed orthogonal matrix that localizes the transform), partition the ambient matrix into blocks adapted to the Stiefel structure.
- Define explicit Cayley-type mappings from a vector space onto a dense subset of the Stiefel manifold, and recover the underlying parameters using an explicit inverse mapping (Kume et al., 2023).
- Quadratically Stable System Models:
- Reformulate stability requirements as small-gain or unit-ball (norm) constraints on the system coefficients.
- Use a Cayley transformation to map free parameters to coefficient functions that satisfy the contraction (stability) constraint for all parameter values (Kon et al., 18 Jan 2024).
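The following usage sketch ties the classical (orthogonal) procedure together, comparing the truncated parameterization against the exact Cayley transform (a minimal illustration, not drawn from any of the cited implementations):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 8
A = 0.05 * rng.standard_normal((n, n))
Q = A - A.T                                   # skew-symmetric generator near zero

# Exact Cayley transform (I + Q)(I - Q)^{-1} via a linear solve.
W_exact = np.linalg.solve((np.eye(n) - Q).T, (np.eye(n) + Q).T).T
W_approx = cayley_neumann(Q, k=2)

print(np.linalg.norm(W_exact - W_approx))                 # small truncation error
print(np.linalg.norm(W_approx.T @ W_approx - np.eye(n)))  # near-orthogonality
```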
3. Applications in Machine Learning and Optimization
Efficient Orthogonal Finetuning in Foundation Models:
The Cayley–Neumann parameterization has been leveraged to enable scalable, parameter-efficient, and robust adaptation of large models without catastrophic forgetting. In orthogonal finetuning (OFTv2), the key insight is to avoid direct inversion of $(I - Q)$ by substituting the truncated Neumann series. This reduces cubic-time matrix–matrix operations (impractical for large transformer models) to quadratic-time matrix–vector operations. The resulting approach achieves up to 10× faster training and substantially lower memory usage, while maintaining adaptation performance and dynamic range (Qiu et al., 24 Jun 2025).
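To illustrate the matrix-vector formulation (a sketch of the matrix-free idea, not the OFTv2 code; the function name is illustrative), the transform can be applied to an activation vector using only matrix-vector products, so the dense orthogonal matrix is never materialized:

```python
import numpy as np

def apply_cayley_neumann(Q, x, k=2):
    """Apply (I + Q)(I + Q + ... + Q^k) to a vector x using only
    matrix-vector products: O(k n^2) work instead of O(n^3)."""
    # Horner-style evaluation of y = (I + Q + ... + Q^k) x.
    y = x.copy()
    for _ in range(k):
        y = x + Q @ y
    # Final Cayley factor: (I + Q) y.
    return y + Q @ y
```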
Orthogonal and Unitary Recurrent Networks:
In sequential modeling, the Cayley–Neumann parameterization enforces strict orthogonality on recurrent weight matrices (e.g., in NC-GRU). This preserves the spectral norm, prevents exploding gradients, and supports longer memory retention. Empirically, such models outperform standard GRUs and LSTMs, converge faster, and train more stably, particularly when the truncated Neumann series keeps the update both accurate and efficient (Mucllari et al., 2022).
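A minimal sketch of the underlying principle (not the NC-GRU architecture itself; cayley_neumann is the function from Section 2, and the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

A = 0.01 * rng.standard_normal((n, n))   # free trainable parameters
Q = A - A.T                              # skew-symmetric by construction

W = cayley_neumann(Q, k=3)               # approximately orthogonal recurrent matrix
h = rng.standard_normal(n)
h_next = np.tanh(W @ h)                  # ||W h|| ≈ ||h||, so gradients stay bounded
```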
Lipschitz-Constrained Neural Networks:
The Cayley transform is also used in conjunction with controllability Gramian parameterizations to ensure neural networks are globally Lipschitz-bounded. This strengthens robustness to adversarial perturbations without requiring explicit constrained optimization, and allows unconstrained gradient-based training (Pauli et al., 2023).
Stability-Constrained System Identification:
For parameter identification in quadratically stable linear parameter-varying models, Cayley–Neumann parameterizations convert norm-bounded coefficient constraints to free parameter spaces, bypassing costly LMI projections. This enables the use of neural networks as unconstrained coefficient function approximators while rigorously guaranteeing global stability for all parameter settings (Kon et al., 18 Jan 2024).
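The flavor of such a reparameterization can be sketched as follows (a minimal illustration of a Cayley-type map onto the open unit ball, not the exact construction of Kon et al.; Y and S are assumed free parameter matrices). It uses the fact that $C = (I - X)(I + X)^{-1}$ is a strict contraction whenever $X + X^\top \succ 0$:

```python
import numpy as np

def contraction_from_free_params(Y, S, eps=1e-3):
    """Map free matrices (Y, S) to a matrix C with ||C||_2 < 1.
    X = Y Y^T + eps I + (S - S^T) has positive-definite symmetric part,
    so its Cayley image C = (I - X)(I + X)^{-1} is a strict contraction."""
    n = Y.shape[0]
    X = Y @ Y.T + eps * np.eye(n) + (S - S.T)
    return np.linalg.solve((np.eye(n) + X).T, (np.eye(n) - X).T).T
```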
4. Generalizations and Extensions
Generalized Cayley Transforms for Matrix Manifolds:
Advances in the field have led to generalized, left-localized Cayley transforms that parameterize the entire Stiefel manifold (matrices with orthonormal columns) rather than just square orthogonal matrices. These transformations, formulated in (Kume et al., 2023), permit unconstrained optimization algorithms to be directly applied to problems with orthogonality constraints, enabling computationally efficient global convergence and straightforward algorithmic design.
Modified Cayley Transforms for Lie Groups (e.g., SU(3)):
Extensions to special unitary groups, such as SU(3) used in lattice QCD simulations, require further modifications. A phase compensation parameter is introduced in the Cayley transform to ensure outputs remain within SU(3), maintaining determinant constraints and supporting efficient, locally defined integrators in molecular dynamics (Schäfers et al., 17 Jun 2024).
5. Comparative Analysis and Practical Considerations
Efficiency and Stability:
The primary advantage of Cayley–Neumann parameterization is computational efficiency—particularly for large matrix dimensions—since it largely eliminates expensive matrix inversions and enables matrix-free implementations. This is crucial in LLMs, recurrent nets with high-dimensional hidden states, and stability-constrained control problems, where direct inverses are prohibitive (Qiu et al., 24 Jun 2025, Mucllari et al., 2022).
Numerical Robustness:
Truncated Neumann series approximations are numerically stable when the skew-symmetric generator is near zero, a condition naturally met in incremental finetuning or when models begin close to the identity. This also regularizes gradient flow, mitigating issues such as gradient explosion or parameter drift (Mucllari et al., 2022).
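This norm dependence is easy to check numerically (a quick sanity check assuming the cayley_neumann function from Section 2); the orthogonality defect grows rapidly as $\|Q\|$ approaches 1:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((32, 32))
Q0 = (B - B.T) / np.linalg.norm(B - B.T, 2)   # skew-symmetric, spectral norm 1

for scale in (0.01, 0.1, 0.5):
    Q = scale * Q0
    W = cayley_neumann(Q, k=2)
    defect = np.linalg.norm(W.T @ W - np.eye(32), 2)
    print(f"||Q|| = {scale:4.2f}   orthogonality defect = {defect:.2e}")
```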
Limitations and Domain-Specific Concerns:
A major limitation is the local validity of truncated expansions; if the generator norm grows too large, orthogonality may be lost or parameterizations become inaccurate. Certain extensions (e.g., for SU(3) (Schäfers et al., 17 Jun 2024)) require nontrivial auxiliary computations (such as phase corrections) to enforce all group constraints, and high-order integrators may need special care to avoid order reduction.
Comparison to Alternative Parameterizations:
Low-rank adaptation (e.g., QLoRA) is an alternative in massive model adaptation, but may not preserve the dynamic range or orthogonality of weights upon blending with quantized base models. The Cayley–Neumann method maintains orthogonality—supporting efficient and robust weight merging and stable adaptation (Qiu et al., 24 Jun 2025).
6. Broader Impact and Future Directions
Cayley–Neumann parameterizations offer a unifying principle spanning geometry, combinatorics, dynamics, and large-scale computational learning. Their algebraic underpinnings afford insight into character varieties (e.g., surface group representations (Kabaya, 2011)), and their computational properties facilitate both scalable deep learning and precise system identification. There are ongoing efforts to further generalize such parameterizations to broader classes of matrix manifolds, develop analytic error bounds for truncated series, and integrate these principles into increasingly sophisticated hybrid optimization and sampling algorithms.
7. Tabular Summary of Recent Application Domains
| Domain/Task | Key Usage of Cayley–Neumann Parameterization | Reference |
|---|---|---|
| Orthogonal Finetuning of Foundation Models | Scalable, stable, efficient orthogonal updates (truncated Neumann) | (Qiu et al., 24 Jun 2025) |
| Recurrent Neural Networks (NC-GRU) | Orthogonal recurrent matrix updates for gradient control | (Mucllari et al., 2022) |
| Stability-Constrained System Identification | Unconstrained stable coefficient parameterization in LPV models | (Kon et al., 18 Jan 2024) |
| Optimization on Matrix Manifolds | Generalized transforms for unconstrained optimization | (Kume et al., 2023) |
| SU(3) Molecular Dynamics | Modified transform for efficient, group-preserving integrators | (Schäfers et al., 17 Jun 2024) |
In summary, the Cayley–Neumann parameterization represents a versatile, theoretically grounded, and computationally effective framework for parameterizing, optimizing, and analyzing systems that require orthogonality, stability, or group-structured transformations, with rapidly broadening impact across mathematics, physics, and data-driven fields.