Orthogonal Decomposition of Linear Layers
- Orthogonal decomposition of linear layers is a framework for expressing neural network operators via group symmetry, using tools such as Brauer and rotor decompositions.
- It employs algebraic structures such as Brauer algebras and Clifford algebras to reduce redundancy, decrease parameter counts, and achieve significant computational speedups.
- This approach enables differentiable, efficient layer construction that has been practically validated in large language models with competitive performance.
Orthogonal decomposition of linear layers refers to the process of expressing linear operators—such as those used in neural network layers—using bases or factorizations that directly encode the action of orthogonal (or broader symmetry) groups, or that reduce redundancy and parameter count by exploiting orthogonality and group representation theory. Recent research formalizes and algorithmizes such decompositions both from the algebraic group equivariance perspective, as seen in Brauer-algebra approaches, and from Clifford algebraic constructions via rotors, providing efficient representations and compelling empirical performance in large models.
1. Equivariant Linear Maps and Brauer Algebras
Linear layers mapping between tensor power spaces of $\mathbb{R}^n$ can be explicitly characterized when they are equivariant under the action of the orthogonal group $O(n)$. Given $V = \mathbb{R}^n$, the $k$-fold tensor power $V^{\otimes k}$ admits a natural $O(n)$-representation. The vector space of interest is

$$\operatorname{Hom}_{O(n)}\!\left(V^{\otimes k}, V^{\otimes l}\right),$$

i.e., all linear maps commuting with the action induced by $O(n)$. Schur–Weyl duality establishes that this commutant algebra (in the endomorphism case $l = k$) is isomorphic to the Brauer algebra $B_k(n)$, and as such admits a basis indexed by $(k, l)$-Brauer diagrams, which are perfect matchings on $k + l$ objects.
A $(k, l)$-Brauer diagram $\beta$ encodes contractions or identifications among indices in the tensor basis, allowing each equivariant linear map to be represented as a sum

$$f = \sum_{\beta} c_\beta \, E_\beta, \qquad c_\beta \in \mathbb{R},$$

where the $E_\beta$ are diagram-induced linear maps that contract (or sum over) specified pairs of indices. The dimension of this commutant space (for $n$ sufficiently large relative to $k + l$) is given by the number of Brauer diagrams:

$$\dim \operatorname{Hom}_{O(n)}\!\left(V^{\otimes k}, V^{\otimes l}\right) = (k + l - 1)!!$$

for $k + l$ even, and zero otherwise (Pearce-Crump, 2023).
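The diagram count can be checked by brute force: enumerate all perfect matchings on $k + l$ points and compare against the double factorial. A small illustrative Python sketch (the helper names are ours, not from the paper):

```python
def perfect_matchings(points):
    """Enumerate all perfect matchings (pairings) of a list of points."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching

def double_factorial(m):
    """m!! = m * (m - 2) * (m - 4) * ..."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

# A (k, l)-Brauer diagram is a perfect matching on k + l points,
# so the commutant dimension is (k + l - 1)!! when k + l is even.
k, l = 3, 3
diagrams = list(perfect_matchings(list(range(k + l))))
print(len(diagrams), double_factorial(k + l - 1))  # 15 15
```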
2. Schur–Weyl Decomposition and Orthogonal Idempotents
The Brauer algebra $B_k(n)$ is semisimple for sufficiently large $n$, enabling the decomposition of the tensor power representation into irreducibles. Specifically,

$$V^{\otimes k} \cong \bigoplus_{\lambda} W_\lambda \otimes M_\lambda,$$

where $W_\lambda$ denotes the irreducible $O(n)$-module labeled by the Young diagram $\lambda$, and the $M_\lambda$ are multiplicity spaces. Each isotypic block can be projected onto by a central primitive idempotent $e_\lambda$, which can be expressed (but need not be explicitly constructed) as a signed combination of Brauer diagram basis elements.
This decomposition enables fine-grained analysis of layer structure and highlights the fundamental role of group representation theory in architectural design, with the branching rules and dimension formulae governed by combinatorics of standard Young tableaux (Pearce-Crump, 2023).
3. Fast Algorithms for Equivariant Multiplication
Direct instantiation of -equivariant maps as dense matrices is computationally prohibitive. By leveraging the Brauer basis structure and planar diagram factorizations, one can perform the matrix–vector product efficiently by decomposing diagrammatic actions via Kronecker products of small matrices.
Key steps in the algorithm involve:
- Decomposing each Brauer diagram into permutations and a planar core,
- Applying sequential permutations to input tensors,
- Computing "PlanarMult" as a series of tensor contractions or duplications per diagram sector (bottom-row contractions, through-connections, top-row copies),
- Avoiding ever materializing the large full matrix.
The resulting complexity improves exponentially over the naive approach: a dense matrix–vector product at cost $O(n^{k+l})$ is replaced by a sequence of small per-diagram contractions, yielding exponential gains for nontrivial $k$ and $l$ (Pearce-Crump, 2023).
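The central trick of applying a Kronecker-structured operator factor by factor, without materializing the full matrix, can be sketched in plain NumPy. This is an illustrative stand-in for the diagram-wise algorithm, not the paper's implementation; `kron_matvec` and the random factors are hypothetical:

```python
import functools
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3  # the materialized operator would be n^k x n^k

# Small per-factor matrices standing in for the diagrammatic pieces
factors = [rng.standard_normal((n, n)) for _ in range(k)]
x = rng.standard_normal(n ** k)

def kron_matvec(mats, v):
    """Apply (mats[0] ⊗ ... ⊗ mats[-1]) @ v without forming the big matrix."""
    dims_in = [m.shape[1] for m in mats]
    t = v.reshape(dims_in)
    for i, m in enumerate(mats):
        t = np.tensordot(m, t, axes=([1], [i]))  # contract m into tensor axis i
        t = np.moveaxis(t, 0, i)                 # restore the axis ordering
    return t.reshape(-1)

fast = kron_matvec(factors, x)
dense = functools.reduce(np.kron, factors)       # the prohibitively large matrix
print(np.allclose(fast, dense @ x))              # True
```

The fast path touches only $k$ matrices of size $n \times n$, which is the source of the exponential savings described above.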
4. Clifford Algebraic Rotor Decomposition
A distinct orthogonal decomposition of linear layers is made possible via Clifford algebras and their associated rotor structures. In this framework, every linear operator on $\mathbb{R}^n$ is viewed through the lens of geometric algebra. The Clifford algebra $\mathrm{Cl}(n)$ encompasses multivectors of all grades, with the grade-2 subspace encoding all possible bivectors as oriented planes.
A key insight is that the Spin group $\mathrm{Spin}(n)$, comprising all "rotors" (even-grade Clifford elements of the form $R = \exp(B/2)$ for a bivector $B$, normalized so that $R\widetilde{R} = 1$), acts on vectors via the sandwich product $v \mapsto R v \widetilde{R}$, which yields an orthogonal transformation (an element of $SO(n)$) (Pence et al., 15 Jul 2025).
Every special orthogonal transformation can then be expressed as such a rotor action, and more general linear transformations as suitable products or compositions of rotors, each parametrized by bivectors: geometric primitives encoding rotations in two-dimensional planes.
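As a concrete instance, rotors in $\mathrm{Cl}(3)$ have a well-known quaternion representation: the even subalgebra is isomorphic to the quaternions, and the sandwich product becomes quaternion conjugation. A minimal NumPy sketch (sign conventions vary across references; the helper names are ours):

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product; quaternions as (w, x, y, z) arrays."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotor_from_bivector(b, theta):
    """Rotor exp(-B theta / 2): an even-grade element cos + sin * plane."""
    b = np.asarray(b, dtype=float)
    b = b / np.linalg.norm(b)
    return np.concatenate([[np.cos(theta / 2)], -np.sin(theta / 2) * b])

def sandwich(r, v):
    """v -> R v R~ (reversal = quaternion conjugate): an orthogonal action."""
    r_rev = r * np.array([1.0, -1.0, -1.0, -1.0])
    out = quat_mul(quat_mul(r, np.concatenate([[0.0], v])), r_rev)
    return out[1:]

v = np.array([1.0, 0.0, 0.0])
r = rotor_from_bivector([0.0, 0.0, 1.0], np.pi / 2)  # quarter turn in one plane
w = sandwich(r, v)
print(np.isclose(np.linalg.norm(w), 1.0))  # True: the sandwich preserves norms
```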
5. Algorithmic Construction and Differentiable Decomposition
Any linear map $W : \mathbb{R}^{d_{\text{in}}} \to \mathbb{R}^{d_{\text{out}}}$ can be realized as a (pooled) composition of rotors acting on $n$-sized chunks of the input and output vector spaces, identified via disjoint coordinate blocks. For block size $n$:
- Input and output are covered by $d_{\text{in}}/n$ and $d_{\text{out}}/n$ blocks, each of size $n$.
- On each block, apply a two-sided sandwich transform $v \mapsto R v \widetilde{R}$ using learnable rotors $R$ generated from bivectors $B$.
- The total parameter count thus scales as the number of blocks times the $\binom{n}{2}$ bivector coefficients per rotor, vastly fewer than the $d_{\text{in}} d_{\text{out}}$ parameters of a dense layer.
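A minimal NumPy/SciPy sketch of the block idea, using the matrix form of the construction: each rotor is realized as $\exp(B)$ for a skew-symmetric $B$ acting one-sidedly on its block. This is an analogue for illustration, not the paper's Clifford implementation:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d, n = 32, 8                      # feature dim d split into blocks of size n
num_blocks = d // n
tri = np.triu_indices(n, k=1)     # n*(n-1)/2 bivector coefficients per block

# Learnable parameters: one bivector (skew-symmetric generator) per block
params = rng.standard_normal((num_blocks, len(tri[0]))) * 0.1

def block_rotor_apply(x, params):
    """Rotate each size-n chunk of x by exp(B_i), with B_i skew-symmetric."""
    out = np.empty_like(x)
    for i, coeffs in enumerate(params):
        B = np.zeros((n, n))
        B[tri] = coeffs
        B -= B.T                  # skew-symmetric, so expm(B) is orthogonal
        out[i*n:(i+1)*n] = expm(B) @ x[i*n:(i+1)*n]
    return out

x = rng.standard_normal(d)
y = block_rotor_apply(x, params)
print(np.isclose(np.linalg.norm(y), np.linalg.norm(x)))  # norm-preserving
print(params.size, d * d)         # 112 parameters vs 1024 for a dense layer
```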
A differentiable algorithm is provided for extracting commuting simple bivector components of a general bivector via a Clifford-algebraic power iteration, supporting efficient and gradient-compatible learning of rotor parameters. All algebraic operations—contraction, normalization, trigonometric functions—are compatible with standard tensor frameworks and automatic differentiation (Pence et al., 15 Jul 2025).
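The Clifford-algebraic power iteration itself is not reproduced here, but the decomposition it computes has a familiar matrix analogue: under the bivector ↔ skew-symmetric matrix correspondence, extracting commuting simple components amounts to block-diagonalizing a skew-symmetric matrix into independent rotation planes, e.g. via its real Schur form (a SciPy-based illustration, not the paper's algorithm):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
B = A - A.T                       # a generic bivector, as a skew-symmetric matrix

# The real Schur form of a skew-symmetric (hence normal) matrix is block
# diagonal with 2x2 blocks [[0, t], [-t, 0]]: one rotation plane each.
T, Q = schur(B, output='real')
components = []
for j in range(0, n, 2):
    Tj = np.zeros_like(T)
    Tj[j:j+2, j:j+2] = T[j:j+2, j:j+2]
    components.append(Q @ Tj @ Q.T)

# The simple components commute pairwise and sum back to B.
print(np.allclose(sum(components), B))                       # True
print(all(np.allclose(c1 @ c2, c2 @ c1)
          for c1 in components for c2 in components))        # True
```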
6. Practical Implementations and Empirical Results
Rotor-based linear layers have been deployed as replacements for key, query, and value projections in LLM attention heads by partitioning input vectors and implementing the sandwich operation in parallel over blocks. Training protocols include mean-squared error loss against a frozen teacher model, weight decay on bivectors, and initialization with small random entries.
Empirical validation demonstrates that rotor-based projections achieve performance comparable to or better than strong baselines such as low-rank and block-Hadamard layers, with 2–4 orders of magnitude fewer parameters. For example, in the LLaMA-1B attention Q-head, the rotor configuration needs at most 896 parameters compared to 4.19 million for a fully dense layer, and matches or slightly surpasses baseline test perplexity and accuracy across standard natural language tasks (Pence et al., 15 Jul 2025).
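A quick sanity check of the reported ratio, assuming the 4.19 million figure corresponds to a 2048 × 2048 dense projection (an assumption based on LLaMA-1B's hidden width, not stated in the source):

```python
import math

dense = 2048 * 2048   # assumed dense Q-projection size for LLaMA-1B
rotor = 896           # reported rotor parameter count
print(dense)                                   # 4194304, i.e. ~4.19 million
print(round(math.log10(dense / rotor), 2))     # ~3.67 orders of magnitude fewer
```

This lands within the "2–4 orders of magnitude" range quoted above.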
Current rotor implementations incur only a modest runtime slowdown due to the absence of custom geometric algebra kernels. The documented block-sparse structure is expected to enable future optimized inference.
7. Categorical and Representation-Theoretic Foundations
Both the Brauer algebra approach and the rotor decomposition rely fundamentally on the categorical and representation-theoretic underpinnings of equivariant linear maps:
- The Brauer category formalizes the morphisms as real-linear combinations of diagrams, with a functorial correspondence to the category of $O(n)$-equivariant tensor representations, ensuring completeness of the diagrammatic basis.
- Monoidality corresponds to the Kronecker product structure in layer composition, and semisimplicity guarantees decomposition into irreducibles.
- In the Clifford algebraic setting, the Lie algebra $\mathfrak{so}(n)$ of skew-symmetric matrices is isomorphic to the space of bivectors (under the Clifford commutator), enabling the exponential map to parameterize all special orthogonal transformations in geometric terms.
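The correspondence can be checked numerically: exponentiating any skew-symmetric matrix yields a special orthogonal matrix, which is the matrix-side shadow of the rotor construction. A brief NumPy/SciPy verification:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
S = A - A.T                        # an element of so(n): skew-symmetric

R = expm(S)                        # the exponential map lands in SO(n)
print(np.allclose(R.T @ R, np.eye(n)))    # orthogonal: True
print(np.isclose(np.linalg.det(R), 1.0))  # determinant +1: True
```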
This shared theoretical grounding elucidates how group symmetry, algebraic structure, and efficient computation interface in modern neural network layer design (Pearce-Crump, 2023, Pence et al., 15 Jul 2025).