
Orthogonal Decomposition of Linear Layers

Updated 15 January 2026
  • Orthogonal decomposition of linear layers is a framework expressing neural network operators via group symmetry using tools like Brauer and rotor decompositions.
  • It employs algebraic structures such as Brauer algebras and Clifford algebras to reduce redundancy, decrease parameter counts, and achieve significant computational speedups.
  • This approach enables differentiable, efficient layer construction that has been practically validated in large language models with competitive performance.

Orthogonal decomposition of linear layers refers to the process of expressing linear operators—such as those used in neural network layers—using bases or factorizations that directly encode the action of orthogonal (or broader symmetry) groups, or that reduce redundancy and parameter count by exploiting orthogonality and group representation theory. Recent research formalizes and algorithmizes such decompositions both from the algebraic group equivariance perspective, as seen in Brauer-algebra approaches, and from Clifford algebraic constructions via rotors, providing efficient representations and compelling empirical performance in large models.

1. Equivariant Linear Maps and Brauer Algebras

Linear layers mapping between tensor power spaces of $\mathbb{R}^n$ can be explicitly characterized when they are equivariant under the action of the orthogonal group $O(n)$. Given $V = \mathbb{R}^n$, the $k$-fold tensor power $V^{\otimes k}$ admits a natural $O(n)$-representation. The vector space of interest is

$$\operatorname{Hom}_{O(n)}\!\left(V^{\otimes k}, V^{\otimes \ell}\right),$$

i.e., all linear maps $W: V^{\otimes k} \to V^{\otimes \ell}$ commuting with the action induced by $O(n)$. Schur–Weyl duality establishes that this commutant algebra is isomorphic to the Brauer algebra $B_{k,\ell}(n)$ and, as such, admits a basis indexed by $(k,\ell)$-Brauer diagrams, which are perfect matchings on $k + \ell$ objects.

A $(k,\ell)$-Brauer diagram encodes contractions or identifications among indices in the tensor basis, allowing each equivariant linear map to be represented as a sum

$$W = \sum_{\beta \in B_{k,\ell}} w_\beta E_\beta,$$

where the $E_\beta$ are diagram-induced linear maps that contract (or sum over) specified pairs of indices. The dimension of this commutant space is given by the number of Brauer diagrams:

$$|B_{k,\ell}| = \frac{(k+\ell)!}{2^{(k+\ell)/2}\,\left((k+\ell)/2\right)!}$$

for $k + \ell$ even, and zero otherwise (Pearce-Crump, 2023).
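The diagram count above is easy to verify by brute force for small $k, \ell$. The sketch below (plain Python; the helper names are illustrative, not from the paper) enumerates all perfect matchings on $k + \ell$ points and checks them against the closed form:

```python
from itertools import count  # stdlib only
from math import factorial

def brauer_diagrams(k, l):
    """Enumerate (k, l)-Brauer diagrams as perfect matchings on k + l points
    (points 0..k-1 form the bottom row, k..k+l-1 the top row)."""
    pts = list(range(k + l))
    if len(pts) % 2:
        return []
    def matchings(rest):
        if not rest:
            yield []
            return
        first, tail = rest[0], rest[1:]
        for i, other in enumerate(tail):
            for m in matchings(tail[:i] + tail[i+1:]):
                yield [(first, other)] + m
    return list(matchings(pts))

def brauer_count(k, l):
    """Closed form |B_{k,l}| = (k+l)! / (2^{(k+l)/2} ((k+l)/2)!) for k+l even."""
    m = k + l
    if m % 2:
        return 0
    return factorial(m) // (2 ** (m // 2) * factorial(m // 2))

print(len(brauer_diagrams(2, 2)), brauer_count(2, 2))  # → 3 3
```

Note how quickly the count grows: already $|B_{3,3}| = 15$, while the naive dense parameterization of a map $V^{\otimes 3} \to V^{\otimes 3}$ has $n^6$ entries.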

2. Schur–Weyl Decomposition and Orthogonal Idempotents

The Brauer algebra $B_k(n)$ is semisimple for $n \geq k - 1$, enabling the decomposition of the tensor power representation into irreducibles. Specifically,

$$V^{\otimes k} \;\cong\; \bigoplus_{m \geq 0}\; \bigoplus_{\substack{\lambda \vdash k-2m \\ \ell(\lambda) \leq n}} S^\lambda(\mathbb{R}^n) \otimes M_\lambda,$$

where $S^\lambda$ denotes the irreducible $O(n)$-module labeled by the Young diagram $\lambda$, and $M_\lambda$ is its multiplicity space. Each isotypic block can be projected onto by a central primitive idempotent $P_\lambda \in \operatorname{End}_{O(n)}(V^{\otimes k})$, which can be expressed (but need not be explicitly constructed) as a signed combination of the $E_\beta$.

This decomposition enables fine-grained analysis of layer structure and highlights the fundamental role of group representation theory in architectural design, with the branching rules and dimension formulae governed by combinatorics of standard Young tableaux (Pearce-Crump, 2023).
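For a concrete instance of this isotypic structure, take $k = 2$: a matrix in $V \otimes V$ splits into its trace (trivial) part, its antisymmetric part, and its traceless symmetric part, and the corresponding projections behave as orthogonal idempotents summing to the identity. A minimal numpy sketch (the variable names are illustrative):

```python
import numpy as np

n = 4
A = np.random.default_rng(0).normal(size=(n, n))  # an element of V ⊗ V as a matrix

# The three isotypic components of V ⊗ V under O(n):
trace_part = np.trace(A) / n * np.eye(n)   # trivial representation
antisym    = (A - A.T) / 2                 # Λ²(R^n), the antisymmetric part
sym0       = (A + A.T) / 2 - trace_part    # traceless symmetric part

# The projections sum to the identity on V ⊗ V:
assert np.allclose(trace_part + antisym + sym0, A)
assert np.isclose(np.trace(sym0), 0.0)     # traceless by construction

# Dimension bookkeeping: 1 + n(n-1)/2 + (n(n+1)/2 - 1) = n²
print(1 + n*(n-1)//2 + (n*(n+1)//2 - 1), n * n)  # → 16 16
```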

3. Fast Algorithms for Equivariant Multiplication

Direct instantiation of $O(n)$-equivariant maps as dense $n^\ell \times n^k$ matrices is computationally prohibitive. By leveraging the Brauer basis structure and planar diagram factorizations, the matrix–vector product can be performed efficiently by decomposing diagrammatic actions into Kronecker products of small matrices.

Key steps in the algorithm involve:

  • Decomposing each Brauer diagram β\beta into permutations and a planar core,
  • Applying sequential permutations to input tensors,
  • Computing "PlanarMult" as a series of tensor contractions or duplications per diagram sector (bottom-row contractions, through-connections, top-row copies),
  • Avoiding ever materializing the large full matrix.

The resulting complexity improves from the naive $O(n^{k+\ell})$ to $O(n^{k-1} + n^{\ell-1})$, yielding exponential savings for nontrivial $k, \ell$ (Pearce-Crump, 2023).
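As a toy illustration of the savings (not the paper's full PlanarMult algorithm), consider the $(2,2)$ diagram that pairs the two bottom indices together and the two top indices together, so that $E_\beta(x)_{ab} = \delta_{ab} \sum_i x_{ii}$. Its action can be computed with a trace and a broadcast, never materializing the dense $n^2 \times n^2$ matrix:

```python
import numpy as np

n = 5
rng = np.random.default_rng(1)
x = rng.normal(size=(n, n))  # input tensor in V ⊗ V (k = 2)

# Naive route: materialize E_β as a dense n² × n² matrix and multiply.
# E_{(ab),(ij)} = δ_{ab} δ_{ij}, so (E x)_{ab} = δ_{ab} Σ_i x_{ii}.
E = np.einsum('ab,ij->abij', np.eye(n), np.eye(n)).reshape(n*n, n*n)
y_dense = (E @ x.reshape(-1)).reshape(n, n)

# Fast route: contract the bottom pair (a trace), broadcast along the top pair.
y_fast = np.trace(x) * np.eye(n)

assert np.allclose(y_dense, y_fast)
```

The dense route costs $O(n^4)$ in memory and time here, while the fast route is $O(n^2)$; for larger $k, \ell$ the gap widens exponentially.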

4. Clifford Algebraic Rotor Decomposition

A distinct orthogonal decomposition of linear layers is made possible via Clifford algebras and their associated rotor structures. In this framework, every linear operator on $\mathbb{R}^d$ is viewed through the lens of geometric algebra. The Clifford algebra $\mathrm{Cl}(d)$ encompasses multivectors of all grades, with the grade-2 subspace $\mathrm{Cl}^2(d)$ encoding all possible bivectors as oriented planes.

A key insight is that the Spin group $\operatorname{Spin}(d)$, comprising all "rotors", i.e., even-grade Clifford elements of the form $r = \exp_{\mathrm{Cl}}(b)$ with $b \in \mathrm{Cl}^2(d)$, acts on vectors via the sandwich product $x \mapsto r x r^\dagger$, which yields an orthogonal transformation (an element of $\mathrm{SO}(d)$) (Pence et al., 15 Jul 2025).

Every orthogonal transformation, and by extension any general linear transformation, can then be expressed as a suitable product or composition of such rotors, each parametrized by a bivector: a geometric primitive encoding rotation within a two-dimensional plane.
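The sandwich action can be checked concretely in the smallest interesting case, $\mathrm{Cl}(2)$, whose multiplication table on the basis $(1, e_1, e_2, e_{12})$ fits in a few lines. In the numpy sketch below (function names are illustrative), the rotor $r = \cos(\theta/2) + \sin(\theta/2)\,e_{12}$ is applied by $v \mapsto r v r^\dagger$, and the result agrees with an ordinary plane rotation matrix:

```python
import numpy as np

def gp(a, b):
    """Geometric product in Cl(2), coefficients ordered (1, e1, e2, e12),
    using e1² = e2² = 1, e1 e2 = -e2 e1 = e12, e12² = -1."""
    return np.array([
        a[0]*b[0] + a[1]*b[1] + a[2]*b[2] - a[3]*b[3],   # scalar part
        a[0]*b[1] + a[1]*b[0] - a[2]*b[3] + a[3]*b[2],   # e1 part
        a[0]*b[2] + a[2]*b[0] + a[1]*b[3] - a[3]*b[1],   # e2 part
        a[0]*b[3] + a[3]*b[0] + a[1]*b[2] - a[2]*b[1],   # e12 part
    ])

def rev(a):
    """Reversion r → r†: flips the sign of the bivector part."""
    return a * np.array([1.0, 1.0, 1.0, -1.0])

theta = 0.7
r = np.array([np.cos(theta/2), 0.0, 0.0, np.sin(theta/2)])  # rotor exp((θ/2) e12)
v = np.array([0.0, 1.5, -2.0, 0.0])                          # the vector 1.5 e1 - 2 e2

w = gp(gp(r, v), rev(r))  # sandwich product r v r†

# The output is again a pure vector, and the action is a plane rotation:
assert abs(w[0]) < 1e-12 and abs(w[3]) < 1e-12
R = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
assert np.allclose(w[1:3], R @ v[1:3])
```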

5. Algorithmic Construction and Differentiable Decomposition

Any $d \times d$ linear map can be realized as a (pooled) composition of $O(\log^2 d)$ rotors acting on $O(1)$-sized chunks of the input and output vector spaces, identified via disjoint coordinate blocks. For $n = \lceil \log_2 \min(d_\mathrm{in}, d_\mathrm{out}) \rceil$,

  • Input and output are covered by $c_1$ and $c_2$ blocks respectively, each of size $2^n$.
  • On each block, a two-sided sandwich transform is applied using learnable rotors $r_{ij}, s_{ij}$ generated from bivectors $a_{ij}, b_{ij} \in \mathrm{Cl}^2(n)$.
  • The total parameter count is thus $O(\log^2 d)$, vastly fewer than the $O(d^2)$ of a dense layer.
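The blockwise idea can be caricatured with 2-d blocks and plain plane rotations standing in for general rotors; the actual construction uses larger blocks, full bivector parametrizations, two-sided sandwiches, and pooling, so the sketch below (all names illustrative) shows only the parameter-sharing pattern and the orthogonality of the action:

```python
import numpy as np

def blockwise_rotation_layer(x, angles):
    """Much-simplified sketch: cover the input with disjoint 2-d blocks and
    apply one learnable plane rotation per block (the Cl(2) rotor sandwich
    reduces to exactly this). Parameters: d/2 angles instead of d² weights."""
    y = x.copy()
    for j, th in enumerate(angles):
        c, s = np.cos(th), np.sin(th)
        i = 2 * j
        y[i], y[i+1] = c * x[i] + s * x[i+1], -s * x[i] + c * x[i+1]
    return y

d = 8
rng = np.random.default_rng(2)
x = rng.normal(size=d)
angles = rng.normal(size=d // 2)  # d/2 learnable parameters in this caricature
y = blockwise_rotation_layer(x, angles)

# The layer acts orthogonally, so norms are preserved exactly:
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))
```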

A differentiable algorithm is provided for extracting commuting simple bivector components of a general bivector via a Clifford-algebraic power iteration, supporting efficient and gradient-compatible learning of rotor parameters. All algebraic operations—contraction, normalization, trigonometric functions—are compatible with standard tensor frameworks and automatic differentiation (Pence et al., 15 Jul 2025).

6. Practical Implementations and Empirical Results

Rotor-based linear layers have been deployed as replacements for key, query, and value projections in LLM attention heads by partitioning input vectors and implementing the sandwich operation in parallel over blocks. Training protocols include mean-squared error loss against a frozen teacher model, weight decay on bivectors, and initialization with small random entries.
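A much-simplified sketch of that training protocol, with a dense student standing in for the rotor layer and plain gradient descent in numpy (all shapes, seeds, and hyperparameters here are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8
W_teacher = rng.normal(size=(d, d))      # frozen dense teacher projection
theta = 0.01 * rng.normal(size=(d, d))   # student params, small random init

lr, wd = 0.1, 1e-4                        # learning rate, weight decay
for _ in range(500):
    X = rng.normal(size=(32, d))          # a batch of activations
    err = X @ theta.T - X @ W_teacher.T   # student vs. frozen teacher outputs
    grad = err.T @ X / len(X) + wd * theta  # MSE gradient plus weight decay
    theta -= lr * grad

# After training, the distillation loss on the last batch has collapsed:
assert float(np.mean(err**2)) < 1e-2
```

The real setup replaces `theta` with the rotor parametrization (bivector coefficients), so the same MSE-against-teacher loss drives vastly fewer parameters.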

Empirical validation demonstrates that rotor-based projections achieve performance comparable to, or better than, strong baselines such as low-rank and block-Hadamard layers, with 2–4 orders of magnitude fewer parameters. For example, in the LLaMA-1B attention Q-head, the rotor configuration needs at most 896 parameters compared to 4.19 million for a fully dense layer, and matches or slightly surpasses baseline test perplexity and accuracy across standard natural language tasks (Pence et al., 15 Jul 2025).

Current rotor implementations incur only a modest runtime slowdown due to the absence of custom geometric algebra kernels. The documented block-sparse structure is expected to enable future optimized inference.

7. Categorical and Representation-Theoretic Foundations

Both the Brauer algebra approach and the rotor decomposition rely fundamentally on the categorical and representation-theoretic underpinnings of equivariant linear maps:

  • The Brauer category $B(n)$ formalizes the morphisms as real-linear combinations of diagrams, with a functorial equivalence to the category of $O(n)$-equivariant representations, ensuring completeness of the diagrammatic basis.
  • Monoidality corresponds to the Kronecker product structure in layer composition, and semisimplicity guarantees decomposition into irreducibles.
  • In the Clifford algebraic setting, the Lie algebra $\mathfrak{so}(d)$ of skew-symmetric matrices is isomorphic to $\mathrm{Cl}^2(d)$, enabling the exponential map on bivectors to parameterize rotations (elements of $\mathrm{SO}(d)$) in geometric terms.

This shared theoretical grounding elucidates how group symmetry, algebraic structure, and efficient computation interface in modern neural network layer design (Pearce-Crump, 2023, Pence et al., 15 Jul 2025).
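The $\mathfrak{so}(d) \cong \mathrm{Cl}^2(d)$ correspondence can be sanity-checked numerically: the dimensions agree ($d(d-1)/2$ skew-symmetric directions versus $\binom{d}{2}$ basis bivectors $e_i \wedge e_j$), and exponentiating a skew-symmetric matrix (the matrix side of a bivector) lands in $\mathrm{SO}(d)$. A numpy-only sketch; the `expm` here is a simple scaling-and-squaring stand-in, not a library routine:

```python
import numpy as np
from math import comb

# dim so(d) = d(d-1)/2 = number of basis bivectors e_i ∧ e_j in Cl²(d):
for d in range(2, 8):
    assert d * (d - 1) // 2 == comb(d, 2)

def expm(M, terms=30):
    """Matrix exponential via scaling-and-squaring with a truncated series."""
    k = 8
    Ms = M / 2**k
    E, T = np.eye(len(M)), np.eye(len(M))
    for i in range(1, terms):
        T = T @ Ms / i
        E = E + T
    for _ in range(k):
        E = E @ E
    return E

d = 4
rng = np.random.default_rng(3)
A = rng.normal(size=(d, d))
B = A - A.T  # skew-symmetric: the matrix avatar of a bivector

Q = expm(B)
# Exponentiating a bivector yields a rotation: Q Qᵀ = I and det Q = 1.
assert np.allclose(Q @ Q.T, np.eye(d), atol=1e-8)
assert np.isclose(np.linalg.det(Q), 1.0)
```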

