
Lie-Algebra Attention Mechanism

Updated 24 June 2026
  • Lie-Algebra Attention is an attention mechanism where tokens are group elements from a matrix Lie group, using intrinsic Lie-algebra norms for scoring.
  • It leverages the geometry of the Lie group via principal logarithm maps and a block-weighted Frobenius inner product, ensuring strict invariance and equivariance.
  • Empirical results show that this method matches or outperforms learned MLP kernels across SE(2), SO(3), and Aff(2) tasks with 50–80× fewer parameters.

Lie-Algebra Attention is an attention mechanism in which each token is a bare element $g_i$ of a matrix Lie group $G \subset GL(m, \mathbb{R})$, with no auxiliary features or external representations attached. Scoring between tokens uses the canonical geometry of $G$, specifically the unique Lie-algebra norm of the relative pose in the group, which yields an intrinsically invariant and equivariant framework for attention that requires neither learned proximity kernels nor representation-theoretic constructions. The approach generalizes attention to matrix Lie groups that admit a principal logarithm, including non-compact, non-abelian groups such as the affine groups $\mathrm{Aff}(n)$ with scale and shear, and so bypasses the restrictions of architectures based on irreducible representations (irreps) or on a surjective exponential map (surjective-exp). The method is presented in "The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups" (Musialski, 18 Jun 2026).

1. Formal Definitions and Core Construction

Let $G$ be a matrix Lie group (a closed subgroup of $GL(m, \mathbb{R})$). Its Lie algebra $\mathfrak{g}$ is the tangent space at the identity, equipped with the commutator bracket. In Lie-Algebra Attention, the input tokens $\{g_i\}_{i=1}^N$ are group elements $g_i \in G$.

For two tokens $i$ and $j$, form the group-relative pose as $g_{ij} = g_i^{-1} g_j$. On a principal logarithm chart $U \subset G$, the logarithm map is well-defined and yields $\xi_{ij} = \log(g_i^{-1} g_j) \in \mathfrak{g}$.

A block-weighted Frobenius inner product on $\mathfrak{g}$ is introduced. Decompose an orthonormal basis of $\mathfrak{g}$ into $K$ irreducible blocks (for instance, by a group action on $\mathfrak{g}$). Set positive weights $w_1, \dots, w_K > 0$, and let $W$ be the block-diagonal matrix applying each $w_k$ on its block. The inner product and associated norm are

$$\langle \xi, \eta \rangle_W = \xi^\top W \eta, \qquad \|\xi\|_W^2 = \langle \xi, \xi \rangle_W$$

for $\xi, \eta \in \mathfrak{g}$ expressed in coordinates with respect to that basis.

The attention score between tokens $i$ and $j$ is then defined as

$$s_{ij} = -\frac{1}{\tau} \left\| \log\left(g_i^{-1} g_j\right) \right\|_W^2$$

where $\tau > 0$ is a temperature parameter. This serves as a canonical negative squared-norm proximity kernel on $G$.
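As a concrete illustration, the score can be sketched for SE(2), where the principal logarithm has a closed form. The encoding of a group element as a `(theta, x, y)` tuple, the function names, and the default weights `w_rot`, `w_trans`, and `tau` are assumptions made for this sketch and are not taken from the paper; the score is assumed to be the negative block-weighted squared Frobenius norm of the log of the relative pose, divided by the temperature.

```python
import math

# Assumption of this sketch: an SE(2) element is a tuple (theta, x, y),
# i.e. a rotation by theta followed by a translation (x, y).

def compose(a, b):
    """Group product a * b in SE(2)."""
    ta, xa, ya = a
    tb, xb, yb = b
    c, s = math.cos(ta), math.sin(ta)
    return (ta + tb, xa + c * xb - s * yb, ya + s * xb + c * yb)

def inverse(a):
    """Group inverse: (theta, p) -> (-theta, -R(-theta) p)."""
    t, x, y = a
    c, s = math.cos(t), math.sin(t)
    return (-t, -(c * x + s * y), -(-s * x + c * y))

def log_se2(g):
    """Principal logarithm, returned as (omega, v1, v2) with omega in [-pi, pi]."""
    t, x, y = g
    omega = math.atan2(math.sin(t), math.cos(t))  # wrap the angle
    if abs(omega) < 1e-9:
        a = 1.0  # limit of (omega / 2) * cot(omega / 2) as omega -> 0
    else:
        a = 0.5 * omega / math.tan(0.5 * omega)
    b = 0.5 * omega
    # v = V^{-1} p with V^{-1} = [[a, b], [-b, a]]
    return (omega, a * x + b * y, -b * x + a * y)

def score(gi, gj, w_rot=1.0, w_trans=1.0, tau=1.0):
    """Negative block-weighted squared norm of log(gi^{-1} gj), over tau."""
    omega, v1, v2 = log_se2(compose(inverse(gi), gj))
    # As a 3x3 matrix the rotation part contributes omega twice (entries
    # -omega and +omega), hence the factor 2 in the Frobenius norm.
    sq_norm = w_rot * 2.0 * omega ** 2 + w_trans * (v1 ** 2 + v2 ** 2)
    return -sq_norm / tau

g_i = (0.3, 1.0, -2.0)
g_j = (1.1, 0.5, 0.7)
h = (2.0, -3.0, 4.0)
print(score(g_i, g_j))
print(score(compose(h, g_i), compose(h, g_j)))  # same value up to rounding
```

The two printed values agree up to floating-point rounding because the left factor `h` cancels in the relative pose. The two weights play the role of the per-block weights on the rotation and translation blocks.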

2. Equivariance and Invariance Properties

The Lie-Algebra Attention mechanism is strictly invariant under the diagonal (left) group action $g_i \mapsto h g_i$ for any $h \in G$: the relative pose $g_i^{-1} g_j$ and thus its logarithm $\log(g_i^{-1} g_j)$ are unchanged, making the attention score exactly invariant.
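The invariance claim reduces to a one-line cancellation. The step is written out here under the assumption that the relative pose is formed as $g_i^{-1} g_j$ and that $h$ denotes the acting group element (notation chosen for illustration):

$$
(h g_i)^{-1} (h g_j) = g_i^{-1} h^{-1} h\, g_j = g_i^{-1} g_j
\quad\Longrightarrow\quad
\log\big((h g_i)^{-1} (h g_j)\big) = \log\big(g_i^{-1} g_j\big).
$$

Any score that depends on the tokens only through this logarithm is therefore unchanged by the left action.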

For the equivariant token update, any transformation $g_i \mapsto g_i u_i$, with $u_i \in G$ independent of left-multiplication of the input, preserves $G$-equivariance. If $u_i = \exp(\delta_i)$ with $\delta_i \in \mathfrak{g}$ computed from invariant features, updates preserve equivariance and induce the correct cocycle law: the updated relative pose between $i$ and $j$ is $u_i^{-1} (g_i^{-1} g_j)\, u_j$, making group composition and chaining consistent by construction.
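A small numerical check can make the equivariance and cocycle statements tangible. The sketch below assumes the update acts by right multiplication, $g_i \mapsto g_i u_i$, and uses the one-dimensional affine group (maps $x \mapsto a x + b$ with $a > 0$) as a stand-in non-compact group. The tuple encoding `(a, b)`, the helper names, and the fixed update values are hypothetical; in a real model the updates would be computed from invariant features.

```python
# Assumption of this sketch: an element of Aff(1) is a tuple (a, b) with a > 0,
# standing for the matrix [[a, b], [0, 1]].

def compose(p, q):
    """Group product p * q."""
    a1, b1 = p
    a2, b2 = q
    return (a1 * a2, a1 * b2 + b1)

def inverse(p):
    """Group inverse."""
    a, b = p
    return (1.0 / a, -b / a)

def relative(gi, gj):
    """Relative pose gi^{-1} gj."""
    return compose(inverse(gi), gj)

def close(p, q, tol=1e-9):
    """Componentwise comparison up to a tolerance."""
    return all(abs(x - y) < tol for x, y in zip(p, q))

tokens = [(2.0, 1.0), (0.5, -3.0), (1.5, 0.25)]
updates = [(1.1, 0.2), (0.9, -0.1), (1.3, 0.4)]  # hypothetical invariant updates
h = (3.0, -2.0)                                   # arbitrary left action

# Right-multiplication update: g_i -> g_i * u_i.
updated = [compose(g, u) for g, u in zip(tokens, updates)]

# Equivariance: updating the transformed tokens equals transforming the updated ones.
equivariant = all(
    close(compose(compose(h, g), u), compose(h, compose(g, u)))
    for g, u in zip(tokens, updates)
)

# Cocycle law: relative poses chain consistently after the update.
cocycle = close(
    compose(relative(updated[0], updated[1]), relative(updated[1], updated[2])),
    relative(updated[0], updated[2]),
)
print(equivariant, cocycle)
```

Both checks hold for any choice of updates, because right multiplication commutes with the left action and relative poses telescope under composition.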

3. Relationship to Alternative Approaches

<table>
<thead>
<tr><th>Method</th><th>Token type</th><th>Applicability to $\mathrm{Aff}(n)$</th></tr>
</thead>
<tbody>
<tr> <td>Irrep-based (e.g., tensor-field networks, SE(3)-Transformer)</td> <td>Feature $f_i$ with $G$-representation $\rho$</td> <td>Not applicable (fails for non-compact, non-unitary groups)</td> </tr>
<tr> <td>Surjective-exp (LieConv-style)</td> <td>Lift $g_i$ via log, convolve over $\mathfrak{g}$</td> <td>Not applicable (exponential not surjective on $\mathrm{Aff}(n)$)</td> </tr>
<tr> <td>Frame or point-based (e.g., AlphaFold IPA)</td> <td>Auxiliary local frames + distance kernels</td> <td>Only translation + rotation; not scale/shear</td> </tr>
<tr> <td>LieTransformer (Hutchinson et al., 2020)</td> <td>$g_i$ with learned MLP kernel</td> <td>Not applicable (surjective-exp constraint fails)</td> </tr>
<tr> <td>Lie-Algebra Attention</td> <td>Bare $g_i \in G$</td> <td>Applies to all $G \subset GL(m, \mathbb{R})$ with principal log</td> </tr>
</tbody>
</table>

Lie-Algebra Attention eliminates the need for explicit vector representations, irreps, Clebsch-Gordan products, or learned kernels. The attention kernel is uniquely fixed by the Lie group's geometry and admits closed-form evaluation for all matrix Lie groups where a principal logarithm chart is defined, specifically including full affine groups $\mathrm{Aff}(n)$ where previous approaches do not apply (Musialski, 18 Jun 2026).

4. Empirical Performance and Benchmarks

Lie-Algebra Attention was compared against:

  • A learned MLP kernel on the same invariant $\log(g_i^{-1} g_j)$
  • Classical vector-token attention on absolute coordinates

Tasks: sequence completion (one missing token per sequence) over $SE(2)$, $SO(3)$, and $\mathrm{Aff}(2)$. Key performance results:

  • On $SE(2)$: Lie-Algebra Attention used only 36 kernel parameters vs. 1,932 for the learned MLP kernel (a roughly 54× reduction). Test pose RMSE was 0.003 for Lie-Algebra Attention vs. 0.005 for the learned MLP kernel. Invariance error was reported for Lie-Algebra Attention and for the vector-token baseline.
  • On $SO(3)$: Lie-Algebra Attention had 24 kernel parameters vs. 1,932 for the learned MLP kernel; pose errors were the same for both; invariance error was also reported.
  • On $\mathrm{Aff}(2)$: Lie-Algebra Attention had 60 kernel parameters vs. 3,084 for the learned MLP kernel; pose errors for the variants labeled G and C were indistinguishable; invariance error was also reported.

Closed-form algebra-norm attention matches or outperforms the learned MLP kernel with 50–80× fewer parameters, holds exact invariance, and operates in group regimes inaccessible to previous attention methods (Musialski, 18 Jun 2026).
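The 50–80× range can be checked directly from the kernel parameter counts quoted above. The short calculation below assumes that the three result rows follow the order in which the groups are listed (SE(2), SO(3), Aff(2)); the dictionary name and layout are illustrative only.

```python
# (Lie-Algebra Attention kernel parameters, learned MLP kernel parameters),
# assuming the result rows are ordered SE(2), SO(3), Aff(2).
kernel_params = {
    "SE(2)": (36, 1932),
    "SO(3)": (24, 1932),
    "Aff(2)": (60, 3084),
}

# Reduction factor = MLP kernel parameters / closed-form kernel parameters.
ratios = {group: mlp / lie for group, (lie, mlp) in kernel_params.items()}

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.1f}x fewer kernel parameters")
# SE(2): 53.7x fewer kernel parameters
# SO(3): 80.5x fewer kernel parameters
# Aff(2): 51.4x fewer kernel parameters
```

All three ratios fall between roughly 51 and 81, consistent with the quoted range.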

5. Computational Complexity and Implementation

For each token pair, $\log(g_i^{-1} g_j)$ is computed using closed-form or series expansions (e.g., Rodrigues, Cayley–Hamilton) for block-triangular algebras such as $\mathfrak{se}(n)$ or $\mathfrak{aff}(n)$. The per-pair computational cost is $O(m^3)$ but is negligible for the small $m$ encountered in practice.
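For rotations, the closed-form route can be sketched with the inverse Rodrigues formula: the rotation angle is read off the trace and the logarithm is a scaled antisymmetric part. The example below is a plain-Python illustration for SO(3) with 3×3 nested lists; the helper names are hypothetical, and the formula is valid only on the principal chart (rotation angle strictly below $\pi$).

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def log_so3(r):
    """Principal log of a rotation matrix whose angle lies in [0, pi)."""
    cos_t = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    theta = math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding
    # theta / (2 sin theta) tends to 1/2 as theta -> 0
    scale = 0.5 if theta < 1e-8 else theta / (2.0 * math.sin(theta))
    return [[scale * (r[i][j] - r[j][i]) for j in range(3)] for i in range(3)]

def frob_sq(a):
    """Squared Frobenius norm."""
    return sum(x * x for row in a for x in row)

r_i, r_j = rot_z(0.3), rot_z(1.0)
xi = log_so3(matmul(transpose(r_i), r_j))  # log of the relative rotation
print(xi[1][0], frob_sq(xi))  # rotation angle and 2 * angle**2, up to rounding
```

For a rotation matrix the inverse equals the transpose, so the relative pose needs no matrix inversion, and the squared Frobenius norm of the logarithm is twice the squared rotation angle.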

The block-weighted norm, if the algebra element $\xi \in \mathfrak{g}$ is stored in block coordinates, is $O(\dim \mathfrak{g})$.

Softmax normalization proceeds as standard: $\alpha_{ij} = \exp(s_{ij}) / \sum_{k} \exp(s_{ik})$, with $s_{ij}$ the attention score. Space and time scaling is $O(N^2)$ in both storage and compute, as with standard attention. The only additional cost is $N^2$ evaluations of the matrix logarithm and block-norm, which remains a constant-factor overhead for fixed $m$ (Musialski, 18 Jun 2026).
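The quadratic structure is easy to see in a minimal sketch: one score per token pair, followed by a row-wise softmax. To keep the example short it uses planar rotations SO(2), where the log of the relative rotation is just the wrapped angle difference; this choice of group, the function names, and the temperature default are assumptions of the sketch rather than the benchmark setup.

```python
import math

def so2_score(ti, tj, tau=1.0):
    """Score for two planar rotations given by their angles."""
    # Log of the relative rotation: angle difference wrapped to [-pi, pi].
    d = math.atan2(math.sin(tj - ti), math.cos(tj - ti))
    # Squared Frobenius norm of [[0, -d], [d, 0]] is 2 * d**2.
    return -2.0 * d * d / tau

def attention_weights(angles, tau=1.0):
    """Row-wise softmax over all N*N pairwise scores."""
    n = len(angles)
    weights = []
    for i in range(n):
        row = [so2_score(angles[i], angles[j], tau) for j in range(n)]
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights

angles = [0.0, 0.1, 2.0]
w = attention_weights(angles)
for row in w:
    print([round(x, 4) for x in row])
```

Each row sums to one, nearby rotations receive more weight than distant ones, and shifting every angle by the same amount leaves the weights unchanged up to rounding.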

6. Significance and Extensions

Lie-Algebra Attention demonstrates that the structure of matrix Lie groups admits a closed-form, unique proximity kernel for attention, circumventing kernel learning and representation-theoretic obstacles. Strict invariance and the exact cocycle condition are satisfied to machine precision. Potential extensions include generalization to arbitrary principal logarithm charts and handling more complex group actions or associated homogeneous spaces. The methodology absorbs and supersedes previous approaches that used vector- or feature-tensor tokens, enables attention models for regimes (such as full affine symmetry) that were previously inaccessible, and sharply reduces kernel parameter count without loss of empirical accuracy.

Equivariant self-attention architectures such as the LieTransformer (Hutchinson et al., 2020) use a learned kernel (often an MLP) on a group-invariant or logarithm-based coordinate. They still require a surjective exponential map and hence are not applicable to all matrix Lie groups (specifically, they are excluded for $\mathrm{Aff}(n)$). Lie-Algebra Attention achieves full generality for matrix Lie groups with a well-defined principal logarithm, positioning it as a canonical construction in the landscape of group-theoretic attention mechanisms.
