Lie-Algebra Attention Mechanism
- Lie-Algebra Attention is an attention mechanism where tokens are group elements from a matrix Lie group, using intrinsic Lie-algebra norms for scoring.
- It leverages the geometry of the Lie group via principal logarithm maps and a block-weighted Frobenius inner product, ensuring strict invariance and equivariance.
- Empirical results show that this method matches or outperforms learned MLP kernels across SE(2), SO(3), and Aff(2) tasks with 50–80× fewer parameters.
Lie-Algebra Attention is an attention mechanism in which each token is a bare element of a matrix Lie group $G$, with no auxiliary features or external representations attached. Scoring between tokens uses the canonical geometry of $G$, specifically the unique Lie-algebra norm of the relative pose in the group. This yields an intrinsically invariant and equivariant framework for attention that requires neither learned proximity kernels nor representation-theoretic constructions. The approach generalizes attention to arbitrary matrix Lie groups, including non-compact and non-abelian groups such as the affine groups with scale and shear, and so bypasses the restrictions of earlier architectures based on irreducible representations (irreps) or on surjective exponential maps (surjective-exp). This methodology is exemplified in "The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups" (Musialski, 18 Jun 2026).
1. Formal Definitions and Core Construction
Let $G$ be a matrix Lie group (closed subgroup of $\mathrm{GL}(n, \mathbb{R})$). Its Lie algebra $\mathfrak{g}$ is the tangent space at the identity, equipped with the commutator bracket. In Lie-Algebra Attention, the input tokens are group elements $g_1, \dots, g_N \in G$.
For two tokens $g_i$ and $g_j$, form the group-relative pose as $g_i^{-1} g_j$. On a principal logarithm chart $U \subset G$, the logarithm map is well-defined and yields $\xi_{ij} = \log(g_i^{-1} g_j) \in \mathfrak{g}$.
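The relative pose and its principal logarithm can be sketched for the planar rigid-motion group SE(2), where the logarithm has a closed form. This is a minimal illustration under stated assumptions, not the source's implementation: the helper names (`se2`, `matmul`, `se2_inverse`, `se2_log`, `relative_log`) and the coordinate convention $(\theta, v_x, v_y)$ are hypothetical choices made here.

```python
import math

def se2(theta, x, y):
    """Homogeneous 3x3 matrix of a planar rigid motion (hypothetical helper)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def se2_inverse(g):
    """Inverse of [R t; 0 1], which is [R^T, -R^T t; 0 1]."""
    c, s, x, y = g[0][0], g[1][0], g[0][2], g[1][2]
    return [[c, s, -(c * x + s * y)], [-s, c, s * x - c * y], [0.0, 0.0, 1.0]]

def se2_log(g):
    """Principal log as coordinates (theta, vx, vy), with theta from atan2."""
    theta = math.atan2(g[1][0], g[0][0])
    x, y = g[0][2], g[1][2]
    if abs(theta) < 1e-9:
        return (theta, x, y)  # V(theta) tends to the identity as theta -> 0
    a = math.sin(theta) / theta
    b = (1.0 - math.cos(theta)) / theta
    d = a * a + b * b
    # The translation is t = V v with V = [[a, -b], [b, a]], so v = V^{-1} t.
    return (theta, (a * x + b * y) / d, (-b * x + a * y) / d)

def relative_log(gi, gj):
    """Algebra coordinates of the relative pose gi^{-1} gj."""
    return se2_log(matmul(se2_inverse(gi), gj))

g_i = se2(0.4, 1.0, -2.0)
g_j = se2(-0.9, 0.2, 0.7)
xi_ij = relative_log(g_i, g_j)
```

Because only `relative_log` touches the tokens, everything downstream of it sees the pair through the relative pose alone.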
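A block-weighted Frobenius inner product on $\mathfrak{g}$ is introduced. Decompose an orthonormal basis of $\mathfrak{g}$ into $K$ irreducible blocks (for instance, by the action of […]). Set positive weights $w_1, \dots, w_K > 0$, and let $W$ be the block-diagonal matrix applying each $w_k$ on its block. The inner product and associated norm are
$$\langle \xi, \eta \rangle_W = \sum_{k=1}^{K} w_k \, \langle \xi^{(k)}, \eta^{(k)} \rangle_F, \qquad \|\xi\|_W^2 = \langle \xi, \xi \rangle_W,$$
for $\xi, \eta \in \mathfrak{g}$, where $\xi^{(k)}$ denotes the component of $\xi$ in the $k$-th block.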
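As a worked instance, consider the Lie algebra of planar rigid motions. The two-block split below (one rotation block, one translation block) and the weight names $w_{\mathrm{rot}}$, $w_{\mathrm{trans}}$ are assumptions made for illustration; the source does not state them.

```latex
\xi = \begin{pmatrix} 0 & -\theta & v_x \\ \theta & 0 & v_y \\ 0 & 0 & 0 \end{pmatrix},
\qquad
\|\xi\|_W^2 = w_{\mathrm{rot}} \,(2\theta^2) + w_{\mathrm{trans}} \,(v_x^2 + v_y^2).
```

The factor 2 arises because the rotation block contributes two matrix entries, $-\theta$ and $\theta$, to the Frobenius sum; it can be absorbed into $w_{\mathrm{rot}}$.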
The attention score between tokens $i$ and $j$ is then defined as
$$s_{ij} = -\frac{1}{\tau} \, \big\| \log(g_i^{-1} g_j) \big\|_W^2,$$
where $\tau > 0$ is a temperature parameter. This serves as a canonical negative squared-norm proximity kernel on $G$.
2. Equivariance and Invariance Properties
The Lie-Algebra Attention mechanism is strictly invariant under the diagonal (left) group action $g_k \mapsto h g_k$ for any $h \in G$: the relative pose $g_i^{-1} g_j$ and thus $\log(g_i^{-1} g_j)$ are unchanged, making the score $s_{ij}$ exactly invariant.
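The invariance claim can be checked numerically. The sketch below assumes SE(2) tokens stored as hypothetical `(theta, x, y)` tuples, a two-block weighting (`w_rot`, `w_trans`) acting directly on the coordinates $(\theta, v_x, v_y)$, and a score of the form "negative weighted squared norm divided by a temperature". All names are illustrative rather than taken from the source.

```python
import math

def compose(g, h):
    """Product g*h of SE(2) elements stored as (theta, x, y)."""
    c, s = math.cos(g[0]), math.sin(g[0])
    return (g[0] + h[0], g[1] + c * h[1] - s * h[2], g[2] + s * h[1] + c * h[2])

def inverse(g):
    """Inverse element: angle -theta, translation -R^T t."""
    c, s = math.cos(g[0]), math.sin(g[0])
    return (-g[0], -(c * g[1] + s * g[2]), s * g[1] - c * g[2])

def log_se2(g):
    """Principal log as (theta, vx, vy); the angle is wrapped via atan2."""
    theta = math.atan2(math.sin(g[0]), math.cos(g[0]))
    x, y = g[1], g[2]
    if abs(theta) < 1e-9:
        return (theta, x, y)
    a = math.sin(theta) / theta
    b = (1.0 - math.cos(theta)) / theta
    d = a * a + b * b
    return (theta, (a * x + b * y) / d, (-b * x + a * y) / d)

def score(gi, gj, w_rot=1.0, w_trans=1.0, tau=1.0):
    """Negative block-weighted squared norm of log(gi^{-1} gj), over tau."""
    theta, vx, vy = log_se2(compose(inverse(gi), gj))
    return -(w_rot * theta * theta + w_trans * (vx * vx + vy * vy)) / tau

g_i, g_j = (0.3, 1.0, 2.0), (-0.8, 0.5, -1.5)
h = (2.1, -4.0, 0.7)  # arbitrary global transformation applied to both tokens

before = score(g_i, g_j)
after = score(compose(h, g_i), compose(h, g_j))
assert abs(before - after) < 1e-9  # equal up to floating-point rounding
```

A score computed from absolute coordinates would generally change under `h`, which is the contrast the vector-token baseline illustrates.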
For equivariant token updates, any transformation $g_i \mapsto g_i u_i$, with $u_i \in G$ independent of left-multiplication of the input, preserves $G$-equivariance. If $u_i = \exp(\delta_i)$ with $\delta_i \in \mathfrak{g}$ computed from invariant features, updates preserve equivariance and induce the correct cocycle law: the updated relative pose between tokens $i$ and $j$ is $u_i^{-1} (g_i^{-1} g_j) \, u_j$, making group composition and chaining consistent by construction.
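The identities behind these statements are short enough to write out. The notation here is assumed for illustration: $h \in G$ is a global left transformation and the update is right-multiplication $g_i \mapsto g_i u_i$, with $u_i$ depending only on invariant quantities.

```latex
\begin{aligned}
\text{invariance:}\quad & (h g_i)^{-1} (h g_j) = g_i^{-1} h^{-1} h \, g_j = g_i^{-1} g_j, \\
\text{equivariance:}\quad & (h g_i)\, u_i = h \,(g_i u_i), \\
\text{updated relative pose:}\quad & (g_i u_i)^{-1} (g_j u_j) = u_i^{-1} \,(g_i^{-1} g_j)\, u_j, \\
\text{chaining:}\quad & (g_i^{-1} g_j)(g_j^{-1} g_k) = g_i^{-1} g_k .
\end{aligned}
```

The second line holds because left and right multiplication commute, so a right-multiplicative update never interferes with the global left action.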
3. Relationship to Alternative Approaches
<table>
<thead>
<tr><th>Method</th><th>Token type</th><th>Applicability to $\mathrm{Aff}(n)$</th></tr>
</thead>
<tbody>
<tr>
<td>Irrep-based (e.g., tensor-field networks, SE(3)-Transformer)</td>
<td>Features with a $G$-representation $\rho$</td>
<td>Not applicable (fails for non-compact, non-unitary groups)</td>
</tr>
<tr>
<td>Surjective-exp (LieConv-style)</td>
<td>Lift inputs via log, convolve over $\mathfrak{g}$</td>
<td>Not applicable (exponential not surjective on $\mathrm{Aff}(n)$)</td>
</tr>
<tr>
<td>Frame or point-based (e.g., AlphaFold IPA)</td>
<td>Auxiliary local frames + distance kernels</td>
<td>Only translation + rotation; not scale/shear</td>
</tr>
<tr>
<td>LieTransformer (Hutchinson et al., 2020)</td>
<td>Lifted group elements with learned MLP kernel</td>
<td>Not applicable (surjective-exp constraint fails)</td>
</tr>
<tr>
<td>Lie-Algebra Attention</td>
<td>Bare $g_i \in G$</td>
<td>Applies to all $G$ with principal log</td>
</tr>
</tbody>
</table>
Lie-Algebra Attention eliminates the need for explicit vector representations, irreps, Clebsch-Gordan products, or learned kernels. The attention kernel is uniquely fixed by the Lie group's geometry and admits closed-form evaluation for all matrix Lie groups where a principal logarithm chart is defined, specifically including the full affine groups $\mathrm{Aff}(n)$, where previous approaches do not apply (Musialski, 18 Jun 2026).
4. Empirical Performance and Benchmarks
Lie-Algebra Attention was compared against:
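- A learned MLP kernel on the same invariant $\log(g_i^{-1} g_j)$
- Classical vector-token attention on absolute coordinates
Tasks: Sequence completion (one missing token per sequence) over SE(2), SO(3), and Aff(2). Key performance results:
- On SE(2): Lie-Algebra Attention used only 36 kernel parameters versus 1,932 for the learned MLP kernel (about a 54× reduction). Test pose RMSE was 0.003 versus 0.005. Invariance error: […] for Lie-Algebra Attention, […] for the vector-token baseline.
- On SO(3): Lie-Algebra Attention had 24 kernel parameters, the learned MLP kernel 1,932; pose errors […] for both; invariance error […] for Lie-Algebra Attention, […] for the vector-token baseline.
- On Aff(2): Lie-Algebra Attention had 60 kernel parameters versus 3,084 for the learned MLP kernel; pose error […] (the two kernels are indistinguishable); invariance error […] for Lie-Algebra Attention, […] for the vector-token baseline.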
Closed-form algebra-norm attention matches or outperforms the learned MLP kernel with 50–80× fewer parameters, holds exact invariance, and operates in group regimes inaccessible to previous attention methods (Musialski, 18 Jun 2026).
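The quoted reduction range follows directly from the parameter counts listed above. A quick arithmetic check, with each pair given as (closed-form kernel, learned MLP kernel):

```python
# Kernel parameter counts as reported above: (closed-form, learned MLP).
pairs = [(36, 1932), (24, 1932), (60, 3084)]

# Reduction factor for each task, rounded to one decimal place.
ratios = [round(mlp / closed, 1) for closed, mlp in pairs]
print(ratios)  # [53.7, 80.5, 51.4]
```

All three factors fall within the stated 50–80× band once rounded.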
5. Computational Complexity and Implementation
For each token pair, $\log(g_i^{-1} g_j)$ is computed using closed-form or series expansions (e.g., Rodrigues, Cayley–Hamilton) for block-triangular algebras such as $\mathfrak{se}(n)$ or $\mathfrak{aff}(n)$. The per-pair computational cost is $O(n^3)$ in the matrix size $n$, but is negligible for the practical $n$ encountered.
The block-weighted norm, if $\xi$ is stored in block coordinates, is $O(\dim \mathfrak{g})$.
Softmax normalization proceeds as standard: $\alpha_{ij} = \exp(s_{ij}) / \sum_{k} \exp(s_{ik})$. Space and time scaling is $O(N^2)$ in both storage and compute for $N$ tokens, as with standard attention. The only additional cost is $N^2$ evaluations of the matrix logarithm and block-norm, which remains an $O(N^2)$ overhead for fixed matrix size $n$ (Musialski, 18 Jun 2026).
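The pipeline described in this section (one logarithm and one block norm per pair, then a row-wise softmax) can be sketched generically. The function names and the callback interface are assumptions made for illustration. For brevity the usage example takes planar rotations SO(2) stored as angles, where the principal log of the relative pose is simply the wrapped angle difference and there is a single block.

```python
import math

def block_sq_norm(blocks, weights):
    """Weighted sum of squared block norms; cost is linear in dim(g)."""
    return sum(w * sum(c * c for c in blk) for w, blk in zip(weights, blocks))

def attention_weights(tokens, rel_log, weights, tau=1.0):
    """Row-stochastic attention matrix from N^2 relative-pose logarithms."""
    n = len(tokens)
    rows = []
    for i in range(n):
        scores = [-block_sq_norm(rel_log(tokens[i], tokens[j]), weights) / tau
                  for j in range(n)]
        m = max(scores)                        # subtract the max for stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        rows.append([e / z for e in exps])
    return rows

def so2_rel_log(a, b):
    """Relative pose of two planar rotations given as angles (one block)."""
    d = b - a
    return [[math.atan2(math.sin(d), math.cos(d))]]

angles = [0.0, 0.1, 3.0]
alpha = attention_weights(angles, so2_rel_log, weights=[1.0], tau=0.5)
```

Swapping in another group only requires a different `rel_log` callback that returns the algebra coordinates grouped by block; the softmax and the quadratic loop structure stay the same.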
6. Significance and Extensions
Lie-Algebra Attention demonstrates that the structure of matrix Lie groups admits a closed-form, unique proximity kernel for attention, circumventing kernel learning and representation-theoretic obstacles. Strict invariance and the exact cocycle condition are satisfied to machine precision. Potential extensions include generalization to arbitrary principal logarithm charts and the handling of more complex group actions or associated homogeneous spaces. The methodology absorbs and supersedes previous approaches that used vector- or feature-tensor tokens, enables attention models in regimes (such as full affine symmetry) that were previously inaccessible, and sharply reduces kernel parameter count without loss of empirical accuracy.
Equivariant self-attention architectures such as the LieTransformer (Hutchinson et al., 2020) use a learned kernel (often an MLP) on a group-invariant or logarithm-based coordinate. They still require a surjective exponential map and are therefore not applicable to all matrix Lie groups (specifically, they are excluded for $\mathrm{Aff}(n)$). Lie-Algebra Attention achieves full generality for matrix Lie groups with a well-defined principal logarithm, positioning it as a canonical construction in the landscape of group-theoretic attention mechanisms.