Lorentz-Equivariant Geometric Algebra Transformer
- L-GATr is a neural architecture that combines 4D Minkowski geometric algebra with Lorentz-equivariant operations, ensuring physical symmetry in high-energy physics tasks.
- It integrates a scalable Transformer with multivector attention and locally canonicalized frames to support regression, classification, and generative modeling.
- Experimental results demonstrate state-of-the-art performance with robust generalization and efficient handling of high token counts in challenging particle collision data.
The Lorentz-Equivariant Geometric Algebra Transformer (L-GATr) is a neural architecture that combines geometric algebra representations of four-dimensional Minkowski space with exactly Lorentz-equivariant operations, embedded within a scalable Transformer backbone. It is designed for learning tasks in high-energy physics, notably regression, classification, and generative modeling of particle collision data, where Lorentz symmetry is fundamental. L-GATr encodes each particle's state as a multivector in the 16-dimensional Clifford algebra $\mathrm{Cl}(1,3)$, and defines all neural operations, including linear layers, attention, and normalization, so as to remain equivariant under Lorentz transformations, ensuring that predictions properly reflect the underlying physical symmetries of relativistic kinematics (Spinner et al., 23 May 2024, Brehmer et al., 1 Nov 2024, Qureshi et al., 23 Feb 2025, Favaro et al., 20 Aug 2025).
1. Geometric Algebra Foundations and Lorentz Equivariance
L-GATr is built on the Clifford algebra $\mathrm{Cl}(1,3)$, where any element may be written as $x = \sum_{k=0}^{4}\langle x\rangle_k$, with grades corresponding to scalars ($k=0$), four-vectors ($k=1$), bivectors ($k=2$), trivectors ($k=3$), and the pseudoscalar ($k=4$). The underlying metric is $\eta = \mathrm{diag}(+1,-1,-1,-1)$, defining Minkowski space-time. The geometric product of two vectors, $xy = x \cdot y + x \wedge y$, supplies the basic operations: $x \cdot y$ as the Lorentz inner product and $x \wedge y$ as the exterior (bivector) part.
Lorentz transformations $\Lambda$ act naturally on grade-1 vectors as $v \mapsto \Lambda v$, extended to the full algebra as an algebra homomorphism $\rho(\Lambda)$, so that $\rho(\Lambda)(xy) = \rho(\Lambda)(x)\,\rho(\Lambda)(y)$. For neural network layers $\phi$, the strong equivariance constraint $\phi(\rho(\Lambda)\,x) = \rho(\Lambda)\,\phi(x)$ holds at every stage. This is enforced mathematically by only permitting "grade-preserving" linear maps, $\phi(x) = \sum_{k=0}^{4} w_k \langle x\rangle_k + \sum_{k=0}^{4} v_k\, e_{0123}\, \langle x\rangle_k$, where $w_k, v_k$ are learnable parameters and $e_{0123}$ denotes the unique grade-4 pseudoscalar (Spinner et al., 23 May 2024, Brehmer et al., 1 Nov 2024, Qureshi et al., 23 Feb 2025).
Nonlinearities are restricted to functions that commute with Lorentz transformations, including the geometric product $xy$, scalar-gated GELU activations $x \mapsto \mathrm{GELU}(\langle x\rangle_0)\,x$, and a grade-wise LayerNorm that rescales each multivector by the channel-averaged absolute Minkowski norm of its grade components.
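A minimal PyTorch sketch of these constraints is given below. The basis ordering, the `GradeLinear` module name, and the omission of the pseudoscalar-multiplication term are illustrative assumptions rather than the reference implementation; the key point is that channels are only mixed within a grade, which commutes with the Lorentz action.

```python
import torch
import torch.nn as nn

# Assumed basis ordering for Cl(1,3): [scalar | e0..e3 | 6 bivectors | 4 trivectors | e0123]
GRADE_SLICES = [slice(0, 1), slice(1, 5), slice(5, 11), slice(11, 15), slice(15, 16)]

class GradeLinear(nn.Module):
    """Grade-preserving linear layer (sketch): channels are mixed with one weight
    matrix per grade, so no component is ever rotated into a different grade.
    The pseudoscalar-multiplication term of the full parameterization is omitted."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.randn(out_ch, in_ch) / in_ch**0.5) for _ in GRADE_SLICES]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (..., in_ch, 16)
        out_ch = self.weights[0].shape[0]
        out = torch.zeros(*x.shape[:-2], out_ch, 16, dtype=x.dtype, device=x.device)
        for w, sl in zip(self.weights, GRADE_SLICES):
            # mix channels within a single grade subspace
            out[..., sl] = torch.einsum("oc,...ck->...ok", w, x[..., sl])
        return out

def scalar_gated_gelu(x: torch.Tensor) -> torch.Tensor:
    """Gate every multivector component by a GELU of its (Lorentz-invariant) scalar grade."""
    return torch.nn.functional.gelu(x[..., 0:1]) * x
```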
2. Transformer Architecture and Multivector Attention
L-GATr integrates geometric algebra with a multi-head, self-attention Transformer. Each token represents a particle and carries several multivector channels plus auxiliary scalar channels (e.g., for particle type, time, or other metadata). Four-momenta are encoded into the grade-1 part of one multivector channel; categorical tags are placed in scalars.
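This tokenization can be sketched as follows; the basis ordering (scalar at index 0, four-vector components at indices 1 to 4) and the helper name `embed_event` are assumptions for illustration.

```python
import torch

def embed_event(p4: torch.Tensor, aux_scalars: torch.Tensor):
    """Tokenize an event (sketch): one token per particle, with the four-momentum
    placed in the grade-1 slots of a single multivector channel.

    p4:          (n_particles, 4) four-momenta ordered as (E, px, py, pz)
    aux_scalars: (n_particles, n_s) invariant metadata (e.g. one-hot particle type)
    """
    n = p4.shape[0]
    mv = torch.zeros(n, 1, 16)     # (tokens, multivector channels, 16 components)
    mv[:, 0, 1:5] = p4             # grade-1 (four-vector) components carry the momentum
    return mv, aux_scalars         # scalar channels are passed through unchanged
```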
Attention is carried out by forming queries, keys, and values as multivectors. The attention weights use the Lorentz-invariant inner product,

$$\mathrm{Attention}(q,k,v)_i = \sum_j \mathrm{Softmax}_j\!\left(\frac{\sum_c \langle q_{ic},\, k_{jc}\rangle}{\sqrt{d}}\right) v_j,$$

where $\langle\cdot,\cdot\rangle$ is the invariant inner product on multivectors, $c$ indexes channels, and $d$ is the feature dimension; this ensures invariance of the attention weights under the group action. The architecture supports stacking of blocks:
- LayerNorm
- AttentionBlock: equivariant linear maps to queries, keys, and values, followed by multivector attention with invariant weights
- MLPBlock: includes geometric product and scalar-gated GELU
- Residual connection and equivariant linear mixing
All linear transformations adhere to the grade-wise constraints, and attention scales efficiently with token count because the invariant inner products are compatible with standard FlashAttention backends (Spinner et al., 23 May 2024, Brehmer et al., 1 Nov 2024).
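A minimal sketch of invariant attention, restricted for clarity to the grade-1 (four-vector) channels, is shown below; the function name and tensor layout are assumptions. Because the logits $q^\mu \eta_{\mu\nu} k^\nu$ can be rewritten as ordinary dot products between $q$ and $\eta k$, standard attention kernels apply.

```python
import torch

ETA = torch.diag(torch.tensor([1.0, -1.0, -1.0, -1.0]))   # Minkowski metric

def lorentz_invariant_attention(q, k, v):
    """Invariant attention (sketch), restricted to grade-1 (four-vector) channels.
    The logits are Minkowski inner products, so they do not change when q, k, v
    are all transformed by the same Lorentz matrix.

    q, k, v: (n_tokens, n_channels, 4)
    """
    d = q.shape[-2] * q.shape[-1]                            # total feature dimension
    logits = torch.einsum("icm,mn,jcn->ij", q, ETA, k) / d**0.5
    weights = torch.softmax(logits, dim=-1)                  # rows attend over tokens j
    return torch.einsum("ij,jcm->icm", weights, v)           # equivariant output
```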
3. Lorentz Local Canonicalization and Generalization
The Lorentz Local Canonicalization (LLoCa) framework extends L-GATr by equipping each token or node with a learnable local Lorentz frame $\Lambda_i$, predicted by a small network ("Frames-Net"). Feature tensors are canonically mapped into their local frames for exact equivariance, and messages are transported between frames with the relative transformation

$$m_{j\to i} \;\mapsto\; \rho\!\left(\Lambda_i \Lambda_j^{-1}\right) m_{j\to i},$$

ensuring that both local computations and aggregate updates are Lorentz-invariant. This permits mixing arbitrary tensor ranks within attention and feed-forward layers, and enables "drop-in" equivariant upgrades for any Transformer or GNN backbone with minimal computational (<20%) and parameter (<1%) overhead. Mixed representations, splitting the total feature dimension into scalar, vector, and tensor channels, enable an optimal speed/performance tradeoff (Favaro et al., 20 Aug 2025).
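A compact sketch of frame canonicalization and message transport for four-vector features might look like the following; the helper names and the restriction to rank-1 tensors are assumptions (LLoCa handles arbitrary tensor ranks via tensor products of the frame matrices).

```python
import torch

ETA = torch.diag(torch.tensor([1.0, -1.0, -1.0, -1.0]))

def lorentz_inverse(L: torch.Tensor) -> torch.Tensor:
    """Inverse of a Lorentz matrix via Lambda^{-1} = eta Lambda^T eta."""
    return ETA @ L.transpose(-1, -2) @ ETA

def canonicalize(frames: torch.Tensor, vectors: torch.Tensor) -> torch.Tensor:
    """Map each particle's four-vector features into its own local frame.

    frames:  (n, 4, 4)  per-particle Lorentz matrices Lambda_i (e.g. from a Frames-Net)
    vectors: (n, c, 4)  four-vector features expressed in the global frame
    """
    return torch.einsum("imn,icn->icm", frames, vectors)

def transport(frames: torch.Tensor, messages: torch.Tensor, src, dst) -> torch.Tensor:
    """Transport edge messages from the sender's local frame to the receiver's,
    using the relative transformation Lambda_dst Lambda_src^{-1}.

    messages: (n_edges, c, 4); src, dst: (n_edges,) integer index tensors
    """
    rel = frames[dst] @ lorentz_inverse(frames[src])          # (n_edges, 4, 4)
    return torch.einsum("emn,ecn->ecm", rel, messages)
```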
4. Task-Specific Adaptations: Regression, Classification, and Generation
L-GATr supports a unified pipeline for high-energy physics tasks:
- Regression (e.g., matrix element surrogates): input tokens are one per incoming/outgoing particle, plus a global summary token. The model predicts standardized log-amplitudes, with the scalar grade of the global token as output, minimizing an MSE loss (a minimal readout sketch follows this list).
- Classification (e.g., top tagging): events are tokenized as unordered point clouds, with type tags and optionally additional reference vectors that break the symmetry down to the residual subgroup relevant to the detector. The output head reads out a scalar, using a sigmoid/BCE loss.
- Generative modeling: L-GATr parameterizes continuous normalizing flows via Riemannian flow matching. Latents are mapped via ODE integration over physically valid phase-space manifolds and projected back using the network's computed vector fields, respecting geometric constraints. Evaluations use negative log-likelihood and classifier two-sample tests, demonstrating robust phase-space coverage (Spinner et al., 23 May 2024, Brehmer et al., 1 Nov 2024, Favaro et al., 20 Aug 2025).
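For the regression head, a minimal sketch of the scalar readout and MSE objective described above (treating token index 0 as the global summary token is an assumption of this sketch):

```python
import torch
import torch.nn.functional as F

def regression_loss(tokens: torch.Tensor, target_log_amp: torch.Tensor) -> torch.Tensor:
    """Scalar readout for amplitude regression (sketch).

    tokens:         (batch, n_tokens, n_channels, 16) network output multivectors
    target_log_amp: (batch,) standardized log-amplitudes
    """
    pred = tokens[:, 0, 0, 0]          # scalar grade of the assumed global token (index 0)
    return F.mse_loss(pred, target_log_amp)
```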
5. Symmetry Breaking and Flexibility
While L-GATr is maximally equivariant by construction, physical tasks at the LHC may require breaking the full Lorentz symmetry to a subgroup (e.g., by supplying beam-direction or time-like reference vectors as special tokens or channels). This reduces the equivariance to that appropriate for the problem's residual symmetry. Empirically, providing reference objects as part of the architecture, rather than the data, offers superior performance and flexibility. Symmetry breaking, when performed at the input level (such as explicit beam or time features within LLoCa), allows the network to operate effectively in realistic detector environments (Brehmer et al., 1 Nov 2024, Favaro et al., 20 Aug 2025).
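A sketch of supplying such reference vectors as extra tokens (the basis ordering and helper name are assumptions for illustration):

```python
import torch

def add_reference_tokens(mv_tokens: torch.Tensor) -> torch.Tensor:
    """Append beam-axis and time-direction reference four-vectors as extra tokens,
    breaking full Lorentz equivariance down to the subgroup that fixes them.

    mv_tokens: (n_particles, n_channels, 16), grade-1 slots at indices 1..4 (t, x, y, z)
    """
    n_ch = mv_tokens.shape[1]
    beam = torch.zeros(1, n_ch, 16)
    beam[0, :, 4] = 1.0                # spatial beam direction (0, 0, 0, 1)
    time_ref = torch.zeros(1, n_ch, 16)
    time_ref[0, :, 1] = 1.0            # time-like reference (1, 0, 0, 0)
    return torch.cat([mv_tokens, beam, time_ref], dim=0)
```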
6. Experimental Benchmarking and Performance
Across regression, classification, and generative benchmarks, L-GATr consistently matches or outperforms strong baselines:
- Regression: Achieves the lowest MSE on high-multiplicity QFT amplitude-surrogate tasks (processes with additional gluon emissions), with performance robust to dataset size (Spinner et al., 23 May 2024, Brehmer et al., 1 Nov 2024, Favaro et al., 20 Aug 2025).
- Classification: Matches or slightly trails top Lorentz-equivariant GNN architectures (LorentzNet, CGENN, PELICAN) in top tagging (AUC $0.9870$, with competitive background rejection at fixed signal efficiency), and achieves the best performance (AUC $0.9885$) on the 10-way JetClass task when pretrained (Brehmer et al., 1 Nov 2024, Favaro et al., 20 Aug 2025).
- Generative Modeling: L-GATr parameterized flows yield per-marginal distributions that best match ground truth, especially at distribution tails, with lowest negative log-likelihood and classifier AUCs near $0.5$ (chance-level distinguishability) (Spinner et al., 23 May 2024, Brehmer et al., 1 Nov 2024, Favaro et al., 20 Aug 2025).
- Computational Scaling: L-GATr incurs a moderate overhead versus vanilla Transformers at small particle multiplicities due to the grade-preserving operations, but scales to far larger token counts than equivariant graph-based architectures (Spinner et al., 23 May 2024).
| Task | L-GATr Result | Baseline Comparison |
|---|---|---|
| QFT Regression | Lowest MSE across multiplicities | Beats CGENN/Transformer |
| Top Tagging | AUC 0.9870 | Matches SOTA equivariant GNNs |
| JetClass Multiway | AUC 0.9885 | Surpasses ParticleNet, ParT |
| Event Generation | NLL -32.8 | Superior to MLP, transformer |
| Scaling | Handles large token counts | Outscales CGENN, ParT |
Ablation studies demonstrate that removing geometric-algebra channels or enforcing symmetry at insufficient granularity significantly degrades performance and data efficiency.
7. Extensions and Theoretical Insights
Recent work integrates L-GATr with LLoCa, enabling per-particle frame prediction and exact equivariant transport at arbitrary tensorial rank, resulting in architectures that combine the flexibility and speed of standard transformers with guaranteed Lorentz symmetry. In this regime, L-GATr, when enhanced with local frames and explicit rotor transport, can achieve an order of magnitude speedup versus specialized architectures while retaining or surpassing SOTA accuracy, and exhibits optimal scaling with both sample size and multiplicity.
A theoretical implication is that building equivariance directly into the architecture obviates the need for networks to "learn" physical symmetries, leading to better generalization and sample efficiency, particularly for high-dimensional, small-sample, or symmetry-bound problems (Favaro et al., 20 Aug 2025).
References
- "Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics" (Spinner et al., 23 May 2024)
- "A Lorentz-Equivariant Transformer for All of the LHC" (Brehmer et al., 1 Nov 2024)
- "Probing a Quarkophobic ${\mathbf{W}^\prime$ at the High-Luminosity LHC via Vector Boson Fusion and Lorentz-Equivariant Point Cloud Learning" (Qureshi et al., 23 Feb 2025)
- "Lorentz-Equivariance without Limitations" (Favaro et al., 20 Aug 2025)