
LLoCa-Transformer: Exact Lorentz Equivariance

Updated 3 December 2025
  • LLoCa-Transformer is a neural architecture that provides exact Lorentz equivariance by canonicalizing features into learned local reference frames.
  • It integrates standard neural layers with equivariant frame prediction and tensorial message passing to efficiently handle space-time tensor data.
  • Empirical results demonstrate improved performance and computational efficiency on jet tagging and QFT amplitude regression benchmarks.

The LLoCa-Transformer is a generic neural architecture designed to endow any backbone—such as transformers and graph neural networks—with exact Lorentz equivariance. Built on the Lorentz Local Canonicalization (LLoCa) framework, it operates by learning Lorentz-equivariant local reference frames for each entity (e.g., particle) within the input, canonicalizing features into these local frames, and then leveraging standard neural layers. This approach enables seamless propagation of space-time tensorial information while eliminating the architectural constraints of previous Lorentz-equivariant models, and achieves state-of-the-art accuracy and computational efficiency on challenging high-energy physics benchmarks (Spinner et al., 26 May 2025, Favaro et al., 20 Aug 2025).

1. Theoretical Foundations of LLoCa-Transformer

Lorentz symmetry underpins fundamental interactions in high-energy physics, where observed data such as four-momenta $p = (E, \vec{p})$ transform under the proper orthochronous Lorentz group $\mathrm{SO}^+(1,3)$. Traditional Lorentz-equivariant neural networks rely on bespoke convolution or message-passing layers, which severely limits architectural flexibility. LLoCa overcomes this limitation by decoupling equivariance from architectural constraints.

The central construct is the prediction, for each input object $i$, of an equivariant local reference frame $L_i \in \mathrm{SO}^+(1,3)$ such that under a global Lorentz transformation $\Lambda$,

$$L_i \to L'_i = L_i \Lambda^{-1}$$

and any subsequent canonicalization into the local frame transforms as

$$x_{i,L} = L_i x_i \implies x_{i,L}' = L_i'\, \Lambda x_i = L_i \Lambda^{-1} \Lambda\, x_i = x_{i,L},$$

rendering $x_{i,L}$ exactly invariant under $\Lambda$.

By expressing all physics features in these locally canonicalized variables before feeding them through arbitrary neural backbone layers, the output can finally be de-canonicalized to restore the correct equivariant transformation law. This construction guarantees exact equivariance at negligible overhead and removes the requirement for special Lorentz layers (Spinner et al., 26 May 2025).
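The invariance argument above can be checked numerically. The following minimal sketch (illustrative NumPy code, not the authors' implementation) takes an arbitrary frame obeying the transformation law $L_i \to L_i \Lambda^{-1}$ and verifies that canonicalized features are invariant, while de-canonicalized outputs transform covariantly as $y \to \Lambda y$:

```python
# Minimal numerical check (illustrative, not the authors' code) of the
# canonicalize -> process -> de-canonicalize pattern, assuming a local frame
# that obeys the transformation law L_i -> L_i @ inv(Lambda).
import numpy as np

rng = np.random.default_rng(0)

def random_lorentz(rng):
    """A proper orthochronous Lorentz transformation: boost along x, rotation about z."""
    eta, phi = rng.normal(scale=0.3), rng.uniform(0.0, 2.0 * np.pi)
    boost = np.eye(4)
    boost[:2, :2] = [[np.cosh(eta), np.sinh(eta)],
                     [np.sinh(eta), np.cosh(eta)]]
    rot = np.eye(4)
    rot[1:3, 1:3] = [[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]]
    return rot @ boost

Lam = random_lorentz(rng)    # global Lorentz transformation
L_i = random_lorentz(rng)    # some equivariant local frame of particle i
x_i = rng.normal(size=4)     # a four-vector feature of particle i

# Canonicalized feature is invariant: (L_i Lam^-1)(Lam x_i) = L_i x_i
x_can  = L_i @ x_i
x_can2 = (L_i @ np.linalg.inv(Lam)) @ (Lam @ x_i)
assert np.allclose(x_can, x_can2)

# De-canonicalized output is equivariant: it picks up a factor Lam
y_can = np.tanh(x_can)                                   # any map on invariant features
y     = np.linalg.inv(L_i) @ y_can                       # output in the global frame
y_new = np.linalg.inv(L_i @ np.linalg.inv(Lam)) @ y_can  # output after transforming the event
assert np.allclose(y_new, Lam @ y)
```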

2. Architecture and Algorithmic Structure

The LLoCa-transformer comprises four operational components:

  1. Equivariant Frame Prediction: For $N$ particles with input features (typically four-momenta and optional scalars), a Frames-Net predicts three four-vectors per object:

$$v_{i,k} = \sum_{j=1}^N \mathrm{softmax}_j\!\left[\varphi_k\big(s_i, s_j, \langle p_i, p_j \rangle\big)\right](p_i + p_j), \quad k \in \{0,1,2\}$$

Here, $\varphi_k$ is a small MLP acting on Lorentz scalars, and $\langle p, q \rangle$ denotes the Minkowski product.

From $\{v_{i,0}, v_{i,1}, v_{i,2}\}$, $L_i$ is constructed via a polar decomposition: a boost $B_i$ built from $v_{i,0}$, followed by Gram–Schmidt orthonormalization of $B_i v_{i,1}$ and $B_i v_{i,2}$ to obtain the rotation $R_i$, yielding $L_i = R_i B_i$ (a minimal sketch of this construction follows the list below).

  2. Canonicalization: For any feature $x$, canonicalization is performed as:

$$x_{L_i} = L_i x$$

For tensorial objects, the appropriate group representation $\rho$ is used:

$$f_{L_i}^{\mu_1 \cdots \mu_n} = \big(\rho(L_i) f\big)^{\mu_1 \cdots \mu_n}$$

This ensures all features processed by the transformer are Lorentz-invariant.

  3. Standard Transformer Stack on Canonicalized Features: The canonicalized inputs $f_{L_i}$ enter an unmodified transformer or other neural backbone, with linear query/key/value projections, multi-head self-attention, feed-forward layers, and residual connections with layer normalization. Equivariance is maintained throughout, since all operations act on Lorentz-invariant quantities.
  4. Tensorial Message Passing: Attention and message aggregation across local frames are conducted via the inter-frame transformation (a compact sketch follows the list below):

$$f_{L_i}' = \sum_{j=1}^N \exp\!\left[\frac{1}{\sqrt{d}} \big\langle q_{L_i},\, \rho(L_i L_j^{-1})\, k_{L_j} \big\rangle \right] \rho(L_i L_j^{-1})\, v_{L_j}$$

Here, all contractions use the Minkowski product. The general tensorial message-passing update is

$$f_{L_i}^{\text{new}} = \psi\!\left(f_{L_i},\, \bigoplus_{j=1}^N \phi\big(\rho(L_i L_j^{-1})\, m_{j, L_j}\big)\right)$$

for arbitrary maps $\phi, \psi$ and a permutation-invariant aggregation $\bigoplus$.
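The polar-decomposition construction of step 1 can be made concrete. The sketch below (illustrative NumPy code, not the papers' implementation; it assumes the metric signature $(+,-,-,-)$ and a generic timelike $v_{i,0}$ with positive energy) builds $L_i = R_i B_i$ from three already-predicted four-vectors and verifies the frame transformation law $L_i \to L_i \Lambda^{-1}$, i.e., that canonicalized vectors are unchanged under a global boost:

```python
# Illustrative frame construction L = R @ B from three four-vectors (step 1),
# assuming metric diag(+,-,-,-) and a timelike v0 with positive energy.
import numpy as np

ETA = np.diag([1.0, -1.0, -1.0, -1.0])

def boost_to_rest(v0):
    """Pure boost B such that B @ v0 = (m, 0, 0, 0)."""
    m = np.sqrt(v0 @ ETA @ v0)
    u = v0 / m                                 # four-velocity = (gamma, gamma*beta)
    gamma, gb = u[0], u[1:]
    B = np.eye(4)
    B[0, 0] = gamma
    B[0, 1:] = B[1:, 0] = -gb
    B[1:, 1:] = np.eye(3) + np.outer(gb, gb) / (1.0 + gamma)
    return B

def local_frame(v0, v1, v2):
    """L = R @ B: boost from v0, then Gram-Schmidt on the spatial parts of B v1, B v2."""
    B = boost_to_rest(v0)
    e1 = (B @ v1)[1:]
    e1 = e1 / np.linalg.norm(e1)
    e2 = (B @ v2)[1:]
    e2 = e2 - (e2 @ e1) * e1
    e2 = e2 / np.linalg.norm(e2)
    e3 = np.cross(e1, e2)                      # right-handed triad, det R = +1
    R = np.eye(4)
    R[1:, 1:] = np.stack([e1, e2, e3])
    return R @ B

# Equivariance check: frames built from transformed inputs obey L -> L @ inv(Lam),
# so canonicalized vectors are unchanged.
v0 = np.array([5.0, 0.3, -0.4, 1.1])           # timelike, E > 0
v1 = np.array([3.0, 1.0, 0.2, -0.5])
v2 = np.array([2.0, -0.7, 0.9, 0.4])

eta = 0.4                                       # rapidity of a global boost along z
Lam = np.eye(4)
Lam[np.ix_([0, 3], [0, 3])] = [[np.cosh(eta), np.sinh(eta)],
                               [np.sinh(eta), np.cosh(eta)]]

L       = local_frame(v0, v1, v2)
L_prime = local_frame(Lam @ v0, Lam @ v1, Lam @ v2)
assert np.allclose(L_prime, L @ np.linalg.inv(Lam))
assert np.allclose(L_prime @ (Lam @ v1), L @ v1)
```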
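The inter-frame attention of step 4 can likewise be sketched for purely vector-valued features, for which $\rho(L_i L_j^{-1})$ is just the $4 \times 4$ matrix itself. This toy single-head version (illustrative names and shapes; a standard softmax normalization over senders is used) transports keys and values into the receiver's frame before the Minkowski contraction:

```python
# Toy single-head LLoCa attention for vector-representation features only:
# keys/values of sender j are moved into receiver i's frame with L_i @ inv(L_j)
# before the Minkowski inner product and the weighted aggregation.
import numpy as np

ETA = np.diag([1.0, -1.0, -1.0, -1.0])

def lloca_vector_attention(frames, q, k, v):
    """frames: (N, 4, 4) local frames; q, k, v: (N, 4) four-vector features,
    each expressed in its own particle's local frame."""
    N = frames.shape[0]
    out = np.zeros_like(v)
    for i in range(N):
        # transport matrices rho(L_i L_j^{-1}) for the vector representation
        T = np.stack([frames[i] @ np.linalg.inv(frames[j]) for j in range(N)])
        k_i = np.einsum('jab,jb->ja', T, k)           # keys in frame i
        v_i = np.einsum('jab,jb->ja', T, v)           # values in frame i
        logits = (q[i] @ ETA @ k_i.T) / np.sqrt(4.0)  # Minkowski contractions
        w = np.exp(logits - logits.max())
        w = w / w.sum()                               # softmax over senders j
        out[i] = w @ v_i
    return out

# Example call with trivial frames and random features for N = 3 particles
rng = np.random.default_rng(2)
frames = np.stack([np.eye(4)] * 3)
q, k, v = (rng.normal(size=(3, 4)) for _ in range(3))
print(lloca_vector_attention(frames, q, k, v).shape)  # (3, 4)
```

With per-particle frames as in the previous sketch, the same routine remains exactly equivariant, because each matrix $L_i L_j^{-1}$ is itself invariant under a global transformation: $L_i \Lambda^{-1} (L_j \Lambda^{-1})^{-1} = L_i L_j^{-1}$.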

3. Data Augmentation and the Local/Global Frame Dichotomy

By fixing $L_i \equiv \Lambda_{\rm aug}$ (a random global Lorentz transformation) for all $i$ within an event, canonicalization reduces to traditional Lorentz data augmentation, i.e., preprocessing each event with a single global transform before an ordinary non-equivariant network:

$$x_{L_i} = \Lambda_{\rm aug}\, x_i$$

Thus, standard augmentation becomes a particular instance of LLoCa, while learning a distinct $L_i$ for each object yields exact equivariance. This perspective unifies data augmentation and equivariant preprocessing (Spinner et al., 26 May 2025).
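A few lines make the reduction explicit (reusing the illustrative `random_lorentz` and `local_frame` helpers from the sketches above; `predict_frame` is hypothetical): one shared random transformation used as every particle's frame is exactly Lorentz data augmentation, while distinct per-particle frames give exact equivariance.

```python
# Illustrative only: the same canonicalization routine covers both regimes.
import numpy as np

def canonicalize(frames, momenta):
    """frames: (N, 4, 4) local frames; momenta: (N, 4) -> canonicalized (N, 4)."""
    return np.einsum('nab,nb->na', frames, momenta)

# Data augmentation: one shared global transformation per event, e.g.
#   frames = np.repeat(random_lorentz(rng)[None], N, axis=0)
# Exact LLoCa equivariance: a learned frame per particle, e.g.
#   frames = np.stack([predict_frame(n) for n in range(N)])   # hypothetical Frames-Net
```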

4. Empirical Performance and Ablations

LLoCa-transformers provide substantial performance gains on multiple LHC-relevant tasks:

| Task | Baseline | LLoCa-Transformer | Specialized Lorentz-GNN (L-GATr) |
| --- | --- | --- | --- |
| Jet tagging (JetClass) | 85.5% (AUC 0.9867) | 86.4% (AUC 0.9882) | similar AUC, but 4× slower |
| QFT amplitude regression | MSE $11.9 \times 10^{-6}$ (vanilla) | MSE $(1.5 \pm 0.1) \times 10^{-6}$ | MSE $(2.5 \pm 0.2) \times 10^{-6}$ |

Notable ablations and findings include:

  • Tensorial message passing (combining scalar and vector features) is critical; all-scalar attention results in >30× higher regression MSE.
  • Using the Minkowski metric within attention rather than Euclidean halves MSE.
  • Frame-Net capacity requirements are low; small MLPs with dropout suffice.
  • For large datasets, exact equivariance outperforms even optimized data augmentation; for very small samples, data augmentation may marginally outperform due to inductive bias.

Computationally, LLoCa-transformers achieve state-of-the-art accuracy with only 10–30% extra FLOPs and 30–110% extra training time, yet remain up to 4× faster and 5–100× more efficient in forward FLOPs compared to previous state-of-the-art Lorentz-equivariant architectures (Spinner et al., 26 May 2025, Favaro et al., 20 Aug 2025).

5. Symmetry Breaking and Subgroup Equivariance

Real-world high-energy experiments often feature only partial Lorentz invariance; event-level selection and detector design typically preserve only a subgroup $\mathcal{R} \subset \mathrm{SO}^+(1,3)$. LLoCa enables explicit control over which symmetries are enforced, both architecturally (by fixing vectors in the frame prediction network) and at the input level (by providing reference vectors or explicit coordinates).

Empirical results show that for tasks like event generation, restricting equivariance to $\mathrm{SO}^+(1,1) \times \mathrm{SO}(2)$ suffices; for jet tagging, optimal performance is obtained only if the symmetry is broken down to this subgroup. The LLoCa framework allows these breaks to be specified or learned by the network, making it suitable for rigorous studies of symmetry in practical collider analysis (Favaro et al., 20 Aug 2025).
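As an illustration of input-level symmetry breaking (a hypothetical sketch, not the papers' exact prescription), fixed reference four-vectors can simply be appended to the Frames-Net inputs. Because such references are not transformed along with the event, exact equivariance survives only for the subgroup that leaves them invariant; for example, supplying the detector's time axis $(1,0,0,0)$ breaks boost equivariance while keeping spatial rotations.

```python
# Hypothetical sketch of input-level symmetry breaking: a fixed reference
# four-vector is appended to the Frames-Net inputs. It is not transformed with
# the event, so equivariance is reduced to its stabilizer subgroup (e.g. the
# detector time axis (1,0,0,0) keeps SO(3) rotations and breaks boosts).
import numpy as np

T_REF = np.array([1.0, 0.0, 0.0, 0.0])   # detector rest-frame time axis

def frames_net_inputs(momenta, refs=(T_REF,)):
    """momenta: (N, 4) particle four-momenta -> (N + len(refs), 4) Frames-Net inputs."""
    return np.concatenate([momenta, np.stack(refs)], axis=0)
```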

6. Comparative Perspective and Applications

LLoCa-Transformer provides a universal mechanism for obtaining exact Lorentz equivariance in neural architectures for high-energy physics. Its broad compatibility ensures that any backbone—transformer, particle net, or graph network—can be "lifted" to Lorentz equivariance with only minor architectural modifications.

Key applications include:

  • Jet tagging with large simulated and experimental datasets, achieving improvements in classification accuracy, AUC, and speed.
  • Quantum field theory amplitude regression, outperforming all previously published equivariant GNNs by a substantial margin.
  • End-to-end event generation in collider data, allowing training objectives expressed in the correct symmetry frame and facilitating fair comparisons across symmetry-breaking choices.

The ability to propagate higher-order tensorial features and to recover or exceed specialized architectures’ accuracy at a fraction of computational cost positions LLoCa-Transformer as a foundational tool in modern machine learning for collider and astroparticle physics (Spinner et al., 26 May 2025, Favaro et al., 20 Aug 2025).
