
LLoCa-Transformer: Exact Lorentz Equivariance

Updated 3 December 2025
  • LLoCa-Transformer is a neural architecture that provides exact Lorentz equivariance by canonicalizing features into learned local reference frames.
  • It integrates standard neural layers with equivariant frame prediction and tensorial message passing to efficiently handle space-time tensor data.
  • Empirical results demonstrate improved performance and computational efficiency on jet tagging and QFT amplitude regression benchmarks.

The LLoCa-Transformer is a generic neural architecture designed to endow any backbone—such as transformers and graph neural networks—with exact Lorentz equivariance. Built on the Lorentz Local Canonicalization (LLoCa) framework, it operates by learning Lorentz-equivariant local reference frames for each entity (e.g., particle) within the input, canonicalizing features into these local frames, and then leveraging standard neural layers. This approach enables seamless propagation of space-time tensorial information while eliminating the architectural constraints of previous Lorentz-equivariant models, and achieves state-of-the-art accuracy and computational efficiency on challenging high-energy physics benchmarks (Spinner et al., 26 May 2025, Favaro et al., 20 Aug 2025).

1. Theoretical Foundations of LLoCa-Transformer

Lorentz symmetry underpins fundamental interactions in high-energy physics, where observed data such as four-momenta $p = (E, \vec{p})$ transform under the proper orthochronous Lorentz group $\mathrm{SO}^+(1,3)$. Traditional Lorentz-equivariant neural networks rely on bespoke convolution or message-passing layers, which severely limits architectural flexibility. LLoCa overcomes this limitation by decoupling equivariance from architectural constraints.

The central construct is the prediction, for each input object $i$, of an equivariant local reference frame $L_i \in \mathrm{SO}^+(1,3)$ such that under a global Lorentz transformation $\Lambda$,

$$L_i \to L'_i = L_i \Lambda^{-1}$$

and any subsequent canonicalization into the local frame transforms as

$$x_{i,L} = L_i x_i \implies x_{i,L}' = L_i'\, \Lambda x_i = L_i \Lambda^{-1} \Lambda\, x_i = x_{i,L},$$

rendering $x_{i,L}$ exactly invariant under $\Lambda$.

By expressing all physics features in these locally canonicalized variables before feeding them through arbitrary neural backbone layers, the output can finally be de-canonicalized to restore the correct equivariant transformation law. This construction guarantees exact equivariance at negligible overhead and removes the requirement for special Lorentz layers (Spinner et al., 26 May 2025).
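The invariance argument above can be checked numerically. The following minimal sketch (illustrative NumPy code, not the authors' implementation) takes an arbitrary frame obeying the transformation law $L_i \to L_i \Lambda^{-1}$ and verifies that canonicalized features are invariant, while de-canonicalized outputs transform covariantly as $y \to \Lambda y$:

```python
# Minimal numerical check (illustrative, not the authors' code) of the
# canonicalize -> process -> de-canonicalize pattern, assuming a local frame
# that obeys the transformation law L_i -> L_i @ inv(Lambda).
import numpy as np

rng = np.random.default_rng(0)

def random_lorentz(rng):
    """A proper orthochronous Lorentz transformation: boost along x, rotation about z."""
    eta, phi = rng.normal(scale=0.3), rng.uniform(0.0, 2.0 * np.pi)
    boost = np.eye(4)
    boost[:2, :2] = [[np.cosh(eta), np.sinh(eta)],
                     [np.sinh(eta), np.cosh(eta)]]
    rot = np.eye(4)
    rot[1:3, 1:3] = [[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]]
    return rot @ boost

Lam = random_lorentz(rng)    # global Lorentz transformation
L_i = random_lorentz(rng)    # some equivariant local frame of particle i
x_i = rng.normal(size=4)     # a four-vector feature of particle i

# Canonicalized feature is invariant: (L_i Lam^-1)(Lam x_i) = L_i x_i
x_can  = L_i @ x_i
x_can2 = (L_i @ np.linalg.inv(Lam)) @ (Lam @ x_i)
assert np.allclose(x_can, x_can2)

# De-canonicalized output is equivariant: it picks up a factor Lam
y_can = np.tanh(x_can)                                   # any map on invariant features
y     = np.linalg.inv(L_i) @ y_can                       # output in the global frame
y_new = np.linalg.inv(L_i @ np.linalg.inv(Lam)) @ y_can  # output after transforming the event
assert np.allclose(y_new, Lam @ y)
```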

2. Architecture and Algorithmic Structure

The LLoCa-transformer comprises four operational components:

  1. Equivariant Frame Prediction: For $N$ particles with input features (typically four-momenta and optional scalars), a Frames-Net predicts three four-vectors per object:

$$v_{i,k} = \sum_{j=1}^N \mathrm{softmax}_j\!\left[\varphi_k\big(s_i, s_j, \langle p_i, p_j \rangle\big)\right](p_i + p_j), \quad k \in \{0,1,2\}$$

Here, $\varphi_k$ is a small MLP acting on Lorentz scalars, and $\langle p, q \rangle$ denotes the Minkowski product.

From $\{v_{i,0}, v_{i,1}, v_{i,2}\}$, $L_i$ is constructed via a polar decomposition: a boost $B_i$ built from $v_{i,0}$, followed by Gram–Schmidt orthonormalization of $B_i v_{i,1}$ and $B_i v_{i,2}$ to obtain the rotation $R_i$, yielding $L_i = R_i B_i$ (a minimal sketch of this construction follows the list below).

  2. Canonicalization: For any feature $x$, canonicalization is performed as:

$$x_{L_i} = L_i x$$

For tensorial objects, the appropriate group representation $\rho$ is used:

$$f_{L_i}^{\mu_1 \cdots \mu_n} = \big(\rho(L_i) f\big)^{\mu_1 \cdots \mu_n}$$

This ensures all features processed by the transformer are Lorentz-invariant.

  3. Standard Transformer Stack on Canonicalized Features: The canonicalized inputs $f_{L_i}$ enter an unmodified transformer or other neural backbone, with linear query/key/value projections, multi-head self-attention, feed-forward layers, and residual connections with layer normalization. Equivariance is maintained throughout, since all operations act on Lorentz-invariant quantities.
  4. Tensorial Message Passing: Attention and message aggregation across local frames are conducted via the inter-frame transformation (a compact sketch follows the list below):

$$f_{L_i}' = \sum_{j=1}^N \exp\!\left[\frac{1}{\sqrt{d}} \big\langle q_{L_i},\, \rho(L_i L_j^{-1})\, k_{L_j} \big\rangle \right] \rho(L_i L_j^{-1})\, v_{L_j}$$

Here, all contractions use the Minkowski product. The general tensorial message-passing update is

$$f_{L_i}^{\text{new}} = \psi\!\left(f_{L_i},\, \bigoplus_{j=1}^N \phi\big(\rho(L_i L_j^{-1})\, m_{j, L_j}\big)\right)$$

for arbitrary maps $\phi, \psi$ and a permutation-invariant aggregation $\bigoplus$.
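The polar-decomposition construction of step 1 can be made concrete. The sketch below (illustrative NumPy code, not the papers' implementation; it assumes the metric signature $(+,-,-,-)$ and a generic timelike $v_{i,0}$ with positive energy) builds $L_i = R_i B_i$ from three already-predicted four-vectors and verifies the frame transformation law $L_i \to L_i \Lambda^{-1}$, i.e., that canonicalized vectors are unchanged under a global boost:

```python
# Illustrative frame construction L = R @ B from three four-vectors (step 1),
# assuming metric diag(+,-,-,-) and a timelike v0 with positive energy.
import numpy as np

ETA = np.diag([1.0, -1.0, -1.0, -1.0])

def boost_to_rest(v0):
    """Pure boost B such that B @ v0 = (m, 0, 0, 0)."""
    m = np.sqrt(v0 @ ETA @ v0)
    u = v0 / m                                 # four-velocity = (gamma, gamma*beta)
    gamma, gb = u[0], u[1:]
    B = np.eye(4)
    B[0, 0] = gamma
    B[0, 1:] = B[1:, 0] = -gb
    B[1:, 1:] = np.eye(3) + np.outer(gb, gb) / (1.0 + gamma)
    return B

def local_frame(v0, v1, v2):
    """L = R @ B: boost from v0, then Gram-Schmidt on the spatial parts of B v1, B v2."""
    B = boost_to_rest(v0)
    e1 = (B @ v1)[1:]
    e1 = e1 / np.linalg.norm(e1)
    e2 = (B @ v2)[1:]
    e2 = e2 - (e2 @ e1) * e1
    e2 = e2 / np.linalg.norm(e2)
    e3 = np.cross(e1, e2)                      # right-handed triad, det R = +1
    R = np.eye(4)
    R[1:, 1:] = np.stack([e1, e2, e3])
    return R @ B

# Equivariance check: frames built from transformed inputs obey L -> L @ inv(Lam),
# so canonicalized vectors are unchanged.
v0 = np.array([5.0, 0.3, -0.4, 1.1])           # timelike, E > 0
v1 = np.array([3.0, 1.0, 0.2, -0.5])
v2 = np.array([2.0, -0.7, 0.9, 0.4])

eta = 0.4                                       # rapidity of a global boost along z
Lam = np.eye(4)
Lam[np.ix_([0, 3], [0, 3])] = [[np.cosh(eta), np.sinh(eta)],
                               [np.sinh(eta), np.cosh(eta)]]

L       = local_frame(v0, v1, v2)
L_prime = local_frame(Lam @ v0, Lam @ v1, Lam @ v2)
assert np.allclose(L_prime, L @ np.linalg.inv(Lam))
assert np.allclose(L_prime @ (Lam @ v1), L @ v1)
```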
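The inter-frame attention of step 4 can likewise be sketched for purely vector-valued features, for which $\rho(L_i L_j^{-1})$ is just the $4 \times 4$ matrix itself. This toy single-head version (illustrative names and shapes; a standard softmax normalization over senders is used) transports keys and values into the receiver's frame before the Minkowski contraction:

```python
# Toy single-head LLoCa attention for vector-representation features only:
# keys/values of sender j are moved into receiver i's frame with L_i @ inv(L_j)
# before the Minkowski inner product and the weighted aggregation.
import numpy as np

ETA = np.diag([1.0, -1.0, -1.0, -1.0])

def lloca_vector_attention(frames, q, k, v):
    """frames: (N, 4, 4) local frames; q, k, v: (N, 4) four-vector features,
    each expressed in its own particle's local frame."""
    N = frames.shape[0]
    out = np.zeros_like(v)
    for i in range(N):
        # transport matrices rho(L_i L_j^{-1}) for the vector representation
        T = np.stack([frames[i] @ np.linalg.inv(frames[j]) for j in range(N)])
        k_i = np.einsum('jab,jb->ja', T, k)           # keys in frame i
        v_i = np.einsum('jab,jb->ja', T, v)           # values in frame i
        logits = (q[i] @ ETA @ k_i.T) / np.sqrt(4.0)  # Minkowski contractions
        w = np.exp(logits - logits.max())
        w = w / w.sum()                               # softmax over senders j
        out[i] = w @ v_i
    return out

# Example call with trivial frames and random features for N = 3 particles
rng = np.random.default_rng(2)
frames = np.stack([np.eye(4)] * 3)
q, k, v = (rng.normal(size=(3, 4)) for _ in range(3))
print(lloca_vector_attention(frames, q, k, v).shape)  # (3, 4)
```

With per-particle frames as in the previous sketch, the same routine remains exactly equivariant, because each matrix $L_i L_j^{-1}$ is itself invariant under a global transformation: $L_i \Lambda^{-1} (L_j \Lambda^{-1})^{-1} = L_i L_j^{-1}$.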

3. Data Augmentation and the Local/Global Frame Dichotomy

By fixing $L_i \equiv \Lambda_{\rm aug}$ (a random global Lorentz transformation) for all $i$ within an event, canonicalization reduces to traditional Lorentz data augmentation, i.e., preprocessing each event with a single global transform before an ordinary non-equivariant network:

$$x_{L_i} = \Lambda_{\rm aug}\, x_i$$

Thus, standard augmentation becomes a particular instance of LLoCa, while learning a distinct $L_i$ for each object yields exact equivariance. This perspective unifies data augmentation and equivariant preprocessing (Spinner et al., 26 May 2025).
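A few lines make the reduction explicit (reusing the illustrative `random_lorentz` and `local_frame` helpers from the sketches above; `predict_frame` is hypothetical): one shared random transformation used as every particle's frame is exactly Lorentz data augmentation, while distinct per-particle frames give exact equivariance.

```python
# Illustrative only: the same canonicalization routine covers both regimes.
import numpy as np

def canonicalize(frames, momenta):
    """frames: (N, 4, 4) local frames; momenta: (N, 4) -> canonicalized (N, 4)."""
    return np.einsum('nab,nb->na', frames, momenta)

# Data augmentation: one shared global transformation per event, e.g.
#   frames = np.repeat(random_lorentz(rng)[None], N, axis=0)
# Exact LLoCa equivariance: a learned frame per particle, e.g.
#   frames = np.stack([predict_frame(n) for n in range(N)])   # hypothetical Frames-Net
```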

4. Empirical Performance and Ablations

LLoCa-transformers provide substantial performance gains on multiple LHC-relevant tasks:

| Task | Baseline | LLoCa-Transformer | Specialized Lorentz-GNN (L-GATr) |
| --- | --- | --- | --- |
| Jet tagging (JetClass) | 85.5% (AUC 0.9867) | 86.4% (AUC 0.9882) | similar AUC, but 4× slower |
| QFT amplitude regression | MSE $11.9 \times 10^{-6}$ (vanilla) | MSE $(1.5 \pm 0.1) \times 10^{-6}$ | MSE $(2.5 \pm 0.2) \times 10^{-6}$ |

Notable ablations and findings include:

  • Tensorial message passing (combining scalar and vector features) is critical; all-scalar attention results in >30× higher regression MSE.
  • Using the Minkowski metric within attention rather than Euclidean halves MSE.
  • Frame-Net capacity requirements are low; small MLPs with dropout suffice.
  • For large datasets, exact equivariance outperforms even optimized data augmentation; for very small samples, data augmentation may marginally outperform due to inductive bias.

Computationally, LLoCa-transformers achieve state-of-the-art accuracy with only 10–30% extra FLOPs and 30–110% extra training time, yet remain up to 4× faster and 5–100× more efficient in forward FLOPs compared to previous state-of-the-art Lorentz-equivariant architectures (Spinner et al., 26 May 2025, Favaro et al., 20 Aug 2025).

5. Symmetry Breaking and Subgroup Equivariance

Real-world high-energy experiments often feature only partial Lorentz invariance; event-level selection and detector design typically preserve only a subgroup $\mathcal{R} \subset \mathrm{SO}^+(1,3)$. LLoCa enables explicit control over which symmetries are enforced, both architecturally (by fixing vectors in the frame prediction network) and at the input level (by providing reference vectors or explicit coordinates).

Empirical results show that for tasks like event generation, restricting equivariance to $\mathrm{SO}^+(1,1) \times \mathrm{SO}(2)$ suffices; for jet tagging, optimal performance is obtained only if the symmetry is broken down to this subgroup. The LLoCa framework allows these breaks to be specified or learned by the network, making it suitable for rigorous studies of symmetry in practical collider analysis (Favaro et al., 20 Aug 2025).
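As an illustration of input-level symmetry breaking (a hypothetical sketch, not the papers' exact prescription), fixed reference four-vectors can simply be appended to the Frames-Net inputs. Because such references are not transformed along with the event, exact equivariance survives only for the subgroup that leaves them invariant; for example, supplying the detector's time axis $(1,0,0,0)$ breaks boost equivariance while keeping spatial rotations.

```python
# Hypothetical sketch of input-level symmetry breaking: a fixed reference
# four-vector is appended to the Frames-Net inputs. It is not transformed with
# the event, so equivariance is reduced to its stabilizer subgroup (e.g. the
# detector time axis (1,0,0,0) keeps SO(3) rotations and breaks boosts).
import numpy as np

T_REF = np.array([1.0, 0.0, 0.0, 0.0])   # detector rest-frame time axis

def frames_net_inputs(momenta, refs=(T_REF,)):
    """momenta: (N, 4) particle four-momenta -> (N + len(refs), 4) Frames-Net inputs."""
    return np.concatenate([momenta, np.stack(refs)], axis=0)
```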

6. Comparative Perspective and Applications

LLoCa-Transformer provides a universal mechanism for obtaining exact Lorentz equivariance in neural architectures for high-energy physics. Its broad compatibility ensures that any backbone—transformer, particle net, or graph network—can be "lifted" to Lorentz equivariance with only minor architectural modifications.

Key applications include:

  • Jet tagging with large simulated and experimental datasets, achieving improvements in classification accuracy, AUC, and speed.
  • Quantum field theory amplitude regression, outperforming all previously published equivariant GNNs by a substantial margin.
  • End-to-end event generation in collider data, allowing training objectives expressed in the correct symmetry frame and facilitating fair comparisons across symmetry-breaking choices.

The ability to propagate higher-order tensorial features and to recover or exceed specialized architectures’ accuracy at a fraction of computational cost positions LLoCa-Transformer as a foundational tool in modern machine learning for collider and astroparticle physics (Spinner et al., 26 May 2025, Favaro et al., 20 Aug 2025).
