E2Former-LSR: Equivariant Graph Transformer
- The paper introduces E2Former-LSR, a graph transformer that employs linear-scaling Wigner 6j convolutions for efficient local aggregation while maintaining near-constant error as molecular system size grows.
- It integrates long-range attention with local-shift regularization to capture non-local interactions and ensure robustness in molecular dynamics modeling.
- Empirical benchmarks on MolLR25 demonstrate significant improvements, including up to 30x speedup and reduced force errors compared to fixed-cutoff models.
E2Former-LSR is an SO(3)-equivariant graph transformer architecture designed to enable scalable, high-fidelity molecular machine learning force fields (MLFFs) with both local geometric precision and explicit sensitivity to long-range electronic and conformational effects. The model is an extension of the E2Former framework, employing linear-scaling Wigner 6j convolutions as its core local aggregation mechanism, and incorporating long-range attention and local-shift regularization to achieve constant error scaling for molecular systems with up to 1,200 atoms and beyond. E2Former-LSR has demonstrated that principled, non-local neural architectures are necessary for accurate molecular modeling of macromolecules and large supramolecular assemblies (Li et al., 31 Jan 2025, Wang et al., 7 Jan 2026).
1. Architectural Foundations
E2Former-LSR is built on a graph-based representation of molecular systems, where atomic positions and chemical identities are mapped to nodes, and neighbor relationships (typically within a radius cutoff) define edges. Each atomic node $i$ has a 3D position $\vec{r}_i \in \mathbb{R}^3$ and a chemical type $z_i$, with neighbor connectivity defined within a short-range cutoff $r_c$. Feature vectors are SO(3)-irrep decomposed, such that for every irrep order $\ell = 0, \dots, L_{\max}$, each node carries features $x_i^{(\ell)} \in \mathbb{R}^{(2\ell+1) \times C}$. Edge-level geometry is encoded via spherical harmonics $Y^{(\ell)}(\hat{r}_{ij})$, where $\hat{r}_{ij} = (\vec{r}_j - \vec{r}_i)/\lVert \vec{r}_j - \vec{r}_i \rVert$.
The central operation is the E2Attention block, which first projects features to scalar queries and keys for attention, computes value tensors using Clebsch–Gordan (CG) products with spherical harmonics, aggregates neighbor information with softmax attention, and recouples the result using the Wigner 6j tensor product. Residual equivariant MLPs then update the node features. A simplified sketch of this pattern follows.
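The sketch below is a minimal, self-contained illustration of this attention pattern, restricted to $\ell \le 1$ channels and with placeholder layer names (`E2AttentionSketch` and its linear maps are illustrative, not the reference implementation): scalar queries and keys drive a per-neighborhood softmax, while values carry an invariant channel plus a vector channel modulated by unit edge directions. The full CG products and Wigner 6j recoupling over higher irrep orders are deliberately omitted.

```python
# Illustrative-only sketch of the E2Attention pattern (l=0 and l=1 channels).
import torch
import torch.nn as nn


class E2AttentionSketch(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # scalar (l=0) queries
        self.k = nn.Linear(dim, dim)   # scalar (l=0) keys
        self.v0 = nn.Linear(dim, dim)  # invariant (l=0) value channel
        self.v1 = nn.Linear(dim, dim)  # gates for the vector (l=1) value channel
        self.out = nn.Linear(dim, dim)

    def forward(self, x0, pos, edge_index):
        # x0: (N, dim) invariant features; pos: (N, 3); edge_index: (2, E), src -> dst
        src, dst = edge_index
        rij = pos[src] - pos[dst]
        y1 = rij / (rij.norm(dim=-1, keepdim=True) + 1e-9)   # l=1 part: unit edge direction

        # Per-neighborhood softmax over scalar attention logits.
        logits = (self.q(x0)[dst] * self.k(x0)[src]).sum(-1) / x0.shape[-1] ** 0.5
        w = torch.exp(logits - logits.max())
        denom = x0.new_zeros(x0.shape[0]).index_add_(0, dst, w)
        alpha = w / denom[dst]

        # Values: an invariant channel and a direction-modulated vector channel.
        v0 = self.v0(x0)[src]                                  # (E, dim)
        v1 = self.v1(x0)[src].unsqueeze(-1) * y1.unsqueeze(1)  # (E, dim, 3)

        out0 = x0.new_zeros(x0.shape).index_add_(0, dst, alpha[:, None] * v0)
        out1 = x0.new_zeros(x0.shape[0], x0.shape[1], 3).index_add_(
            0, dst, alpha[:, None, None] * v1)
        return x0 + self.out(out0), out1   # residual scalar update, new vector features
```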
2. Wigner 6j Convolution and Linear-Time Scaling
Traditional SO(3)-equivariant graph neural networks use edge-based CG tensor products, incurring an $\mathcal{O}(|\mathcal{E}|)$ count of expensive tensor-product evaluations per layer, since all irrep orders must be coupled on every edge; this becomes prohibitive for large or dense graphs.
E2Former introduces a binomial local expansion that allows spherical harmonics of edge vectors $\vec{r}_j - \vec{r}_i$ to be rewritten in terms of node-local spherical harmonics of $\vec{r}_i$ and $\vec{r}_j$. The Wigner 6j convolution exploits this rewriting and shifts the expensive coupling from per-edge CG products to $\mathcal{O}(|\mathcal{V}|)$ per-node recoupling steps. Precomputing the node-local spherical-harmonic terms once per node and sharing neighborwise partial products allow the overall cost per layer to scale with the number of nodes rather than with per-edge tensor products, resulting, in practice, in empirically linear overall scaling. E2Former achieves 7x–30x speedup over standard SO(3) convolutions as system size increases (Li et al., 31 Jan 2025).
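The edge-to-node rewriting can be illustrated for the simplest ($\ell = 1$) case, where it is an exact algebraic identity: a weighted sum of edge vectors reduces to two node-level sums, so no per-edge geometric tensor has to be materialized. The toy check below (plain NumPy, dense weight matrix for brevity) verifies only this $\ell = 1$ identity; the full construction extends it to higher-order spherical harmonics via the binomial expansion and Wigner 6j recoupling.

```python
# Toy demonstration: sum_j w_ij (r_j - r_i) can be computed from node-level sums
# (sum_j w_ij r_j and sum_j w_ij) without forming any per-edge vector explicitly.
import numpy as np

rng = np.random.default_rng(0)
N = 64
pos = rng.normal(size=(N, 3))
w = rng.random((N, N))            # dense toy attention/weight matrix w[i, j]

# Edge-centric computation: O(N^2) edge vectors are materialized.
edge_centric = np.einsum("ij,ijd->id", w, pos[None, :, :] - pos[:, None, :])

# Node-centric computation: only weighted node sums are needed.
node_centric = w @ pos - w.sum(axis=1, keepdims=True) * pos

assert np.allclose(edge_centric, node_centric)
```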
3. Long-Range Aware Message Passing and Local-Shift Regularization
E2Former-LSR enhances E2Former by incorporating two auxiliary mechanisms: long-range attention and local-shift regularization.
Long-Range Attention
The model augments the standard short-range message passing with an additional attention stream operating over a long-range bipartite graph. Chemically defined fragments are constructed (e.g., by BRICS decomposition), with each fragment's center given by the centroid of its atoms,
$$\vec{c}_F = \frac{1}{|\mathcal{A}_F|} \sum_{i \in \mathcal{A}_F} \vec{r}_i,$$
where $\mathcal{A}_F$ is the set of atoms in fragment $F$. Atom–fragment edges are defined for all atom–fragment pairs within a long-range cutoff $r_{\mathrm{LR}}$ (in Å). These augmentations enable receptive fields that encompass the non-local, non-covalent interactions critical for macromolecular and supramolecular behavior.
Message passing on the long-range graph uses the same equivariant attention structure but restricts the long-range stream to low-order irrep features to control complexity. Short- and long-range modules alternate or are stacked (e.g., 4 short + 2 long layers), and their outputs are fused via a late-stage MLP; a sketch of the fragment-graph construction follows.
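The sketch below shows one way to build the long-range bipartite graph, assuming fragment atom lists are already available (e.g., from a BRICS decomposition); the helper names and the `lr_cutoff` value are placeholders, not settings taken from the papers.

```python
# Sketch: fragment centroids and atom-fragment bipartite edges within a cutoff.
import numpy as np


def fragment_centers(pos: np.ndarray, fragments: list[list[int]]) -> np.ndarray:
    """Geometric center (centroid) of each fragment's atoms; pos is (N, 3)."""
    return np.stack([pos[idx].mean(axis=0) for idx in fragments])


def atom_fragment_edges(pos: np.ndarray, centers: np.ndarray, lr_cutoff: float):
    """All (atom, fragment) pairs whose distance is below the long-range cutoff."""
    d = np.linalg.norm(pos[:, None, :] - centers[None, :, :], axis=-1)  # (N, F)
    atoms, frags = np.nonzero(d < lr_cutoff)
    return np.stack([atoms, frags])          # (2, E_long) bipartite edge index


# Toy usage: 10 atoms grouped into two fragments, generous placeholder cutoff.
pos = np.random.default_rng(1).normal(size=(10, 3))
fragments = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
centers = fragment_centers(pos, fragments)
edges = atom_fragment_edges(pos, centers, lr_cutoff=8.0)
```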
Local-Shift Regularization
E2Former-LSR introduces a local-shift regularization: each atom's position is perturbed with small isotropic Gaussian noise $\delta_i \sim \mathcal{N}(0, \sigma^2 I_3)$, and the discrepancy between predictions on the perturbed and unperturbed configurations is penalized via a regularization loss of the form
$$\mathcal{L}_{\mathrm{shift}} = \big\lVert \hat{E}(\{\vec{r}_i + \delta_i\}) - \hat{E}(\{\vec{r}_i\}) \big\rVert^2 .$$
This encourages output invariance to small local shifts, serving as a form of SE(3) data augmentation and yielding more robust, generalizable models (Li et al., 31 Jan 2025).
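A schematic of the regularization term is sketched below, assuming a model that maps positions to a scalar energy; the noise scale `sigma`, the squared-difference form, and the hypothetical weighting `lambda_shift` are assumptions consistent with the description above rather than the exact published loss.

```python
# Schematic local-shift penalty under small isotropic Gaussian perturbations.
import torch


def local_shift_penalty(model, pos: torch.Tensor, sigma: float = 0.01) -> torch.Tensor:
    """Penalize the change in predicted energy under per-atom Gaussian displacements."""
    noise = sigma * torch.randn_like(pos)
    return (model(pos + noise) - model(pos)).pow(2).mean()


# Toy usage with an invariant stand-in energy (hypothetical lambda_shift weighting):
# toy_energy = lambda p: ((p - p.mean(0)) ** 2).sum()
# loss = base_loss + lambda_shift * local_shift_penalty(toy_energy, torch.randn(8, 3))
```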
4. Training Targets, Loss Functions, and Equivariance
Total energy prediction is achieved by summing atomic contributions, preserving translation invariance. The predicted energy and forces are given by
$$\hat{E} = \sum_i \epsilon_i, \qquad \hat{\vec{F}}_i = -\frac{\partial \hat{E}}{\partial \vec{r}_i},$$
with short- and long-range features fused before the per-atom readout as $h_i = \mathrm{MLP}\big([\,h_i^{\mathrm{SR}},\, h_i^{\mathrm{LR}}\,]\big)$. The loss function combines energy and force objectives with weights $\lambda_E$ and $\lambda_F$:
$$\mathcal{L} = \lambda_E\,\big(\hat{E} - E\big)^2 + \frac{\lambda_F}{N} \sum_{i=1}^{N} \big\lVert \hat{\vec{F}}_i - \vec{F}_i \big\rVert^2 .$$
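A minimal sketch of these targets follows, with forces obtained by automatic differentiation of the summed per-atom energies; `per_atom_energy` stands in for the fused short-/long-range readout, and the loss weights are placeholders.

```python
# Sketch: total energy as a sum of atomic contributions, forces via autograd,
# and a weighted energy+force training objective.
import torch


def energy_and_forces(per_atom_energy, pos: torch.Tensor):
    """per_atom_energy: callable mapping (N, 3) positions to (N,) per-atom energies."""
    pos = pos.clone().requires_grad_(True)
    energy = per_atom_energy(pos).sum()                       # E_hat = sum_i eps_i
    (grad,) = torch.autograd.grad(energy, pos, create_graph=True)
    return energy, -grad                                      # F_hat_i = -dE_hat/dr_i


def energy_force_loss(e_pred, f_pred, e_ref, f_ref, lambda_e=1.0, lambda_f=1.0):
    """Weighted sum of energy and force objectives; the weights are placeholders."""
    return (lambda_e * (e_pred - e_ref).pow(2)
            + lambda_f * (f_pred - f_ref).pow(2).sum(-1).mean())


# Toy usage with an invariant per-atom energy (squared distance to the centroid):
# eps = lambda p: ((p - p.mean(0)) ** 2).sum(-1)
# e, f = energy_and_forces(eps, torch.randn(8, 3))
```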
Exact rotation equivariance is enforced by construction, with all message functions dependent only on relative positions and spherical harmonics. Translation invariance is maintained by summing per-atom outputs.
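These symmetry properties can be checked numerically for any model exposing a positions-to-energy interface. The sketch below assumes a `total_energy` callable (positions to scalar energy) and tests energy invariance and force co-rotation under a random rotation, plus energy invariance under a global translation.

```python
# Numerical symmetry check: rotation invariance of E, co-rotation of F, translation
# invariance of E, for any callable mapping (N, 3) positions to a scalar energy.
import torch


def check_symmetries(total_energy, pos: torch.Tensor, atol: float = 1e-5) -> bool:
    def forces(p):
        p = p.clone().requires_grad_(True)
        (g,) = torch.autograd.grad(total_energy(p), p)
        return -g

    # Random proper rotation from the QR decomposition of a Gaussian matrix.
    q, _ = torch.linalg.qr(torch.randn(3, 3, dtype=pos.dtype))
    if torch.det(q) < 0:
        q[:, 0] = -q[:, 0]

    e, f = total_energy(pos), forces(pos)
    e_rot, f_rot = total_energy(pos @ q.T), forces(pos @ q.T)
    e_trs = total_energy(pos + torch.randn(1, 3, dtype=pos.dtype))

    return (torch.allclose(e, e_rot, atol=atol)            # energy invariant under rotation
            and torch.allclose(f @ q.T, f_rot, atol=atol)  # forces co-rotate
            and torch.allclose(e, e_trs, atol=atol))       # energy invariant under translation


# Toy check with a trivially invariant energy (sum of squared distances to the centroid):
# toy = lambda p: ((p - p.mean(0)) ** 2).sum()
# assert check_symmetries(toy, torch.randn(8, 3, dtype=torch.float64))
```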
5. Empirical Performance and Benchmarks
E2Former-LSR has been validated on the MolLR25 benchmark (Wang et al., 7 Jan 2026), which includes:
- Di-molecule dissociation: 4,950 diverse dimers, separation 0.1–10.1 Å.
- Medium-scale protein conformations: Systems with 700–1,200 atoms, sampled from D. E. Shaw trajectories.
- MD trajectory suite: ab initio MD for molecular clusters and frameworks (up to >500 atoms).
Key performance metrics:
| Task | Metric | E2Former-LSR | MACE-Large | Inference Speed |
|---|---|---|---|---|
| Di-molecule dissociation (8–10 Å) | Energy MAE (meV) | ~0.21 | ~0.34 | - |
| Di-molecule dissociation (8–10 Å) | Force MAE (meV/Å) | ~0.60 | ~1.17 | - |
| Medium-scale proteins | Force MAE | 67% lower | - | 30% faster |
| MD / ZIF-8 | Force MAE (meV/Å) | 4.73 | 6.68 | - |
Additional findings include:
- Error scaling for fixed-cutoff architectures (MACE) rises with system size; E2Former-LSR maintains flat errors of ~5–7 meV/Å up to ~1,200 atoms and beyond.
- E2Former-LSR better models 1/R⁶-like non-covalent force decay, with fixed-cutoff baselines showing cutoff artifacts beyond ~3 Å.
- Removal of the long-range blocks increases long-distance MAE by 2–3× and degrades force cosine similarity (CS_f) by 5–10% (Wang et al., 7 Jan 2026).
6. Comparative Analysis and Theoretical Implications
Ablation studies demonstrate that without explicit long-range attention, MLFFs suffer from monotonic error growth with system size and cannot accurately represent smooth asymptotic force decay. Alternative long-range–aware models (e.g., DPA-2) do not match the fidelity obtained through explicit fragment-level equivariant attention. E2Former-LSR thereby establishes the necessity of non-local, equivariant neural architectures—mere scaling of data or parameters in fixed-cutoff frameworks is insufficient for generalizable MLFF performance (Wang et al., 7 Jan 2026).
A plausible implication is that, as system complexity grows, shift-invariant, non-local receptive fields and SO(3) equivariant operations become essential architectural primitives for future molecular modeling approaches.
7. Significance and Broader Impact
E2Former-LSR merges computational efficiency (via Wigner 6j recoupling and linear scaling) with the generalization and fidelity gains required for next-generation MLFFs. It enables accurate, high-throughput molecular dynamics on previously intractable, large-scale chemical and biological systems, supporting both covalent and non-covalent regimes, and providing a computational bridge between quantum accuracy and macromolecular scale. This architectural paradigm confirms that non-local graph message passing, fragment-level attention, and local-shift regularization are essential for modeling complex molecular phenomena relevant to chemistry, materials science, and biophysics (Li et al., 31 Jan 2025, Wang et al., 7 Jan 2026).