
E2Former-LSR: Equivariant Graph Transformer

Updated 14 January 2026
  • The paper introduces E2Former-LSR, a graph transformer that employs linear-scaling Wigner 6j convolutions to achieve constant error scaling in large molecular systems.
  • It integrates long-range attention with local-shift regularization to capture non-local interactions and ensure robustness in molecular dynamics modeling.
  • Empirical benchmarks on MolLR25 demonstrate significant improvements, including up to 30x speedup and reduced force errors compared to fixed-cutoff models.

E2Former-LSR is an SO(3)-equivariant graph transformer architecture designed to enable scalable, high-fidelity molecular machine learning force fields (MLFFs) with both local geometric precision and explicit sensitivity to long-range electronic and conformational effects. The model is an extension of the E2Former framework, employing linear-scaling Wigner 6j convolutions as its core local aggregation mechanism, and incorporating long-range attention and local-shift regularization to achieve constant error scaling for molecular systems with up to 1,200 atoms and beyond. E2Former-LSR has demonstrated that principled, non-local neural architectures are necessary for accurate molecular modeling of macromolecules and large supramolecular assemblies (Li et al., 31 Jan 2025, Wang et al., 7 Jan 2026).

1. Architectural Foundations

E2Former-LSR is built on a graph-based representation of molecular systems, where atomic positions and chemical identities are mapped to nodes, and neighbor relationships (typically within a radius cutoff) define edges. Input atomic nodes $i \in \mathcal{V}$ have 3D positions $\mathbf{r}_i$ and chemical types $Z_i$, with neighbor connectivity within a short-range cutoff $r_\mathrm{short} \approx 5\ \text{Å}$. Feature vectors are SO(3)-irrep decomposed, such that for every irrep order $\ell = 0 \dots L_\mathrm{max}$, each node carries features $\mathbf{h}_{i,\ell} \in \mathbb{R}^{(2\ell+1)\times d_\ell}$. Edge-level geometry is encoded via spherical harmonics $Y^\ell_m(\widehat{\mathbf{r}}_{ij})$, where $\widehat{\mathbf{r}}_{ij} = (\mathbf{r}_j - \mathbf{r}_i) / \|\mathbf{r}_j - \mathbf{r}_i\|$.
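The sketch below illustrates this data layout in isolation: a radius-cutoff neighbor graph, unit edge vectors that would feed the spherical harmonics, and per-order irrep feature tensors with the shapes given above. It is a minimal PyTorch illustration, not the reference implementation; names such as `radius_graph`, `r_short`, `L_max`, and `d_ell` are placeholders.

```python
import torch

def radius_graph(pos: torch.Tensor, r_short: float = 5.0) -> torch.Tensor:
    """pos: (N, 3) atomic positions -> edge index (2, E) of pairs within the cutoff."""
    dist = torch.cdist(pos, pos)                                    # (N, N) pairwise distances
    mask = (dist < r_short) & ~torch.eye(len(pos), dtype=torch.bool)
    src, dst = mask.nonzero(as_tuple=True)
    return torch.stack([src, dst])                                  # edges i -> j

def unit_edge_vectors(pos: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
    """Unit vectors r_hat_ij that parameterize the spherical harmonics Y^l_m."""
    i, j = edge_index
    rij = pos[j] - pos[i]
    return rij / rij.norm(dim=-1, keepdim=True)

N, L_max, d_ell = 32, 2, 16
pos = torch.randn(N, 3) * 4.0
edge_index = radius_graph(pos)
r_hat = unit_edge_vectors(pos, edge_index)

# One feature tensor per irrep order l, of shape (N, 2l+1, d_l) as described above.
features = {l: torch.zeros(N, 2 * l + 1, d_ell) for l in range(L_max + 1)}
```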

The central operation is the E2Attention block, which first projects features to scalar queries/keys for attention, computes value tensors using Clebsch–Gordan (CG) products with spherical harmonics, collects neighbor information with softmax attention, and recouples using the Wigner $6j$ tensor product. Residual equivariant MLPs update the node features.
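A schematic of this data flow is given below under heavy simplification: the Clebsch–Gordan product with spherical harmonics and the Wigner 6j recoupling are replaced by placeholder linear maps, so only the attention structure (scalar queries/keys, neighbor softmax, aggregation, residual update) is shown. `E2AttentionSketch` and its members are illustrative names, not the paper's API.

```python
import torch
import torch.nn as nn

class E2AttentionSketch(nn.Module):
    """Schematic of the E2Attention data flow; equivariant tensor algebra is stubbed out."""
    def __init__(self, d: int):
        super().__init__()
        self.q_proj = nn.Linear(d, d)        # scalar (l=0) queries
        self.k_proj = nn.Linear(d, d)        # scalar (l=0) keys
        self.cg_stub = nn.Linear(d, d)       # stand-in for the CG product with Y^l(r_ij)
        self.sixj_stub = nn.Linear(d, d)     # stand-in for the Wigner 6j recoupling
        self.update = nn.Sequential(nn.Linear(d, d), nn.SiLU(), nn.Linear(d, d))

    def forward(self, h0: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        """h0: (N, d) scalar-channel features; edge_index: (2, E) edges i -> j."""
        i, j = edge_index
        # 1) scalar attention logits per edge
        logits = (self.q_proj(h0)[i] * self.k_proj(h0)[j]).sum(-1) / h0.shape[-1] ** 0.5
        # softmax over the neighbors of each receiving node i
        alpha = torch.zeros_like(logits)
        for node in i.unique():
            sel = i == node
            alpha[sel] = torch.softmax(logits[sel], dim=0)
        # 2) "values" from neighbor features (CG product with spherical harmonics in the real model)
        values = self.cg_stub(h0[j])
        # 3) attention-weighted aggregation onto each receiving node
        agg = torch.zeros_like(h0).index_add_(0, i, alpha.unsqueeze(-1) * values)
        # 4) recoupling (Wigner 6j in the real model) and residual equivariant MLP update
        return h0 + self.update(self.sixj_stub(agg))
```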

2. Wigner 6j Convolution and Linear-Time Scaling

Traditional SO(3)-equivariant graph neural networks use edge-based CG tensor products, incurring an $O(|\mathcal{E}| \cdot L^3)$ per-layer cost from coupling all irrep orders on every edge, which becomes prohibitive for large or dense graphs.

E2Former introduces a binomial local expansion that allows spherical harmonics over edges to be rewritten in terms of node-local spherical harmonics:

$$Y^{\ell}(\mathbf{r}_{ij}) = \left[ Y^1(\mathbf{r}_j) - Y^1(\mathbf{r}_i) \right]^{\otimes\ell}_{\mathrm{proj}} = \sum_{u=0}^{\ell} (-1)^{\ell-u} \binom{\ell}{u} \left[ Y^u(\mathbf{r}_i) \otimes Y^{\ell-u}(\mathbf{r}_j) \right]_{\mathrm{proj}}$$

The Wigner 6j convolution exploits this and shifts the cost to per-node recoupling steps:

$$\sum_{j \in \mathcal{N}(i)} a_{ij} \left[ \mathbf{h}_j \otimes Y^\ell(\mathbf{r}_{ij}) \right]_{\mathrm{proj}} = \sum_{u=0}^\ell (-1)^{\ell-u} \binom{\ell}{u} \left\{ Y^u(\mathbf{r}_i) \otimes_{6j} \left( \sum_{j \in \mathcal{N}(i)} a_{ij} \left[ \mathbf{h}_j \otimes Y^{\ell-u}(\mathbf{r}_j) \right]_{\mathrm{proj}} \right) \right\}$$

Precomputing $Y^u(\mathbf{r}_i)$ per node and sharing neighborwise partial products reduce the per-layer cost to $O(|\mathcal{V}| L^3 + |\mathcal{E}| L^2)$, which is empirically linear in system size in practice. E2Former achieves 7x–30x speedups over standard SO(3) convolutions as system size increases (Li et al., 31 Jan 2025).
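The sketch below contrasts the two cost patterns, with the projected tensor products replaced by plain outer products standing in for the true CG/6j couplings. It illustrates the structural reordering (the expensive coupling is applied per node rather than per edge), not a numerical identity; all function and variable names are illustrative.

```python
import torch

def naive_edge_coupling(h, Y_edge, a, edge_index):
    """O(|E|) expensive couplings: one (placeholder) outer product per edge.
    h: (N, d) features, Y_edge: (E, m) per-edge harmonics, a: (E,) attention weights."""
    i, j = edge_index
    msg = a.unsqueeze(-1).unsqueeze(-1) * torch.einsum('ed,em->edm', h[j], Y_edge)
    out = torch.zeros(h.shape[0], h.shape[1], Y_edge.shape[1])
    return out.index_add_(0, i, msg)                                # aggregate onto node i

def node_recoupled(h, Y_node, a, edge_index):
    """Same aggregation pattern after the binomial expansion: cheap neighbor partial
    sums are pooled per node first, then the expensive coupling with Y^u(r_i) is
    applied once per node. Y_node: (N, m) per-node harmonics."""
    i, j = edge_index
    pooled = torch.zeros_like(h).index_add_(0, i, a.unsqueeze(-1) * h[j])  # cheap, per edge
    return torch.einsum('nd,nm->ndm', pooled, Y_node)                      # expensive, per node
```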

3. Long-Range Aware Message Passing and Local-Shift Regularization

E2Former-LSR enhances E2Former by incorporating two auxiliary mechanisms: long-range attention and local-shift regularization.

Long-Range Attention

The model augments the standard short-range message passing with an additional attention stream operating over a long-range bipartite graph. Chemically defined fragments $u$ are constructed (e.g., by BRICS decomposition), with each fragment’s center given by

$$\mathbf{P}_u = \sum_{i\in S(u)} \gamma_i \mathbf{p}_i, \qquad \sum_i \gamma_i = 1$$

where $S(u)$ is the set of atoms in fragment $u$. Atom–fragment edges are defined for all atom–fragment pairs within $r_\mathrm{long} \approx 15$ Å. These augmentations enable receptive fields that encompass non-local, non-covalent interactions critical for macromolecular and supramolecular behavior.
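A minimal construction of the fragment centers and the atom–fragment radius graph is sketched below, assuming an atom-to-fragment assignment (e.g., from a BRICS decomposition) is already available. Uniform weights $\gamma_i = 1/|S(u)|$ are an assumption of this sketch, and all names are illustrative rather than the paper's interface.

```python
import torch

def fragment_centers(pos, frag_id, num_frags, gamma=None):
    """pos: (N, 3); frag_id: (N,) long tensor of fragment indices -> (F, 3) centers P_u."""
    if gamma is None:                        # assumed uniform weights gamma_i = 1/|S(u)|
        counts = torch.zeros(num_frags).index_add_(0, frag_id, torch.ones(len(pos)))
        gamma = 1.0 / counts[frag_id]
    return torch.zeros(num_frags, 3).index_add_(0, frag_id, gamma.unsqueeze(-1) * pos)

def atom_fragment_edges(pos, centers, r_long=15.0):
    """All atom-fragment pairs within the long-range cutoff -> (2, E_long) index."""
    dist = torch.cdist(pos, centers)         # (N, F) atom-to-center distances
    atom_idx, frag_idx = (dist < r_long).nonzero(as_tuple=True)
    return torch.stack([atom_idx, frag_idx])
```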

Message passing on the long-range graph uses the same equivariant attention structure but restricts to $\ell=0,1$ features to control complexity. Both short- and long-range modules alternate or are stacked (e.g., 4 short + 2 long layers), and outputs are fused via a late-stage MLP.

Local-Shift Regularization

E2Former-LSR introduces a local-shift regularization that perturbs each atom’s position with small isotropic Gaussian noise $\delta_i$ and penalizes the resulting change in node features via the loss

$$\mathcal{L}_{\mathrm{LS}} = \sum_i \left\| \mathbf{h}_i(\mathbf{r}_i+\delta_i) - \mathbf{h}_i(\mathbf{r}_i) \right\|^2$$

This encourages output invariance to small local shifts, serving as an SE(3) data augmentation and yielding more robust, generalizable models (Li et al., 31 Jan 2025).
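A minimal sketch of this term is shown below, assuming a `model` that maps positions and atom types to per-atom feature vectors; the noise scale `sigma` is a hypothetical hyperparameter not specified in the text above.

```python
import torch

def local_shift_loss(model, pos, z, sigma=0.05):
    """Penalize the change in per-atom features under a small isotropic position perturbation."""
    h_clean = model(pos, z)                              # (N, d) features at r_i
    delta = sigma * torch.randn_like(pos)                # small isotropic Gaussian noise delta_i
    h_shift = model(pos + delta, z)                      # features at r_i + delta_i
    return ((h_shift - h_clean) ** 2).sum(dim=-1).sum()  # L_LS summed over atoms
```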

4. Training Targets, Loss Functions, and Equivariance

Total energy prediction is achieved by summing atomic contributions, preserving translation invariance. The predicted energy $\widehat E$ and forces $\widehat{\mathbf F}_i$ are given by

$$\widehat E = \sum_i g\left( \mathbf{z}_{i,0} \right), \qquad \widehat{\mathbf F}_i = -\frac{\partial \widehat E}{\partial \mathbf{p}_i}$$

with fusion of short- and long-range features as $\mathbf{z}_i=\mathrm{Fuse}(\mathbf{h}^{(L_{\mathrm{short}})}_i,\mathbf{x}^{(L_{\mathrm{long}})}_i)$. The loss function combines $L_1$ energy and force objectives, with weighting $\lambda_E=1$, $\lambda_F=100$:

$$\mathcal{L} = \lambda_E\,\|\widehat E - E_{\mathrm{ref}}\|_1 + \lambda_F\,\frac{1}{n}\sum_{i=1}^n\|\widehat{\mathbf F}_i-\mathbf F_{i,\mathrm{ref}}\|_1$$
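The readout and loss can be sketched as follows, with forces obtained by automatic differentiation of the summed per-atom energy. Here `readout` and `z0_fn` are stand-ins for $g(\cdot)$ and the fused invariant features $\mathbf{z}_{i,0}$, not the paper's actual modules.

```python
import torch

def energy_forces(readout, z0_fn, pos, species):
    """E_hat = sum_i g(z_{i,0}); F_hat = -dE_hat/dp_i via autograd."""
    pos = pos.detach().clone().requires_grad_(True)
    z0 = z0_fn(pos, species)                  # fused per-atom invariant features
    energy = readout(z0).sum()                # sum of per-atom energy contributions
    forces = -torch.autograd.grad(energy, pos, create_graph=True)[0]
    return energy, forces

def loss_fn(E_hat, F_hat, E_ref, F_ref, lam_E=1.0, lam_F=100.0):
    """Weighted L1 energy + force loss, matching the objective above."""
    force_term = (F_hat - F_ref).abs().sum(dim=-1).mean()   # (1/n) sum_i ||F_i - F_ref,i||_1
    return lam_E * (E_hat - E_ref).abs() + lam_F * force_term
```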

Exact rotation equivariance is enforced by construction, with all message functions dependent only on relative positions and spherical harmonics. Translation invariance is maintained by summing per-atom outputs.

5. Empirical Performance and Benchmarks

E2Former-LSR has been validated on the MolLR25 benchmark (Wang et al., 7 Jan 2026), which includes:

  • Di-molecule dissociation: 4,950 diverse dimers, separation 0.1–10.1 Å.
  • Medium-scale protein conformations: Systems with 700–1,200 atoms, sampled from D. E. Shaw trajectories.
  • MD trajectory suite: ab initio MD for molecular clusters and frameworks (up to >500 atoms).

Key performance metrics:

| Task | Metric | E2Former-LSR | MACE-Large | Inference speed |
|---|---|---|---|---|
| Di-molecule dissociation (8–10 Å) | Energy MAE | ~0.21 | ~0.34 | – |
| Di-molecule dissociation (8–10 Å) | Force MAE (meV/Å) | ~0.60 | ~1.17 | – |
| Medium-scale proteins | Force MAE | 67% lower | – | 30% faster |
| MD / ZIF-8 | Force MAE (meV/Å) | 4.73 | 6.68 | – |

Additional findings include:

  • Error scaling for fixed-cutoff architectures (MACE) rises with system size; E2Former-LSR maintains flat errors of ~5–7 meV/Å up to $N \approx 1{,}200$ atoms.
  • E2Former-LSR better models 1/R⁶-like non-covalent force decay, with fixed-cutoff baselines showing cutoff artifacts beyond ~3 Å.
  • Removal of long-range blocks increases long-distance MAE by 2–3× and force cosine similarity (CS_f) degrades by 5–10% (Wang et al., 7 Jan 2026).

6. Comparative Analysis and Theoretical Implications

Ablation studies demonstrate that without explicit long-range attention, MLFFs suffer from monotonic error growth with system size and cannot accurately represent smooth asymptotic force decay. Alternative long-range–aware models (e.g., DPA-2) do not match the fidelity obtained through explicit fragment-level equivariant attention. E2Former-LSR thereby establishes the necessity of non-local, equivariant neural architectures—mere scaling of data or parameters in fixed-cutoff frameworks is insufficient for generalizable MLFF performance (Wang et al., 7 Jan 2026).

A plausible implication is that, as system complexity grows, shift-invariant, non-local receptive fields and SO(3) equivariant operations become essential architectural primitives for future molecular modeling approaches.

7. Significance and Broader Impact

E2Former-LSR merges computational efficiency (via Wigner 6j recoupling and linear scaling) with the generalization and fidelity gains required for next-generation MLFFs. It enables accurate, high-throughput molecular dynamics on previously intractable, large-scale chemical and biological systems, supporting both covalent and non-covalent regimes, and providing a computational bridge between quantum accuracy and macromolecular scale. This architectural paradigm confirms that non-local graph message passing, fragment-level attention, and local-shift regularization are essential for modeling complex molecular phenomena relevant to chemistry, materials science, and biophysics (Li et al., 31 Jan 2025, Wang et al., 7 Jan 2026).
