Equiformer: Equivariant Graph Transformer
- Equiformer is a family of equivariant graph Transformer architectures that leverages group-theoretic features and irreducible representations of SO(3) to model 3D structures.
- It replaces standard Transformer layers with equivariant operations that enforce rotational symmetry, which supports accurate predictions in atomistic and molecular systems.
- With advances in EquiformerV2, the model achieves efficient scaling and enhanced performance on quantum chemistry and materials benchmarks.
Equiformer is a family of graph Transformer architectures specifically designed to enforce equivariance under three-dimensional symmetry groups (notably SE(3) and SO(3)) in modeling atomistic and molecular systems. Equiformer architectures use group-theoretically principled features and attention, with operations based on irreducible representations ("irreps") of SO(3), and replace standard Transformer mechanisms with equivariant alternatives. This approach enables learning on 3D molecular graphs and the prediction of properties that depend critically on geometric structure.
1. Group-Theoretic Foundations and Equivariant Graph Representations
Equiformer builds on the concept of equivariant neural networks, where node features encode information as tuples of irreducible representations under SO(3). Each degree-$L$ block, $f^{(L)} \in \mathbb{R}^{2L+1}$, transforms by the Wigner-D matrix $D^{(L)}(R)$ under a global rotation $R$:
$$f^{(L)} \mapsto D^{(L)}(R)\, f^{(L)}$$
This guarantees that predicted outputs respect rotational symmetries, a fundamental requirement for molecular and materials science applications.
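The transformation rule above can be checked numerically in the two lowest degrees. The sketch below is not taken from the Equiformer code base: `toy_features` and `random_rotation` are hypothetical helpers, and the Gaussian radial weight is an arbitrary choice. It relies on two standard facts: $D^{(0)}(R) = 1$, and $D^{(1)}(R)$ equals the rotation matrix $R$ itself in the Cartesian basis.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the factorization
    if np.linalg.det(q) < 0:      # flip one axis to land in SO(3), not just O(3)
        q[:, 0] = -q[:, 0]
    return q

def toy_features(pos):
    """Hypothetical per-atom features: a degree-0 scalar and a degree-1 vector."""
    rel = pos[None, :, :] - pos[:, None, :]          # rel[i, j] = pos[j] - pos[i]
    dist = np.linalg.norm(rel, axis=-1)
    weight = np.exp(-dist**2)                        # invariant radial weight
    np.fill_diagonal(weight, 0.0)                    # no self-interaction
    scalar = weight.sum(axis=1)                      # degree 0: invariant
    vector = (weight[:, :, None] * rel).sum(axis=1)  # degree 1: rotates with R
    return scalar, vector

rng = np.random.default_rng(0)
pos = rng.normal(size=(5, 3))
R = random_rotation(rng)

s, v = toy_features(pos)
s_rot, v_rot = toy_features(pos @ R.T)   # rotate every atom: x -> R x

print(np.allclose(s_rot, s))             # degree 0 is unchanged
print(np.allclose(v_rot, v @ R.T))       # degree 1 rotates with the input
```

Higher degrees follow the same pattern with $(2L+1)$-dimensional Wigner-D matrices in place of $R$.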
Nodes in the input molecular graph correspond to atoms, with associated geometric and atomic-type features. Edges represent geometric neighbors within a cutoff. 3D geometry is incorporated via spherical harmonic embeddings of inter-atomic vectors and various distance-dependent radial functions (Liao et al., 2022).
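As a rough illustration of the graph construction described above, the following sketch builds a cutoff graph and computes edge embeddings. The function names, the Gaussian radial basis, its width, and the toy coordinates are all assumptions made for this example; only degrees 0 and 1 of the real spherical harmonics are written out.

```python
import numpy as np

def radius_graph(pos, cutoff):
    """Return (src, dst) index arrays for all ordered pairs closer than `cutoff`."""
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    src, dst = np.nonzero((dist < cutoff) & (dist > 0.0))   # dist > 0 drops self-pairs
    return src, dst

def gaussian_rbf(dist, cutoff, num_basis=8):
    """Expand distances in evenly spaced Gaussians (an assumed basis choice)."""
    centers = np.linspace(0.0, cutoff, num_basis)
    width = cutoff / num_basis
    return np.exp(-((dist[:, None] - centers[None, :]) / width) ** 2)

def spherical_harmonics_l01(vec):
    """Real spherical harmonics of the unit edge direction for degrees 0 and 1."""
    unit = vec / np.linalg.norm(vec, axis=-1, keepdims=True)
    y0 = np.full((len(vec), 1), 0.5 / np.sqrt(np.pi))        # Y_0 is a constant
    y1 = np.sqrt(3.0 / (4.0 * np.pi)) * unit[:, [1, 2, 0]]   # (y, z, x) for m = -1, 0, +1
    return y0, y1

# Three nearby atoms plus one far-away atom that receives no edges.
pos = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0], [5.0, 5.0, 5.0]])
src, dst = radius_graph(pos, cutoff=2.0)
edge_vec = pos[dst] - pos[src]
edge_len = np.linalg.norm(edge_vec, axis=-1)
rbf = gaussian_rbf(edge_len, cutoff=2.0)
y0, y1 = spherical_harmonics_l01(edge_vec)
print(len(src), rbf.shape, y1.shape)
```

The radial features depend only on distance and are therefore invariant, while the degree-1 harmonics rotate with the edge direction.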
2. Equiformer: Core Architecture and Equivariant Operations
The original Equiformer replaces standard Transformer layers with their SO(3)/SE(3)-equivariant counterparts:
- Equivariant graph attention: Message passing utilizes depth-wise tensor products and equivariant linear layers. Standard dot-product attention is replaced by a multi-layer perceptron (MLP)-based attention, and attention weights are extracted from degree-0 (scalar) projections.
- Irrep feature processing: Features at each node are direct sums of multiple degree-$L$ representations: $f = \bigoplus_{L=0}^{L_{\max}} f^{(L)}$.
- Equivariant layer normalization: Normalization is performed within each degree channel, preserving equivariance.
- Gate nonlinearities and tensor interaction: Nonlinearities include equivariant gates, operating on components with learned scalar gates derived from features, and Clebsch–Gordan tensor products for mixing across degrees.
- Residual connections and feed-forward networks retain full group-theoretic symmetry (Liao et al., 2022).
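Two of the operations listed above, attention weights derived from scalars and the gate nonlinearity, can be sketched with degree-0 and degree-1 features only. This is a simplified stand-in rather than the published layer: values are passed through unchanged instead of being produced by depth-wise tensor products, attention is dense over all atom pairs, and `mlp_attention`, `gate`, and the weight shapes are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_attention(scalars, vectors, pos, w1, w2):
    """Toy single-head attention: invariant weights, equivariant values.

    scalars: (N, C) degree-0 features; vectors: (N, C, 3) degree-1 features.
    """
    n = len(pos)
    dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)   # (N, N), invariant
    # Invariant pair input: target scalars, source scalars, and distance.
    pair = np.concatenate(
        [np.repeat(scalars[:, None], n, axis=1),
         np.repeat(scalars[None, :], n, axis=0),
         dist[..., None]], axis=-1)                                # (N, N, 2C + 1)
    logits = leaky_relu(pair @ w1) @ w2                            # two-layer MLP -> (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)                  # softmax over sources j
    out_s = attn @ scalars                                         # (N, C)
    out_v = np.einsum("ij,jcx->icx", attn, vectors)                # (N, C, 3)
    return out_s, out_v

def gate(scalars, vectors, w_gate):
    """Gate nonlinearity: SiLU on scalars, learned scalar gates on vectors."""
    gates = sigmoid(scalars @ w_gate)                              # (N, C), invariant
    return scalars * sigmoid(scalars), gates[..., None] * vectors

rng = np.random.default_rng(0)
n, c, h = 4, 3, 8
pos = rng.normal(size=(n, 3))
scalars = rng.normal(size=(n, c))
vectors = rng.normal(size=(n, c, 3))
w1, w2 = rng.normal(size=(2 * c + 1, h)), rng.normal(size=h)
w_gate = rng.normal(size=(c, c))

def layer(scalars, vectors, pos):
    s, v = mlp_attention(scalars, vectors, pos, w1, w2)
    return gate(s, v, w_gate)

t = 0.7
R = np.array([[np.cos(t), -np.sin(t), 0.0], [np.sin(t), np.cos(t), 0.0], [0.0, 0.0, 1.0]])
s0, v0 = layer(scalars, vectors, pos)
s1, v1 = layer(scalars, vectors @ R.T, pos @ R.T)
print(np.allclose(s0, s1), np.allclose(v0 @ R.T, v1))
```

Because the attention weights and the gates depend only on invariant quantities, multiplying them into the vector channels cannot break equivariance.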
This architecture is expressive enough to match or surpass the performance of other equivariant GNNs on tasks including quantum-chemical property prediction (QM9), molecular dynamics (MD17), and large-surface catalysis datasets (OC20).
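The Clebsch–Gordan mixing across degrees used by these layers has a familiar special case: the product of two degree-1 vectors splits into a degree-0 part (dot product), a degree-1 part (cross product), and a degree-2 part (symmetric traceless matrix). The sketch below works in the Cartesian basis rather than with explicit Clebsch–Gordan coefficients, and its helper names are hypothetical.

```python
import numpy as np

def decompose_1x1(a, b):
    """Split the outer product of two degree-1 vectors into degrees 0, 1 and 2."""
    deg0 = a @ b                                   # 1 component
    deg1 = np.cross(a, b)                          # 3 components (antisymmetric part)
    outer = np.outer(a, b)
    # Symmetric traceless part: 5 independent components.
    deg2 = 0.5 * (outer + outer.T) - (deg0 / 3.0) * np.eye(3)
    return deg0, deg1, deg2

def rotation(axis, angle):
    """Rodrigues' formula for a rotation by `angle` about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)

a, b = np.array([1.0, 2.0, 3.0]), np.array([-0.5, 0.4, 2.0])
R = rotation([1.0, 1.0, 0.0], 0.8)
d0, d1, d2 = decompose_1x1(a, b)
r0, r1, r2 = decompose_1x1(R @ a, R @ b)

print(np.isclose(r0, d0))                # degree 0 is invariant
print(np.allclose(r1, R @ d1))           # degree 1 rotates as a vector
print(np.allclose(r2, R @ d2 @ R.T))     # degree 2 rotates as a rank-2 tensor
```

The component counts 1 + 3 + 5 = 9 match the nine entries of the outer product, which is the $1 \otimes 1 = 0 \oplus 1 \oplus 2$ decomposition.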
3. EquiformerV2: Scaling to Higher-Degree Equivariant Representations
Although the original Equiformer outperformed prior models, it was computationally constrained to a low maximum degree $L_{\max}$, because the cost of SO(3) tensor products grows steeply with degree (up to $O(L_{\max}^6)$).
EquiformerV2 introduces several advances (Liao et al., 2023):
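- Efficient eSCN convolution: The core SO(3) convolution is replaced by the eSCN method, which rotates each edge into a canonical frame in which the spherical harmonics of the edge direction are nonzero only for $m = 0$. This reduces the convolution to an SO(2)-equivariant linear mixing; schematically, each message can be expressed as:
$$m_{ts} = D(R_{ts})^{-1}\, W_{\mathrm{SO(2)}}(\lVert r_{ts} \rVert)\, D(R_{ts})\, x_s$$
where $x_s$ are the source-node features, $R_{ts}$ rotates the edge direction onto the canonical axis, and $D(R_{ts})$ is the corresponding block-diagonal Wigner-D matrix, yielding only $O(L_{\max}^3)$ complexity and enabling $L_{\max}$ up to 6–8 in practice.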
- Attention re-normalization: A LayerNorm is applied to scalar attention scores before the LeakyReLU, stabilizing attention distributions as channel counts grow.
- Separable $S^2$ activation: Nonlinear activations on spherical signals are applied separately to $L = 0$ and $L > 0$ degrees, preventing gradient blow-up when higher-$L$ channels are used.
- Separable layer normalization (SLN): Degree-0 and degree-$L > 0$ channels are normalized with different statistics (mean/std for scalars, RMS for vectors/tensors), preserving relative scaling between degrees.
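The canonical-frame idea behind the eSCN convolution in the list above can be illustrated for degree 1. The sketch assumes the canonical axis is $z$ and orders degree-1 coefficients as $m = (-1, 0, +1) = (y, z, x)$; published implementations may use a different axis convention. `rotation_to_z`, `so2_linear_l1`, and `edge_message` are hypothetical names, and the three weights stand in for learned, distance-dependent parameters.

```python
import numpy as np

def rotation_to_z(direction):
    """Rotation matrix R such that R @ direction points along +z."""
    d = direction / np.linalg.norm(direction)
    z = np.array([0.0, 0.0, 1.0])
    v, c = np.cross(d, z), d @ z
    if np.isclose(c, -1.0):                   # antiparallel: rotate by pi about x
        return np.diag([1.0, -1.0, -1.0])
    k = np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])
    return np.eye(3) + k + (k @ k) / (1.0 + c)

def so2_linear_l1(x, w0, wr, wi):
    """SO(2)-equivariant linear map on degree-1 coefficients (m = -1, 0, +1)."""
    out = np.empty(3)
    out[1] = w0 * x[1]                        # m = 0: plain scalar weight
    out[2] = wr * x[2] - wi * x[0]            # m = +1 and m = -1 are mixed by a
    out[0] = wi * x[2] + wr * x[0]            # 2x2 rotation-scaling block
    return out

def edge_message(feature, edge, weights):
    """Rotate into the edge frame, apply the SO(2) map, rotate back."""
    R = rotation_to_z(edge)
    coeffs = (R @ feature)[[1, 2, 0]]         # Cartesian (x, y, z) -> m = (-1, 0, +1)
    mixed = so2_linear_l1(coeffs, *weights)
    return R.T @ mixed[[2, 0, 1]]             # back to Cartesian, then undo the rotation

rng = np.random.default_rng(0)
edge, feature = rng.normal(size=3), rng.normal(size=3)
weights = (0.5, 1.3, -0.7)

aligned = rotation_to_z(edge) @ edge / np.linalg.norm(edge)
print(np.round(aligned[[1, 2, 0]], 6))        # only the m = 0 entry is nonzero

Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Q = Q * np.linalg.det(Q)                      # force det = +1
lhs = edge_message(Q @ feature, Q @ edge, weights)
rhs = Q @ edge_message(feature, edge, weights)
print(np.allclose(lhs, rhs))                  # the message is SO(3)-equivariant
```

The choice of rotation onto the axis is not unique, but any two choices differ by a rotation about $z$, and the 2x2 block structure commutes with exactly those rotations, so the message does not depend on the choice.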
A block-level breakdown is as follows:
- Edge embedding: RBF of inter-atomic distance combined with eSCN convolution.
- Attention: LayerNorm and MLP-processed scores, equivariant value computation with separable $S^2$ activation and SLN.
- Residual update: Equivariant linear followed by SLN.
- Feed-forward: Linear → separable $S^2$ activation → linear → SLN.
In aggregate, these enable high-degree equivariant models without prohibitive computational cost.
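The separable layer normalization used in these blocks can be sketched as follows, with degree-1 vectors standing in for all $L > 0$ channels. This is one plausible reading of "mean/std for scalars, RMS for vectors/tensors" and omits the learnable scale and bias; the function name and tensor shapes are assumptions.

```python
import numpy as np

def separable_layer_norm(scalars, vectors, eps=1e-6):
    """scalars: (C,) degree-0 channels; vectors: (C, 3) degree-1 channels."""
    # Degree 0: ordinary layer norm with mean and standard deviation.
    s_out = (scalars - scalars.mean()) / np.sqrt(scalars.var() + eps)
    # Degree > 0: divide by a single RMS. No mean is subtracted, because
    # shifting vector components by a constant would break equivariance.
    rms = np.sqrt(np.mean(vectors ** 2) + eps)
    return s_out, vectors / rms

rng = np.random.default_rng(0)
scalars = 5.0 + 2.0 * rng.normal(size=16)
vectors = 10.0 * rng.normal(size=(16, 3))

t = 1.1
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(t), -np.sin(t)],
              [0.0, np.sin(t), np.cos(t)]])

s_out, v_out = separable_layer_norm(scalars, vectors)
_, v_rot = separable_layer_norm(scalars, vectors @ R.T)

print(np.allclose(v_rot, v_out @ R.T))                  # normalization commutes with rotation
print(np.isclose(np.sqrt(np.mean(v_out ** 2)), 1.0))    # unit RMS after normalization
```

The RMS is built from squared norms, which rotations leave unchanged, so dividing by it keeps the vector channels equivariant.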
4. Computational Complexity and Scaling Behaviors
The reduction in computational cost is essential for scaling:
- Original Equiformer: Time per layer grows as $O(L_{\max}^6)$ in the maximum degree (for a fixed number of heads and channels), limiting practical $L_{\max}$.
- EquiformerV2: Time per layer grows as $O(L_{\max}^3)$ via SO(2) reduction, with unchanged memory cost.
- Impact: Allows training of deep models with larger $L_{\max}$ on large datasets, preserving model expressivity and rotational equivariance (Liao et al., 2023).
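The gap between the two scaling regimes can be made concrete with a rough operation count. The sketch below is a proxy under simplifying assumptions, not a benchmark: it counts dense Clebsch–Gordan coefficient entries for a full tensor product against multiplications in an SO(2) linear map, ignores channels, heads, and coefficient sparsity, and uses hypothetical function names.

```python
def dense_tensor_product_cost(l_max):
    """Count Clebsch-Gordan entries over all allowed (l1, l2, l3) paths."""
    total = 0
    for l1 in range(l_max + 1):
        for l2 in range(l_max + 1):
            # Triangle rule: |l1 - l2| <= l3 <= l1 + l2, truncated at l_max.
            for l3 in range(abs(l1 - l2), min(l1 + l2, l_max) + 1):
                total += (2 * l1 + 1) * (2 * l2 + 1) * (2 * l3 + 1)
    return total

def so2_linear_cost(l_max):
    """Count multiplications in an SO(2) linear map for one channel pair."""
    total = (l_max + 1) ** 2              # m = 0: dense mixing over degrees 0..l_max
    for m in range(1, l_max + 1):
        n_deg = l_max + 1 - m             # degrees l >= m carry order m
        total += 4 * n_deg ** 2           # one 2x2 block per (l_in, l_out) pair
    return total

for l_max in (1, 2, 4, 6, 8):
    dense, so2 = dense_tensor_product_cost(l_max), so2_linear_cost(l_max)
    print(f"L_max={l_max}: dense={dense}, so2={so2}, ratio={dense / so2:.1f}")
```

The ratio between the two counts widens quickly as $L_{\max}$ increases, which is the practical reason the SO(2) formulation makes degrees of 6 to 8 affordable.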
5. Empirical Results and Benchmarks
EquiformerV2 sets new empirical standards on large quantum-chemistry and materials datasets:
- OC20 S2EF benchmark: Up to 9% reduction in force MAE and 4% reduction in energy MAE over GemNet-OC, and 1–2 meV/Å improvement at equal throughput versus previous architectures. A 2× reduction in density functional theory (DFT) calls is attained when computing adsorption energies (Liao et al., 2023).
- AdsorbML relaxation: EquiformerV2 attains 90.5% success, compared to 86% for SCN-Large, for a 2× reduction in computational cost at equivalent accuracy.
- OC22 S2EF and IS2RE: Models trained only on OC22 outperform GemNet-OC trained on both OC20 and OC22 in both energy and force MAE (energy MAE reduced from 29.4 to 24.1 meV; force MAE reduced by 9%).
- QM9: EquiformerV2 surpasses Equiformer on 9/12 targets, and with Noisy-Node regularization, improvement is seen on 10/12.
6. Broader Applicability and Extensions
The Equiformer architecture generalizes to 3D graph learning in domains beyond quantum chemistry. For example, PPIformer (Bushuiev et al., 2023) instantiates an Equiformer-style SE(3)-equivariant Transformer for protein–protein interaction design. This adaptation supports input graphs built from backbone coordinates with scalar and vector features, enforces SE(3) equivariance layer-wise, and outperforms previous models on mutation effect prediction and biophysical generalization tasks.
Empirical evidence indicates that Equiformer-style architectures are effective across diverse 3D geometric graph learning settings, with group-theoretic symmetry as a core architectural principle.
7. Outlook and Implications
Efficient equivariant Transformers such as EquiformerV2 open new avenues for large-scale molecular and materials modeling:
- Higher-degree representations facilitate capturing angular details critical for accurate force and energy predictions.
- The eSCN convolution mechanism provides a model class that is both expressively rich and computationally feasible for large-scale datasets.
- The architecture provides a foundation for future work, including pretraining on massive unlabeled molecular datasets, extension to full E(3) equivariance (including inversion), and integration into end-to-end simulation and design pipelines for materials and biomolecules (Liao et al., 2023).
Overall, Equiformer and its descendants represent a convergence of group-theoretic GNN principles with scalable Transformer design, reaching state-of-the-art predictive performance in multiple domains involving 3D geometric data (Liao et al., 2022, Liao et al., 2023, Bushuiev et al., 2023).