Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs (2206.11990v2)

Published 23 Jun 2022 in cs.LG, cs.AI, and physics.comp-ph

Abstract: Despite their widespread success in various domains, Transformer networks have yet to perform well across datasets in the domain of 3D atomistic graphs such as molecules even when 3D-related inductive biases like translational invariance and rotational equivariance are considered. In this paper, we demonstrate that Transformers can generalize well to 3D atomistic graphs and present Equiformer, a graph neural network leveraging the strength of Transformer architectures and incorporating SE(3)/E(3)-equivariant features based on irreducible representations (irreps). First, we propose a simple and effective architecture by only replacing original operations in Transformers with their equivariant counterparts and including tensor products. Using equivariant operations enables encoding equivariant information in channels of irreps features without complicating graph structures. With minimal modifications to Transformers, this architecture has already achieved strong empirical results. Second, we propose a novel attention mechanism called equivariant graph attention, which improves upon typical attention in Transformers through replacing dot product attention with multi-layer perceptron attention and including non-linear message passing. With these two innovations, Equiformer achieves competitive results to previous models on QM9, MD17 and OC20 datasets.

Citations (169)

Summary

  • The paper introduces a novel SE(3)-equivariant Transformer that leverages irreps and tensor products to model 3D atomistic graphs.
  • The method adapts standard Transformer components into their equivariant forms, enabling efficient integration of geometric and directional information.
  • Empirical results on QM9, MD17, and OC20 datasets show reduced MAEs and improved training efficiency over previous state-of-the-art methods.

The paper adapts Transformer architectures to 3D atomistic graphs by developing an SE(3)-equivariant model that leverages irreducible representations (irreps) to preserve geometric symmetries. The proposed model, Equiformer, is designed to process atomistic systems—such as molecules and catalytic interfaces—where physical properties must be invariant or equivariant under 3D transformations (e.g., rotation, translation, and optionally inversion). Equiformer builds on the Transformer paradigm by replacing traditional operations with equivariant counterparts and by introducing tensor product operations to combine features expressed in different irreps.
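
To make the equivariance requirement concrete, here is a minimal sketch, written for this summary rather than taken from the paper, that checks the condition $f(D(g)x) = D(g)f(x)$ for a toy layer: a linear map that mixes only the channels of type-1 (vector) features and therefore commutes with any rotation.

```python
import torch

torch.manual_seed(0)

def random_rotation():
    # Random 3x3 rotation matrix via QR decomposition of a Gaussian matrix.
    q, r = torch.linalg.qr(torch.randn(3, 3))
    q = q * torch.sign(torch.diagonal(r))   # fix the sign ambiguity of QR
    if torch.det(q) < 0:                    # ensure det = +1 (proper rotation)
        q[:, 0] = -q[:, 0]
    return q

# Type-1 (vector) features: 8 channels, each a 3D vector.
x = torch.randn(8, 3)

# An equivariant linear layer on type-1 features may only mix channels,
# never the 3 spatial components, so it is a matrix acting on the channel axis.
w = torch.randn(8, 8)
def f(feats):
    return w @ feats                        # (8, 8) @ (8, 3) -> (8, 3)

R = random_rotation()

# Equivariance: rotating the input then applying f equals applying f then rotating.
lhs = f(x @ R.T)                            # f(D(g) x)
rhs = f(x) @ R.T                            # D(g) f(x)
print(torch.allclose(lhs, rhs, atol=1e-5))  # True
```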

Key contributions and technical aspects of the work include:

  • Equivariant Feature Representations. Equiformer employs irreps features, where each node’s representation is decomposed into a mixture of type-$L$ vectors. In this formulation, a scalar corresponds to a type-$0$ vector, while a Euclidean vector is modeled as a type-$1$ entity, with higher degrees $L \geq 2$ representing more complex angular dependencies. The paper also describes how spherical harmonics $Y^{(L)}$ are used to project relative position vectors into these irreps spaces, ensuring that the geometric information is encoded in a manner that is naturally equivariant under $SE(3)$ transformations (these ingredients, together with the depth-wise tensor product described below, are illustrated in a short code sketch after this list).
  • Equivariant Transformer Backbone.
    • Equivariant Linear Layers: Different channels corresponding to each type-$L$ are processed independently with weight matrices that respect the equivariance condition $f(D_X(g)x) = D_Y(g)f(x)$.
    • Equivariant Layer Norm: For features with $L > 0$, the standard deviation is computed via the L2 norm (interpreted as a root mean square over channels), which remains invariant under rotation. Biases are removed to maintain equivariance. A sketch of this normalization, together with the gate activation in the next bullet, follows the list.
    • Equivariant Activation via Gating: Nonlinear activations are applied to the scalar (type‑$0$) channels while higher‑order components are modulated multiplicatively by these invariant nonlinearly transformed scalars, thereby preserving the equivariant structure of the overall representation.
    • Depth-wise Tensor Products (DTP): To combine features of different irreps, the method uses a depth‑wise tensor product that relies on Clebsch–Gordan coefficients. Crucially, the depth‑wise formulation restricts interactions to a one-to-one channel mapping, significantly reducing memory overhead while still capturing the necessary equivariant interactions.
  • Equivariant Graph Attention.
    • For two nodes $i$ and $j$, an initial message $x_{ij}$ is computed as the sum of linearly transformed node features. This message is then fused with directional information by taking a depth-wise tensor product with the spherical harmonics encoding of the relative position $\vec{r}_{ij}$.
    • Attention weights $a_{ij}$ are computed on the scalar components (type-$0$) extracted from the combined message via a multi-layer perceptron (MLP), replacing the conventional dot product scheme used in standard Transformers. This MLP attention mechanism is claimed to approximate arbitrary attention patterns (by virtue of universal approximation properties) and demonstrates improved computational efficiency and expressivity, especially on large and diverse datasets such as OC20 (a minimal MLP-attention sketch follows the list).
    • Non‑linear message passing is further refined by gate activations applied to the non‑scalar features and another tensor product with the spherical harmonics, allowing deeper interactions among high‑order features.
    • Multi‑head attention is implemented by computing several parallel equivariant attention outputs and concatenating them before a final projection.
  • Empirical Validation.
    • QM9: Equiformer achieves lower mean absolute errors (MAEs) across 12 regression tasks compared to prior methods such as NequIP, SEGNN, and TorchMD‑NET. Quantitative results show improved performance, particularly on properties that are sensitive to high‑order interactions.
    • MD17: The model is evaluated for both energy and force prediction on molecular dynamics trajectories of small organic molecules. By supporting vectors with higher $L_{\text{max}}$ (values of 2 and 3 are explored), Equiformer reaches competitive force and energy MAEs, outperforming many existing equivariant models.
    • OC20: On the challenging initial structure to relaxed energy (IS2RE) task for catalytic systems with complex 3D geometries, the model not only improves on average MAE and energy within threshold (EwT) metrics but also exhibits a training efficiency improvement ranging from approximately 2.3× to 15.5× compared to state‑of‑the‑art methods under similar training settings.
  • Ablation Studies. Detailed ablation studies are presented comparing variants that differ in the attention mechanism (MLP vs. dot product) and in the type of message passing (linear vs. non‑linear). On datasets with smaller graphs (e.g., QM9), dot product attention with linear message passing already yields competitive performance. In contrast, on OC20 with larger graphs, the increased expressiveness of MLP attention combined with non‑linear message passing is advantageous. These studies underscore that non‑linear interactions and more elaborate attention computations can be selectively activated depending on the complexity of the underlying graph structure.
  • Theoretical and Practical Implications. The proposed design shows that a minimal yet principled modification of the Transformer architecture—by enshrining $SE(3)$ equivariance via irreps and tensor products—is sufficient for achieving state-of-the-art performance on 3D atomistic tasks. The work also highlights the benefit of supporting higher-order vector representations (beyond the typical $L=0,1$ used in many equivariant networks) and provides an efficient implementation strategy via depth-wise tensor products, enabling scaling to larger systems without prohibitive computational costs.
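
The irreps features, spherical-harmonics projection, and tensor products described in the list above can be sketched with the e3nn library, on which implementations of this kind are typically built. The irreps sizes below are illustrative, and e3nn's FullyConnectedTensorProduct is used for brevity; the paper's depth-wise tensor product additionally restricts the coupling to a one-to-one channel mapping to save memory and compute.

```python
import torch
from e3nn import o3

# Node features as irreps: 32 scalar (L=0) channels and 16 vector (L=1) channels.
irreps_node = o3.Irreps("32x0e + 16x1o")

# Spherical harmonics of the relative position, up to L_max = 2.
irreps_sh = o3.Irreps.spherical_harmonics(lmax=2)    # 1x0e + 1x1o + 1x2e

# Output irreps after combining node features with directional information.
irreps_out = o3.Irreps("32x0e + 16x1o + 8x2e")

num_edges = 5
node_feat = irreps_node.randn(num_edges, -1)         # node features gathered per edge
rel_pos = torch.randn(num_edges, 3)                  # relative positions r_ij

# Project relative positions onto spherical harmonics (equivariant directional encoding).
sh = o3.spherical_harmonics(irreps_sh, rel_pos, normalize=True, normalization="component")

# Tensor product (Clebsch-Gordan coupling) of node features with spherical harmonics.
# Equiformer's depth-wise variant would restrict this to one-to-one channel pairs.
tp = o3.FullyConnectedTensorProduct(irreps_node, irreps_sh, irreps_out)
out = tp(node_feat, sh)
print(out.shape)                                     # (5, irreps_out.dim)
```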
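
The equivariant layer norm and gate activation can likewise be sketched in plain PyTorch for a single block of type-1 channels. This is a simplified illustration of the idea rather than the paper's implementation: the normalization statistic is built from rotation-invariant L2 norms, and vectors are only rescaled by invariant gates, so directions are never altered and equivariance is preserved.

```python
import torch

def equivariant_layer_norm_l1(x, eps=1e-6):
    """Layer norm for type-1 features x of shape (nodes, channels, 3).

    The per-channel L2 norm is rotation invariant, so normalizing by the
    root mean square of these norms preserves equivariance; no bias is added.
    """
    norms = x.norm(dim=-1)                               # (nodes, channels), invariant
    rms = norms.pow(2).mean(dim=-1, keepdim=True).clamp_min(eps).sqrt()
    return x / rms.unsqueeze(-1)

def gate_activation(scalars, vectors):
    """Gate nonlinearity: scalars pass through SiLU, vectors are rescaled by
    sigmoid-transformed gate scalars (one gate per vector channel)."""
    n_vec = vectors.shape[1]
    gated_scalars = torch.nn.functional.silu(scalars[:, :-n_vec])
    gates = torch.sigmoid(scalars[:, -n_vec:])           # invariant gates
    gated_vectors = vectors * gates.unsqueeze(-1)        # rescale, never rotate
    return gated_scalars, gated_vectors

nodes, s_ch, v_ch = 4, 24, 8
scalars = torch.randn(nodes, s_ch + v_ch)                # extra scalar channels act as gates
vectors = torch.randn(nodes, v_ch, 3)
s_out, v_out = gate_activation(scalars, equivariant_layer_norm_l1(vectors))
print(s_out.shape, v_out.shape)                          # (4, 24) (4, 8, 3)
```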
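
Finally, a minimal, hypothetical sketch of the MLP attention step: attention logits are produced by a small MLP applied to the invariant (type-0) part of each edge message and normalized over the incoming edges of each destination node. The MLP architecture and activation here are illustrative; the paper's exact design may differ.

```python
import torch

def mlp_attention(scalar_msg, edge_dst, num_nodes, mlp):
    """Compute attention weights a_ij from the invariant (type-0) part of each
    edge message via an MLP, normalized over each destination node's neighbors."""
    logits = mlp(scalar_msg).squeeze(-1)                 # (num_edges,)
    logits = logits - logits.max()                       # numerical stability
    exp = logits.exp()
    denom = torch.zeros(num_nodes).index_add_(0, edge_dst, exp)
    return exp / denom[edge_dst]

num_nodes, num_edges, d_scalar = 6, 10, 16
edge_dst = torch.randint(0, num_nodes, (num_edges,))     # destination node of each edge
scalar_msg = torch.randn(num_edges, d_scalar)            # type-0 part of fused message x_ij

# Small MLP producing one attention logit per edge (replaces query-key dot products).
mlp = torch.nn.Sequential(
    torch.nn.Linear(d_scalar, d_scalar),
    torch.nn.LeakyReLU(0.1),
    torch.nn.Linear(d_scalar, 1),
)

a = mlp_attention(scalar_msg, edge_dst, num_nodes, mlp)
print(a.shape)                                           # (10,) one weight per edge
# For every destination node that receives at least one edge, its weights sum to 1:
print(torch.zeros(num_nodes).index_add_(0, edge_dst, a))
```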

In summary, the work provides a comprehensive framework to engineer Transformer-style architectures that are fully equivariant to 3D Euclidean symmetries, thereby bridging the gap between classical invariant GNNs and modern Transformer approaches in quantum chemistry and materials science. Its combination of equivariant operations, a novel graph attention mechanism, and efficient tensor-product implementations jointly yield strong empirical performance and training efficiency across diverse molecular and catalytic datasets.
