
GemNet-OC: Efficient GNN for Molecular Simulations

Updated 16 April 2026
  • GemNet-OC is a graph neural network that leverages directional edge embeddings and spherical representations to maintain translation invariance together with permutation and rotation equivariance in molecular simulations.
  • It employs a hierarchical message passing architecture with simplified Gaussian RBFs and Legendre-based angular decompositions to achieve up to 10× faster convergence while preserving accuracy.
  • The model integrates joint energy and force regression with efficient k-NN graph construction, ensuring scalable performance and robust generalization across chemically diverse datasets.

GemNet-OC is a graph neural network (GNN) model specifically designed for large-scale, chemically diverse molecular simulation tasks, with a primary focus on direct prediction of energies and atomic forces from atomic configurations. It is a direct derivative of the GemNet architecture, with variant-specific modifications to ensure computational and statistical efficiency on the Open Catalyst 2020 (OC20) dataset. GemNet-OC addresses core problems in molecular property prediction: translation invariance, permutation and SO(3) equivariance, scalability, and generalization across datasets with high diversity and nontrivial domain shift (Gasteiger et al., 2021, Gasteiger et al., 2022).

1. Theoretical Foundations and Representation Properties

GemNet-OC is grounded in the framework of GNNs with spherical representations, ensuring universal approximation of functions relevant for molecular systems that are invariant to global translation, equivariant to atom-index permutations, and equivariant to rotations. Central to this is the use of direction-aware ("directional") edge embeddings defined by interatomic vectors, enabling the network to capture physical symmetries. The model's update function is:

$$f: \mathbb{R}^{3 \times n} \times \mathbb{R}^{h \times n} \to W^n$$

where the mapping is invariant/equivariant as required. The continuous universality result is formally captured by stacking "spherical representations" on each atom:

$$h_a(r, x)(\hat{\mathbf{r}}) = \theta_a(\hat{\mathbf{r}}) + \sum_{b\in N(a)} F_{\mathrm{sphere}}(x_{ba}, \hat{\mathbf{r}})\, h_b(r, x)(\hat{\mathbf{r}})$$

with $x_{ba} = \|r_b - r_a\|$ and $\hat{\mathbf{r}} = (r_b - r_a)/x_{ba}$. $F_{\mathrm{sphere}}$ uses real spherical harmonics to encode rotational symmetries. Discretization is performed by sampling over bond directions, yielding directed edge embeddings $e_{c\to a}$, which aggregate local geometric information (Gasteiger et al., 2021).
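To make the two geometric inputs concrete, the sketch below computes the distance and unit direction for one atom pair and checks how they behave under a global translation and a rotation. The helper names `edge_geometry` and `rotate_z` are illustrative and not taken from any GemNet code base.

```python
import math

def edge_geometry(r_a, r_b):
    """Return (x_ba, r_hat) for the pair (a, b): distance and unit direction."""
    diff = [rb - ra for ra, rb in zip(r_a, r_b)]
    x_ba = math.sqrt(sum(d * d for d in diff))
    r_hat = [d / x_ba for d in diff]
    return x_ba, r_hat

def rotate_z(r, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * r[0] - s * r[1], s * r[0] + c * r[1], r[2]]

r_a, r_b = [0.0, 0.0, 0.0], [1.0, 2.0, 2.0]
x, r_hat = edge_geometry(r_a, r_b)

# Translating both atoms leaves distance and direction unchanged.
shift = [5.0, -3.0, 0.5]
x_t, r_hat_t = edge_geometry([a + s for a, s in zip(r_a, shift)],
                             [b + s for b, s in zip(r_b, shift)])

# Rotating both atoms keeps the distance and rotates the direction with them.
x_r, r_hat_r = edge_geometry(rotate_z(r_a, 0.7), rotate_z(r_b, 0.7))
```

The distance is an invariant input, while the direction rotates together with the system. This is the behaviour that lets directional embeddings support rotation-equivariant outputs.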

2. Model Architecture and Message Passing Hierarchy

GemNet-OC's architecture builds upon hierarchical message passing utilizing both atom and directed edge representations:

  • Graph Construction: Each atom connects to its $k_{\mathrm{nbr}}=30$ nearest neighbours (k-NN graph), ensuring fixed compute per atom and improved stability.
  • Embeddings: Initial atom ($h_i^{(0)}$) and directed edge ($m_{ij}^{(0)}$) embeddings have dimensionality 512 (640/768 in the large variant).
  • Basis Functions: Radial basis functions (RBF), circular basis functions (CBF), and spherical basis functions (SBF) are decoupled. Radial terms use Gaussians; angular terms are represented via Legendre polynomials. This decoupling increases throughput by ≈29% relative to previous full spherical/harmonic designs (Gasteiger et al., 2022).
  • Interaction Block: In every block, four parallel message types are computed and aggregated:
    • Atom-to-atom ($\Delta h_i^{\mathrm{AA}}$): RBF-weighted pair interactions.
    • Edge-to-atom: Edge features aggregated at atom nodes.
    • Atom-to-edge: Atom features update directed edges.
    • Edge-to-edge (quadruplet): Interaction between edges via angular and dihedral contexts, restricted to the 8 nearest neighbors for computational tractability.
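The k-NN graph construction in the list above can be sketched with a brute-force search. This is a minimal illustration that ignores periodic boundary conditions and uses a hypothetical helper `knn_edges`; a production pipeline would use a spatial index instead of the O(n²) loop.

```python
import math

def knn_edges(positions, k):
    """Return directed edges (b, a), where b is one of the k nearest neighbours of a."""
    edges = []
    for a, r_a in enumerate(positions):
        # Distance from atom a to every other atom, paired with the neighbour index.
        dists = [(math.dist(r_a, r_b), b)
                 for b, r_b in enumerate(positions) if b != a]
        for _, b in sorted(dists)[:k]:
            edges.append((b, a))
    return edges

positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0),
             (0.0, 0.0, 3.0), (5.0, 5.0, 5.0)]
edges = knn_edges(positions, k=2)
```

Every atom receives exactly `k` incoming edges, including the isolated atom at (5, 5, 5), which a fixed distance cutoff might leave disconnected.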

After the final interaction block (typically 4 blocks for base, 6 for large), an MLP readout aggregates per-atom embeddings into global energies. Forces are calculated as negative gradients of the predicted energy $E$ with respect to the atomic positions $r_a$:

$$\mathbf{F}_a = -\frac{\partial E}{\partial r_a}$$

(Gasteiger et al., 2022).
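The relation between energy and forces can be illustrated without a neural network. The sketch below uses a hypothetical harmonic pair energy as a stand-in for the model and approximates the negative gradient by central finite differences; an actual implementation would obtain the gradient by automatic differentiation.

```python
import math

def toy_energy(positions):
    """Hypothetical stand-in for the model: harmonic pair energy, rest length 1."""
    energy = 0.0
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            energy += (math.dist(positions[a], positions[b]) - 1.0) ** 2
    return energy

def forces(energy_fn, positions, eps=1e-5):
    """F_a = -dE/dr_a, approximated by central finite differences."""
    result = []
    for a in range(len(positions)):
        f_a = []
        for axis in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[a][axis] += eps
            minus[a][axis] -= eps
            f_a.append(-(energy_fn(plus) - energy_fn(minus)) / (2 * eps))
        result.append(f_a)
    return result

positions = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 0.8, 0.0]]
F = forces(toy_energy, positions)

# A translation-invariant energy implies that the forces sum to zero.
net_force = [sum(f[axis] for f in F) for axis in range(3)]
```

Forces derived this way are conservative by construction, at the price of a backward pass through the energy model.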

3. Structural and Computational Enhancements

GemNet-OC introduces several innovations to improve efficiency and scalability:

  • k-NN Graph Construction: Supersedes conventional cutoff-based graphs, avoiding fragmenting or over-densification, and stabilizes per-node memory/compute.
  • Simplified Basis Functions: Use of Gaussian RBFs and Legendre-based CBF/SBF increases throughput (≈29%) without significant loss in accuracy. Radial basis size is 6, CBF size 64, and SBF (dihedral) size 32.
  • Restricting Quadruplet Aggregation: Aggregation is limited to the 8 closest quadruplet neighbors (fewer than in the original GemNet), reducing the edge-to-edge overhead from 330% (GemNet-XL) to ≈31% with negligible accuracy loss.
  • Hierarchical Interactions: Introduces additional atom-atom, atom-edge, and edge-atom passes per block, each incurring ≈10% additional runtime but increasing model expressivity.
  • Optimized Implementation: Efficient bilinear layers, SiLU activations, and standardized parameter initialization ensure numerical and memory stability, making GemNet-OC ~10× faster to converge on OC20 than predecessor models (Gasteiger et al., 2022).
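The simplified bases in the list above can be sketched as follows. The basis size (6) and cutoff (12 Å) follow the values quoted in this article; the evenly spaced centres, the Gaussian width, and the function names are assumptions made for illustration.

```python
import math

def gaussian_rbf(distance, cutoff=12.0, num_basis=6):
    """Gaussians with evenly spaced centres on [0, cutoff] (spacing is an assumption)."""
    spacing = cutoff / (num_basis - 1)
    centres = [i * spacing for i in range(num_basis)]
    return [math.exp(-0.5 * ((distance - c) / spacing) ** 2) for c in centres]

def legendre(cos_angle, max_degree):
    """P_0 .. P_max_degree evaluated at cos_angle via Bonnet's recurrence."""
    values = [1.0, cos_angle]
    for n in range(1, max_degree):
        values.append(((2 * n + 1) * cos_angle * values[n] - n * values[n - 1]) / (n + 1))
    return values[:max_degree + 1]

radial = gaussian_rbf(2.4)    # distance in Å, sitting exactly on the second centre
angular = legendre(0.5, 3)    # cosine of a 60 degree angle
```

Both bases need only elementary arithmetic and an exponential, which is one plausible reason they are cheaper than coupled spherical Bessel and spherical harmonic bases.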

4. Training Procedure and Hyperparameters

The training objective combines energy and force regression:

$$\mathcal{L} = \lambda_E\, \mathcal{L}_E + \lambda_F\, \mathcal{L}_F$$

where $\mathcal{L}_E$ and $\mathcal{L}_F$ are the energy and force regression errors, with OC20 experiments using fixed values of the weights $\lambda_E$ and $\lambda_F$. Gradient norm is clipped at 10. Optimization uses AdamW (AMSGrad). The table below summarizes key default hyperparameters:
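A joint objective of this kind, together with gradient-norm clipping, might look like the sketch below. The specific error measures (absolute energy error, mean Euclidean force error) and the default weights are placeholders chosen for illustration, not the published settings.

```python
import math

def s2ef_loss(e_pred, e_true, f_pred, f_true, lambda_e=1.0, lambda_f=10.0):
    """Weighted sum of an energy error and a mean per-atom force error."""
    energy_term = abs(e_pred - e_true)
    force_term = sum(math.dist(fp, ft) for fp, ft in zip(f_pred, f_true)) / len(f_true)
    return lambda_e * energy_term + lambda_f * force_term

def clip_by_norm(grad, max_norm=10.0):
    """Rescale a flat gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

loss = s2ef_loss(
    e_pred=-1.0, e_true=-1.5,
    f_pred=[[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]],
    f_true=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
)
clipped = clip_by_norm([30.0, 40.0])  # norm 50 is scaled down to norm 10
```

With these placeholder weights the force term dominates the total, which reflects the general practice of weighting forces more heavily than energies.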

| Parameter | Base Setting | Large Setting |
|---|---|---|
| Blocks | 4 | 6 |
| Atom Embedding Dim | 512 | 640 |
| Edge Embedding Dim | 512 | 768 |
| RBF size | 6 | 6 |
| CBF size | 64 | 64 |
| SBF size | 32 | 32 |
| k-NN Neighbors | 30 | 30 |
| Quadruplet Neighbors | 8 | 8 |
| Neighbor Cutoff | 12 Å | 12 Å |
| Initial LR | … | … |
| Batch Size (OC20) | 256/16 GPUs | – |
| Activation | SiLU | SiLU |
| Optimizer | AdamW | AdamW |

Core choices are the result of extensive controlled ablations (≈16,000 GPU-days) across OC20 and its six subsets, optimizing for performance and stability (Gasteiger et al., 2022).
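For experimentation, the defaults listed in this section can be gathered in a small configuration object. The class and field names below are illustrative and do not correspond to the configuration keys of any released implementation; the learning rate is left out because no value is given here.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class GemNetOCConfig:
    """Defaults mirror the base setting; field names are illustrative only."""
    num_blocks: int = 4
    atom_emb_dim: int = 512
    edge_emb_dim: int = 512
    num_rbf: int = 6
    num_cbf: int = 64
    num_sbf: int = 32
    knn_neighbors: int = 30
    quad_neighbors: int = 8
    cutoff_angstrom: float = 12.0
    activation: str = "silu"
    optimizer: str = "adamw"
    grad_clip_norm: float = 10.0

base = GemNetOCConfig()
# The large setting differs only in depth and embedding widths.
large = replace(base, num_blocks=6, atom_emb_dim=640, edge_emb_dim=768)
```

Deriving the large variant from the base one keeps the shared settings in a single place and makes the differences explicit.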

5. Empirical Performance and Benchmarking

GemNet-OC establishes new state-of-the-art results on the OC20 benchmark with significant computational gains. For the S2EF (Structure to Energy and Force) task, GemNet-OC achieves:

  • Energy MAE: 24.3 meV (–10.7% relative to previous SOTA, GemNet-dT: 27.2 meV)
  • Force MAE: 0.603 meV/Å (comparable to GemNet-dT: 0.594 meV/Å)
  • Training Time: ~600 GPU-hours to convergence (vs. ~6000 GPU-hours for GemNet-XL)
  • GemNet-OC-Large further reduces Energy MAE to 20.0 meV at a slight force MAE increase (0.687 meV/Å).

These improvements are realized across the full OC20 test regime, including in-distribution (ID) and multiple out-of-distribution (OOD) splits (adsorbates, catalysts, both). Direct force prediction matches backpropagation-based approaches in accuracy while enabling up to 4× training speedup (Gasteiger et al., 2021, Gasteiger et al., 2022).
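The relative figures quoted in the list above follow directly from the reported numbers. The short check below uses a hypothetical helper `relative_change` for the arithmetic.

```python
def relative_change(new, old):
    """Signed percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0

energy_change = relative_change(24.3, 27.2)    # energy MAE vs. GemNet-dT
force_change = relative_change(0.603, 0.594)   # force MAE vs. GemNet-dT
speedup = 6000 / 600                           # GPU-hours, GemNet-XL vs. GemNet-OC
```

The energy MAE drops by roughly 10.7%, the force MAE is about 1.5% higher, and the reduction in GPU-hours corresponds to the stated 10× factor.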

6. Generalization Analysis and Dataset Design

Performance and optimal hyperparameter selection are highly sensitive to dataset characteristics: chemical diversity, system size, overall dataset size, and distribution shift between train and test. OC20 subsets, such as OC-2M (2 million samples, random and diverse), enable fast and faithful prototyping: they converge in a fraction of the full dataset's GPU time, and their model selection outcomes correlate strongly with those on full OC20 (measured by Kendall's $\tau$). In contrast, small or narrow subsets (OC-Rb, OC-Sn, OC-sub30) can lead to qualitatively different optimal model settings, often failing to transfer to large-scale benchmarks (Gasteiger et al., 2022).
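Rank agreement between a proxy dataset and the full benchmark can be quantified with Kendall's rank correlation. The sketch below implements the simple tau-a variant without tie handling; the error values are made up for illustration and are not results from the paper.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs (no tie handling)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) / 2)

# Hypothetical validation errors of five model variants on a proxy set and on the full set.
proxy_error = [0.31, 0.28, 0.35, 0.25, 0.30]
full_error = [0.22, 0.21, 0.27, 0.19, 0.24]
tau = kendall_tau(proxy_error, full_error)
```

A value near 1 means the proxy orders model variants almost exactly as the full dataset does, which is the property that makes a subset useful for model selection.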

A practical implication is the necessity of representative proxy datasets (e.g., OC-2M) that preserve chemical diversity and domain shift for effective model development.

7. Strengths, Limitations, and Practical Recommendations

Strengths

  • Outperforms previous models on all OC20 tasks (energy and force prediction) and especially excels on non-planar/dynamic adsorbate–catalyst configurations.
  • Achieves 10× increased training speed and matches force accuracy of prior SOTA with reduced resource requirements.
  • Fixed-size k-NN graph and computational simplifications make the model highly scalable.
  • Robust to variations in batch size and system size due to architectural choices.

Limitations

  • Two-hop (quadruplet) message passing has a cost that grows steeply with the number of neighbors and is only practical for smaller datasets; OC20 uses streamlined ("GemNet-T") variants.
  • Model architecture is more complex compared to most GNNs, which may increase onboarding cost for new practitioners.
  • Primary empirical validation is for molecular force/energy prediction; transferability to other domains remains untested.

Recommendations

  • Use k-NN graphs with 30 neighbors per atom (the default reported above) to ensure model stability and scalable compute across molecular sizes.
  • Prefer simple Gaussian RBF and Legendre-based angular decompositions for geometric bases.
  • Limit quadruplet aggregation to 8–16 neighbors for computational tractability.
  • Select proxy datasets (e.g., OC-2M) for rapid prototyping, especially in environments where full OC20-scale compute is unavailable.
  • Jointly weight force losses 10–50× higher than energy losses for stable training.
  • Always validate on true OOD splits; in-trajectory test sets can understate real-world error by an order of magnitude (Gasteiger et al., 2022).
