Open Catalyst Project: OC20 & OC22
- Open Catalyst Project is an international initiative that uses large-scale machine learning and deep graph neural networks to benchmark catalyst–adsorbate interactions.
- It features OC20 for metallic catalysts and OC22 for oxide electrocatalysts, offering extensive DFT trajectories and rigorous task metrics.
- The project drives advances in GNN architectures, transfer learning, and high-throughput screening, setting new standards in computational catalysis.
The Open Catalyst Project (OCP) is an international initiative designed to accelerate the computational discovery of heterogeneous electrocatalysts through large-scale ML. The project’s benchmarks—Open Catalyst 2020 (OC20) and Open Catalyst 2022 (OC22)—pioneered high-throughput datasets and tasks that enable direct benchmarking of ML potentials for catalyst–adsorbate systems across previously inaccessible chemical spaces. Together, OC20 and OC22 have established a de facto standard for the design, evaluation, and deployment of deep graph neural network (GNN) models in atomistic materials science and catalysis, stimulating advances in both methodologies and applications.
1. Dataset Construction and Scope
OC20
OC20 (Chanussot et al., 2020, Zitnick et al., 2020) consists of 1,281,040 DFT relaxation trajectories (over 260 million single-point calculations) covering 82 molecular adsorbates on 11,451 metallic slabs drawn from the Materials Project. Structures span unary, binary, and ternary alloys across 55 elements, with slabs cut along low-Miller-index facets and only the top two surface layers left free to relax. Adsorbates include H, O, C, N, hydroxyls, CO, CHₓ, and other catalytically relevant fragments. Augmented data comprise random displacements (“rattled” structures), short ab initio molecular dynamics runs, and electronic structure analyses.
OC22
OC22 (Tran et al., 2022) systematically extends coverage to oxide electrocatalysts, addressing critical gaps for the oxygen evolution reaction (OER) by including 62,331 DFT relaxations (≈9.85 million single points) on 4,728 unary/binary oxides. It encompasses all slab terminations up to Miller index 3, with random placement of nine key OER intermediates (e.g., *O, *OH, *OOH) and full atom relaxation. The computational set-up adopts PBE+U, spin polarization, and standardized convergence parameters.
Both datasets define comprehensive benchmarking standards for in-domain and out-of-domain splits based on adsorbate/catalyst identity and include granular metadata for each structure (geometry, energies, forces, electronic structure).
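The in-domain/out-of-domain partitioning described above can be sketched as follows (the `Record` type and field names are illustrative stand-ins for the datasets' real metadata schema):

```python
from dataclasses import dataclass

# Hypothetical record type; real OC20/OC22 entries carry much richer
# metadata (geometry, energies, forces, electronic structure).
@dataclass(frozen=True)
class Record:
    adsorbate: str
    catalyst: str

def split_ood(records, train_adsorbates, train_catalysts):
    """Partition records into the four OC20-style evaluation domains:
    in-domain, OOD-adsorbate, OOD-catalyst, OOD-both."""
    splits = {"id": [], "ood_ads": [], "ood_cat": [], "ood_both": []}
    for r in records:
        ads_seen = r.adsorbate in train_adsorbates
        cat_seen = r.catalyst in train_catalysts
        if ads_seen and cat_seen:
            splits["id"].append(r)
        elif cat_seen:
            splits["ood_ads"].append(r)   # unseen adsorbate, seen catalyst
        elif ads_seen:
            splits["ood_cat"].append(r)   # seen adsorbate, unseen catalyst
        else:
            splits["ood_both"].append(r)  # both unseen
    return splits
```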
2. Benchmark Tasks, Mathematical Formalism, and Metrics
OC20 Tasks
- S2EF (Structure → Energy & Forces): Predict the DFT total energy and forces for arbitrary configurations, evaluated by energy and force MAE, force cosine similarity, and the EFwT metric (fraction of samples with energy error below 0.02 eV and maximum per-atom force error below 0.03 eV/Å).
- IS2RS (Initial Structure → Relaxed Structure): Predict final atom coordinates after relaxation; core metrics are ADwT (average % of atoms within threshold distance) and AFbT (fraction within DFT-force threshold).
- IS2RE (Initial Structure → Relaxed Energy): Predict the final relaxed energy from the initial geometry; metrics include energy MAE and EwT (fraction within 0.02 eV).
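A minimal sketch of these S2EF-style metrics, assuming per-structure energies and force arrays (the function name and array layout are illustrative; thresholds follow the EwT/EFwT definitions):

```python
import numpy as np

def s2ef_metrics(e_pred, e_true, f_pred, f_true,
                 e_thresh=0.02, f_thresh=0.03):
    """OC20-style S2EF metrics. e_*: (B,) energies in eV;
    f_*: lists of (N_i, 3) per-structure force arrays in eV/Å."""
    e_err = np.abs(np.asarray(e_pred) - np.asarray(e_true))
    energy_mae = e_err.mean()
    force_mae = np.mean([np.abs(fp - ft).mean()
                         for fp, ft in zip(f_pred, f_true)])
    # Maximum per-atom force error for each structure
    f_max = np.array([np.abs(fp - ft).max()
                      for fp, ft in zip(f_pred, f_true)])
    ewt = (e_err < e_thresh).mean()                      # energy within threshold
    efwt = ((e_err < e_thresh) & (f_max < f_thresh)).mean()  # energy AND forces
    return {"energy_mae": energy_mae, "force_mae": force_mae,
            "EwT": ewt, "EFwT": efwt}
```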
OC22 Enhancements
The OC22 framework introduces total-energy regression tasks appropriate for more complex oxide surfaces:
- RS2EF (Relaxed Structure → Energy & Forces)
- RIS2RE (Relaxed Initial Structure → Relaxed Energy)
- IS2RS (Initial Structure → Relaxed Structure)
Loss functions combine weighted per-sample energy and force errors, of the generic form L = λ_E·|ΔE| + λ_F·mean|ΔF|, with the energy/force weighting varying by architecture (Tran et al., 2022). Adsorption energies are obtained by referencing total energies: E_ads = E_slab+ads − E_slab − E_gas.
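A minimal sketch of a weighted energy/force loss and adsorption-energy referencing of this kind (the weights and function names here are illustrative, not values from the cited papers):

```python
import numpy as np

def s2ef_loss(e_pred, e_true, f_pred, f_true, lambda_e=1.0, lambda_f=100.0):
    """Generic weighted energy/force training loss:
    L = lambda_e * |E - E_hat| + lambda_f * mean|F - F_hat|.
    The weights are illustrative; published models tune them per architecture."""
    e_term = abs(e_pred - e_true)
    f_term = np.abs(f_pred - f_true).mean()
    return lambda_e * e_term + lambda_f * f_term

def adsorption_energy(e_system, e_slab, e_gas):
    """E_ads = E_slab+adsorbate - E_slab - E_gas: the referencing scheme
    that converts total energies into adsorption energies."""
    return e_system - e_slab - e_gas
```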
Splits are determined by catalyst/adsorbate identity to rigorously assess extrapolative generalization.
3. Model Architectures and Methodological Advances
Representation and Message Passing
All leading benchmarks leverage GNNs built on symmetry-preserving representations:
- Node features: Atomic number, group, period, electronegativity, covalent radius, valence electron count, atomic volume (Zitnick et al., 2020).
- Edge features: Interatomic distances d_ij embedded by Gaussian or Bessel radial basis functions; bond angles θ_ijk by Legendre polynomials or spherical harmonics.
- Periodicity: Handled by cell-tiling for neighbor construction.
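The edge featurization and periodic neighbor construction above can be illustrated as follows (basis hyperparameters are illustrative, and production codes use optimized cell lists, e.g. ASE's neighbor-list utilities, rather than this brute-force tiling):

```python
import numpy as np

def gaussian_rbf(d, d_max=6.0, n_basis=32, gamma=10.0):
    """Expand an interatomic distance d (Å) in a Gaussian radial basis
    with centers evenly spaced on [0, d_max]. Hyperparameters illustrative."""
    centers = np.linspace(0.0, d_max, n_basis)
    return np.exp(-gamma * (d - centers) ** 2)

def periodic_neighbors(pos, cell, cutoff):
    """Brute-force neighbor list under periodic boundary conditions by
    tiling the cell over its 27 nearest images (cell-tiling sketch)."""
    pairs = []
    shifts = [(i, j, k) for i in (-1, 0, 1)
                        for j in (-1, 0, 1)
                        for k in (-1, 0, 1)]
    for a, pa in enumerate(pos):
        for b, pb in enumerate(pos):
            for s in shifts:
                if a == b and s == (0, 0, 0):
                    continue  # skip self-interaction in the home cell
                d = np.linalg.norm(pb + np.asarray(s) @ cell - pa)
                if d < cutoff:
                    pairs.append((a, b, s, d))
    return pairs
```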
Key models:
- SchNet: Continuous-filter convolution, analytic force head.
- DimeNet++: Directional message passing, explicit angle embeddings (Chanussot et al., 2020).
- GemNet-OC: Multi-level (pair, triplet, quadruplet) message passing, fixed-size neighbor graphs, hierarchical geometric bases (Gaussian RBF, Legendre SBF), and both direct and gradient-derived force heads (Gasteiger et al., 2022).
- EquiformerV2: Equivariant transformer, attention built on higher-degree spherical harmonics with eSCN (efficient SO(3)/SO(2) convolutions), separable S² activations, and advanced normalization; achieves new SOTA with lower force/energy MAE (Liao et al., 2023).
- SpinConv: Rotation-invariant, enriched by Voronoi-tessellation–derived edge solid angles and local volumes (Korovin et al., 2022).
Models are evaluated with and without energy-conserving (gradient-based) force prediction; direct-force variants generally outperform gradient-based ones on catalysis benchmarks (Kolluru et al., 2022).
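The distinction can be illustrated with a toy potential: energy-conserving forces are obtained as F = −∇E from a single energy function (here via finite differences standing in for autograd), whereas direct-force models would attach a separate prediction head instead:

```python
import numpy as np

def lj_energy(pos, eps=1.0, sigma=1.0):
    """Toy Lennard-Jones pair potential standing in for a learned energy model."""
    e = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(pos[i] - pos[j])
            e += 4 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)
    return e

def conservative_forces(energy_fn, pos, h=1e-5):
    """Energy-conserving forces F = -grad(E), computed here by central
    finite differences; a GNN would differentiate through the model."""
    f = np.zeros_like(pos)
    for idx in np.ndindex(pos.shape):
        dp = np.zeros_like(pos)
        dp[idx] = h
        f[idx] = -(energy_fn(pos + dp) - energy_fn(pos - dp)) / (2 * h)
    return f
```

Because the forces derive from one scalar energy, they are curl-free and sum to zero by construction; direct-force heads trade that guarantee for accuracy and speed.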
Training Protocols & Transfer
Transfer learning and joint-training across OC20/OC22 are critical. Fine-tuning GemNet-OC on OC22 yields a 36% improvement (energy MAE) for oxides, and a 19% improvement for metals when training on both (Tran et al., 2022). EquiformerV2, trained on OC22 alone, outperforms joint-trained GemNet-OC, demonstrating strong data efficiency (Liao et al., 2023).
Lightweight variants (e.g., GemNet-Mini, MPGNN-Tiny) establish that model size can be reduced by 10–30× (to a few million parameters) without catastrophic loss in accuracy, enabling democratization of OC20/OC22 modeling (Geitner, 5 Apr 2024).
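The transfer-learning effect can be illustrated in miniature with ridge regression that regularizes toward pretrained weights rather than zero (a deliberately simplified stand-in for fine-tuning a GNN checkpoint; all names and data are synthetic):

```python
import numpy as np

def fit_ridge(X, y, w0=None, lam=1e-2):
    """Ridge regression solving (X^T X + lam*I) w = X^T y + lam*w0.
    When w0 is given, the solution is pulled toward the pretrained
    weights instead of zero -- a toy analogue of fine-tuning a
    pretrained potential on a small new-domain dataset."""
    if w0 is None:
        w0 = np.zeros(X.shape[1])
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y + lam * w0)
```

With few target-domain samples, the fine-tuned solution retains the pretrained weights in directions the new data does not constrain, which is where the generalization gain comes from.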
4. Applications, Performance, and Large-Scale Screening
OC20/OC22-trained models support direct regression of adsorption energies, structure relaxation, and kinetic descriptor evaluation with orders-of-magnitude acceleration over DFT:
- Direct ML screening: OC22-trained models such as GemNet-OC enable prediction of relaxed adsorption energies for 6 million oxide slab+intermediate pairs at an orders-of-magnitude speedup compared to brute-force DFT (Tran et al., 2023).
- OER Candidate Discovery: Bulk and nanoscale Pourbaix stability, Wulff-shape surface selection, and OER overpotentials (computed from ML-predicted adsorption energies) facilitate down-selection to tens of experimentally relevant oxide compositions, a process infeasible with conventional computation (Tran et al., 2023, Chatterjee et al., 5 Dec 2025).
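The overpotential evaluation referenced above follows the standard four-step thermodynamic analysis (η = max ΔG_i / e − 1.23 V); a sketch, taking adsorption free energies in eV as inputs, e.g. from an ML model:

```python
def oer_overpotential(dg_oh, dg_o, dg_ooh, e_o2=4.92):
    """OER overpotential from adsorption free energies (eV) of the
    *OH, *O, *OOH intermediates. e_o2 = 4.92 eV is the total free-energy
    change of water oxidation (4 x 1.23 eV)."""
    steps = [
        dg_oh,            # H2O  -> *OH  + H+ + e-
        dg_o - dg_oh,     # *OH  -> *O   + H+ + e-
        dg_ooh - dg_o,    # *O   -> *OOH + H+ + e-
        e_o2 - dg_ooh,    # *OOH -> O2   + H+ + e-
    ]
    # Potential-determining step minus the equilibrium potential (1.23 V)
    return max(steps) - 1.23
```

An ideal catalyst with all four steps at 1.23 eV has zero overpotential; real oxides deviate because of the scaling relations between the intermediates.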
Quantitative Benchmarks
Representative results on the OC22 validation set:

| Model | Training data | Force MAE (meV/Å) | Energy MAE (meV) | Reference |
|---|---|---|---|---|
| GemNet-OC | OC22 | 39.6 | 29.4 | (Tran et al., 2022) |
| GemNet-OC | OC20+OC22 | 34.2 | 26.9 | (Tran et al., 2022) |
| EquiformerV2 200M | OC22 | 30.7 | 22.9 | (Liao et al., 2023) |
- On OC20 S2EF, EquiformerV2 (153M) attains 14.2 meV/Å (force) and 15.0 meV (energy) MAE, improving force MAE by 9% and energy MAE by 4% over the previous SOTA (Liao et al., 2023).
- On high-entropy alloys, fine-tuning OC20 models with a few hundred new DFT slabs delivers sub-0.04 eV/Å force MAE and >10× acceleration relative to DFT (Clausen et al., 14 Mar 2024).
5. Physical Interpretation, Uncertainty, and Explainability
- Adsorption energy prediction: ML models can recover qualitative trends in catalytic activity, but prediction uncertainty (0.3–0.5 eV for single-site adsorption energies) rivals the energy range separating good from poor catalysts, limiting overpotential’s utility as a primary screening metric (Chatterjee et al., 5 Dec 2025).
- Explainability: Post-hoc XAI and symbolic regression on OC20 reveal that adsorbate electronegativity, atom count, and local coordination dominate adsorption energy predictions. Symbolic regression recovers scaling laws akin to classical physics equations, confirming that ML models can rediscover interpretable, physically relevant descriptors (Vinchurkar et al., 30 May 2024).
- Disconnected architectures: Models omitting explicit adsorbate–catalyst geometry can still achieve reasonable accuracy (MAE only up to ~30% higher than geometry-aware baselines), suggesting that chemical identity and global structural features encode significant energetic contributions (Carbonero et al., 2023).
OC20/OC22 facilitate the development of principled uncertainty quantification strategies, aiding prioritization in experimental campaigns (Chatterjee et al., 5 Dec 2025).
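A minimal sketch of an ensemble-based uncertainty-quantification strategy of this kind (thresholds and names are illustrative):

```python
import numpy as np

def ensemble_uncertainty(predictions):
    """Mean prediction and per-sample spread (std. dev. across ensemble
    members) as an uncertainty proxy. predictions: (n_models, n_samples)."""
    p = np.asarray(predictions)
    return p.mean(axis=0), p.std(axis=0)

def screen(candidates, energies, sigmas, target=-0.5, tol=0.3, max_sigma=0.2):
    """Keep candidates whose predicted energy lies near a target value
    AND whose ensemble spread is small; all thresholds are illustrative."""
    keep = []
    for c, e, s in zip(candidates, energies, sigmas):
        if abs(e - target) < tol and s < max_sigma:
            keep.append(c)
    return keep
```

Coupling the activity criterion with an uncertainty gate is what lets a campaign deprioritize candidates whose predictions the ensemble disagrees on.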
6. Limitations, Challenges, and Evolving Directions
Key current and ongoing challenges include:
- Underperformance on domain-shifted chemistries: Nonmetal/halide-rich slabs and N/O-containing adsorbates exhibit roughly 2× higher MAE than the majority cases (metallic slabs with C/H-containing adsorbates) (Kolluru et al., 2022).
- Insufficient selectivity from overpotential alone: Owing to the large uncertainty and physical variance in adsorption energies, overpotential screening yields high false-positive rates; Pourbaix stability, synthesis likelihood, and cost must be integrated into workflows (Chatterjee et al., 5 Dec 2025, Tran et al., 2023).
- Handling long-range interactions: OC22 highlights the limited ability of local-message-passing models to capture extended electrostatics and magnetism in oxide surfaces (Tran et al., 2022).
- Scaling with chemical and system diversity: Architectural advances in GemNet-OC and EquiformerV2 (e.g., multi-level message-passing, higher L, normalization) are crucial for maintaining efficiency and generalization as datasets grow (Gasteiger et al., 2022, Liao et al., 2023).
Future directions include explicit incorporation of charge-equilibration, Ewald-sum or FMM-based long-range interactions, explicit magnetic moment features, and integration of multi-fidelity data (tight-binding, high-accuracy DFT).
7. Ecosystem and Toolkits
The Open Catalyst codebase (https://github.com/Open-Catalyst-Project/ocp) and the Open MatSci ML Toolkit (Miret et al., 2022) provide ready pipelines for OC20/OC22 data loading, management, and training of advanced equivariant GNNs (e.g., EGNN, MegNet, Gala). These frameworks support scalable and reproducible experimentation across diverse hardware (CPUs/GPUs), rapid prototyping (with devsets), and deployment in high-throughput automated catalyst discovery workflows.
For foundational datasets, methodologies, challenges, and limitations, see (Chanussot et al., 2020, Zitnick et al., 2020, Tran et al., 2022, Gasteiger et al., 2022, Liao et al., 2023, Clausen et al., 14 Mar 2024, Tran et al., 2023, Vinchurkar et al., 30 May 2024, Chatterjee et al., 5 Dec 2025).