Open Catalyst 2022: Electrocatalyst Benchmark
- OC22 is a comprehensive dataset and benchmark challenge that expands the OC20 initiative to include oxide surfaces and electrochemical reactions for electrocatalyst discovery.
- It incorporates nearly 500k DFT relaxation events with free-energy corrections and rigorous uncertainty quantification for accurate screening of reaction intermediates.
- OC22 sets detailed evaluation tasks (IS2RE, IS2RS, S2EF) and benchmarks advanced ML models, such as graph neural networks, to enhance predictive fidelity under realistic conditions.
Open Catalyst 2022 (OC22) refers to the second major dataset and benchmark challenge in the Open Catalyst Project, extending the OC20 initiative to better support machine learning for catalyst discovery in the context of complex oxide surfaces and electrochemical reactions relevant to energy conversion and storage. While OC20 focused predominantly on metal and alloy surfaces, OC22 incorporates oxide perovskites and transition metal oxides, providing new data modalities and enhanced benchmarking capabilities for electrocatalysts in aqueous and acidic environments. OC22 builds upon the architecture, data curation, and evaluation strategies established by OC20, and is designed to challenge the next generation of graph neural networks (GNNs) and machine learning potentials (MLPs) for generalizability, chemical diversity, and predictive fidelity. Its data, metrics, and computational guidelines enable the community to robustly assess and compare ML approaches for screening catalyst surfaces under experimentally relevant conditions.
1. Motivation and Dataset Scope
OC22 addresses the limitations of previous datasets by expanding coverage from metallic substrates to oxide systems and by integrating relevant free-energy corrections for adsorption reactions. The primary scientific motivation is to accelerate discovery of efficient and robust electrocatalysts for reactions such as the oxygen evolution reaction (OER) and hydrogen evolution reaction (HER), where oxides (including perovskites such as ABO₃ and rutile-type MO₂, with M = Ir, Ru, etc.) play a central role. OC22 comprises approximately half a million DFT relaxation events for key intermediates—*OH, *O, and *OOH—on a chemically diverse set of oxides under acidic conditions. Each data entry includes:
- slab and adsorbate structure (atomic coordinates and cell)
- DFT adsorption energies (ΔE_ads) at 0 K
- post-DFT static free-energy corrections (ΔG_corr) per adsorbate
This dataset is tailored to support the evaluation of ML models in scenarios relevant to modern electrocatalysis, particularly where precise knowledge of surface–intermediate energetics is critical for accurate reaction mechanism predictions (Chatterjee et al., 5 Dec 2025).
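As a minimal sketch of how the per-entry energetics listed above might be combined, the snippet below adds the static correction to the 0 K adsorption energy to obtain an adsorption free energy. The class name `AdsorptionEntry`, its field names, and the numerical values are hypothetical illustrations, not the dataset's actual schema, and the simple additive form ΔG_ads = ΔE_ads + ΔG_corr is an assumption based on the usual role of such corrections.

```python
from dataclasses import dataclass


@dataclass
class AdsorptionEntry:
    """Hypothetical container for one adsorbate record (not the real OC22 schema)."""
    adsorbate: str        # e.g. "*OH", "*O", "*OOH"
    delta_e_ads: float    # DFT adsorption energy at 0 K, in eV
    delta_g_corr: float   # static free-energy correction, in eV

    @property
    def delta_g_ads(self) -> float:
        # Assumed additive combination: dG_ads = dE_ads + dG_corr
        return self.delta_e_ads + self.delta_g_corr


# Illustrative numbers only; they are not taken from the dataset.
entry = AdsorptionEntry(adsorbate="*OH", delta_e_ads=0.85, delta_g_corr=0.30)
print(f"{entry.adsorbate}: dG_ads = {entry.delta_g_ads:.2f} eV")
```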
2. Benchmark Tasks and Evaluation Metrics
OC22 tasks mirror those of OC20, designed to replicate everyday workflows in computational catalysis:
- Initial Structure to Relaxed Energy (IS2RE): Predicts the final relaxed adsorption energy from an initial (unrelaxed) structure.
- Initial Structure to Relaxed Structure (IS2RS): Predicts equilibrium atomic coordinates after relaxation.
- Structure to Energy and Forces (S2EF): Predicts total adsorption energy and per-atom force vectors given any structure.
For performance comparison and leaderboard submissions, OC22 adopts evaluation metrics consistent with OC20:
- Mean Absolute Error (MAE): For adsorption energies and force components.
- Energies Within Threshold (EwT): Fraction of samples with prediction error within a prescribed tolerance (e.g., |E_pred − E_DFT| < 0.02 eV).
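- ADwT / AFbT: Average Distance within Threshold and Average Forces below Threshold, which measure how close predicted relaxed geometries and their residual forces are to DFT-relaxed structures.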
In addition to these, OC22 introduces and emphasizes thermodynamic overpotential (η) as a system-level metric, due to its centrality in electrocatalyst screening, and systematically catalogs the uncertainty in both the DFT adsorption energies and free energy corrections (Chatterjee et al., 5 Dec 2025).
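The two energy metrics above are simple to state in code. The following is an illustrative sketch, not the official evaluation script: the function names are hypothetical, the 0.02 eV default follows the example tolerance quoted above, and the sample predictions are made up.

```python
from typing import Sequence


def mean_absolute_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean of |pred - ref| over all samples (same units as the inputs)."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("pred and ref must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def energies_within_threshold(
    pred: Sequence[float], ref: Sequence[float], tol: float = 0.02
) -> float:
    """Fraction of samples whose absolute energy error is below tol (eV)."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("pred and ref must be non-empty and of equal length")
    hits = sum(1 for p, r in zip(pred, ref) if abs(p - r) < tol)
    return hits / len(pred)


# Made-up energies in eV, for illustration only.
e_pred = [0.50, 1.20, -0.30, 2.00]
e_dft = [0.51, 1.00, -0.30, 2.10]

mae = mean_absolute_error(e_pred, e_dft)
ewt = energies_within_threshold(e_pred, e_dft)
print(f"MAE = {mae:.4f} eV, EwT = {ewt:.2f}")
```

A model can have a low MAE while still scoring poorly on EwT, because EwT only rewards predictions that fall inside the tight tolerance.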
3. Quantification of Uncertainty in DFT Adsorption Energies
A significant finding in the OC22 effort is a rigorous quantification of uncertainty in DFT reference data. Both parametric DFT variations (functional choice, slab thickness, k-point mesh, coverage, site) and statistical repeats within the dataset lead to the following conservative uncertainty estimates:
- Adsorption energies (u_ads): ~0.3 eV for HER (*H), ~0.4–0.7 eV for OER (*OH, *O, *OOH) intermediates.
- Free energy corrections (u_corr): ~0.1 eV (*H), ~0.2–0.3 eV (*OH, *O, *OOH).
These uncertainties propagate to overall screening metrics, where the combined uncertainty in overpotential (η) can approach 0.5 eV per intermediate and 1 eV in aggregate (Chatterjee et al., 5 Dec 2025). Practically, this means that energy differences of less than 0.3–0.5 eV are not statistically significant within OC22.
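To make the practical consequence concrete, the sketch below combines the two uncertainty sources and tests whether two candidates can be told apart. Combining u_ads and u_corr in quadrature assumes the two error sources are independent; that is an assumption of this example, and the source does not state which propagation rule it uses. The function names are hypothetical.

```python
import math


def combined_uncertainty(*components: float) -> float:
    """Root-sum-of-squares of independent uncertainty components (eV).

    Assumption of this sketch: the components are uncorrelated.
    """
    return math.sqrt(sum(u * u for u in components))


def is_distinguishable(e_a: float, e_b: float, u: float) -> bool:
    """True if two energies differ by more than the uncertainty u (eV)."""
    return abs(e_a - e_b) > u


# Lower-bound estimates quoted above for OER intermediates.
u_oer = combined_uncertainty(0.4, 0.2)
print(f"combined uncertainty per OER intermediate: {u_oer:.2f} eV")

# Two hypothetical candidates whose free energies differ by 0.25 eV.
print(is_distinguishable(1.10, 1.35, u_oer))
```

Under these assumptions, a 0.25 eV difference between two candidates falls inside the combined uncertainty, so ranking them against each other is not meaningful.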
4. Thermodynamic Overpotential and Its Screening Implications
OC22 formalizes calculation of thermodynamic overpotentials for both HER and OER:
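- HER: $\eta_{\mathrm{HER}} = |\Delta G_{*\mathrm{H}}|$
- OER: $\eta_{\mathrm{OER}} = \max(\Delta G_1, \Delta G_2, \Delta G_3, \Delta G_4) - 1.23\ \mathrm{eV}$
where $\Delta G_{*\mathrm{H}}$ is the hydrogen adsorption free energy, $\Delta G_1, \dots, \Delta G_4$ are the free-energy changes between sequential OER intermediates (in eV per transferred electron), and 1.23 eV represents the thermodynamic minimum at standard conditions. Because of large data-dependent uncertainties, a notable fraction of candidates (up to 47% in OER and 37% in HER) fall within the theoretical “best-in-class” regime simply due to DFT or model error, making overpotential alone insufficient for high-throughput catalyst screening (Chatterjee et al., 5 Dec 2025).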
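The overpotential calculation can be sketched numerically as follows. This example assumes the widely used four-step, proton-coupled OER scheme in which adsorption free energies of *OH, *O, and *OOH are referenced to water and hydrogen, so that the four steps sum to 4 × 1.23 eV. The function names and input values are hypothetical, and the source may define the steps differently.

```python
E0_OER = 1.23      # eV per electron: thermodynamic minimum at standard conditions
N_ELECTRONS = 4    # electrons transferred in the assumed four-step OER scheme


def her_overpotential(dg_h: float) -> float:
    """HER thermodynamic overpotential from the *H adsorption free energy (eV)."""
    return abs(dg_h)


def oer_step_energies(g_oh: float, g_o: float, g_ooh: float) -> list[float]:
    """Free-energy change of each OER step (eV), assuming the four-step scheme."""
    total = N_ELECTRONS * E0_OER
    return [
        g_oh,            # * -> *OH
        g_o - g_oh,      # *OH -> *O
        g_ooh - g_o,     # *O -> *OOH
        total - g_ooh,   # *OOH -> O2
    ]


def oer_overpotential(g_oh: float, g_o: float, g_ooh: float) -> float:
    """Largest step energy minus the 1.23 eV thermodynamic minimum."""
    return max(oer_step_energies(g_oh, g_o, g_ooh)) - E0_OER


# Hypothetical adsorption free energies in eV, for illustration only.
eta_oer = oer_overpotential(g_oh=1.0, g_o=2.8, g_ooh=4.1)
eta_her = her_overpotential(-0.10)
print(f"eta_OER = {eta_oer:.2f} eV, eta_HER = {eta_her:.2f} eV")
```

In this picture an ideal OER catalyst has four equal steps of 1.23 eV and zero overpotential, which is why uncertainties of a few tenths of an eV per intermediate are enough to move many candidates into the apparent best-in-class regime.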
5. Modeling Approaches, Baselines, and State-of-the-Art Performance
OC22 leverages advances in GNNs and equivariant neural architectures that were initially benchmarked on OC20, such as EquiformerV2, GemNet (OC/XL), and direct-force models. These architectures are evaluated empirically for both predictive accuracy and resource scaling. On metals and alloys, state-of-the-art baseline models achieve:
- Energy MAE: <0.25 eV (full OC20/OC22)
- Force MAE: <0.025 eV/Å
The introduction of oxide systems and new adsorbates increases the chemical complexity, and the best current GNN-based machine-learning interatomic potentials (MLIPs) remain challenged by out-of-domain generalization, particularly for OER intermediates and multicomponent oxide surfaces (Gasteiger et al., 2022, Clausen et al., 14 Mar 2024). OC22 highlights the need for models that remain robust to changes in system size, composition, and environment.
6. Beyond Thermodynamics: Toward Multi-objective Catalyst Screening
Recognizing that overpotential-based screening is highly susceptible to systematic and statistical errors, OC22 advocates for integration of complementary attributes into future ML-driven workflows:
- Stability: Pourbaix decomposition energies at electrochemical operating conditions.
- Synthesizability: Structural validity and ease of synthesis.
- Lifetime/Degradation: Predictions of dissolution and dynamic/chemical stability.
- Cost/Abundance: Elemental occurrence and price metrics.
- Kinetics: Transition-state barriers, e.g., via CatTSunami on the same or related datasets (Wander et al., 3 May 2024).
This paradigm mandates multi-objective optimization frameworks and trustworthy uncertainty quantification, with integrated metrics guiding experimental prioritization (Chatterjee et al., 5 Dec 2025, Gruich et al., 2023).
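One simple form of multi-objective screening is a Pareto filter, which keeps every candidate that is not outperformed on all objectives by another candidate. The sketch below is illustrative only: the source does not prescribe a specific optimization method, and the candidate names, the choice of three objectives, and all numerical values are hypothetical.

```python
from typing import Dict, List, Tuple

Objectives = Tuple[float, ...]  # every objective is to be minimized


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Names of candidates that no other candidate dominates."""
    return [
        name
        for name, obj in candidates.items()
        if not any(
            dominates(other, obj)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical candidates: (overpotential in eV,
#                           Pourbaix decomposition energy in eV/atom,
#                           relative cost), all to be minimized.
candidates = {
    "cand_A": (0.30, 0.50, 10.0),
    "cand_B": (0.40, 0.10, 5.0),
    "cand_C": (0.50, 0.60, 12.0),
}
print(pareto_front(candidates))
```

Here `cand_C` is worse than `cand_A` on every objective and is dropped, while `cand_A` and `cand_B` represent different trade-offs and both survive. An uncertainty-aware variant would compare objectives only up to their error bars.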
7. Impact, Limitations, and Future Directions
OC22 substantially broadens the chemical and methodological scope of large-scale catalyst data resources, setting a new bar for diversity and complexity in surface–adsorbate modeling. However, it foregrounds critical limitations:
- The uncertainty in DFT-level reference energies currently limits the discriminative power of ML-screening pipelines based on overpotential alone.
- Generalization to out-of-domain surfaces and compositions remains a significant open challenge for ML architectures.
- There is a need for improved protocols that balance prediction accuracy, robustness, and physical fidelity, including uncertainty-aware active learning, efficient task-specific architectures, and systematic evaluation across all relevant material properties.
Future OC initiatives are expected to further integrate transition-state and microkinetic benchmarks, exploit uncertainty-calibrated predictions, and encourage data/model fusion across computation, experiment, and theory to advance the practical deployment of accelerated catalyst discovery (Chatterjee et al., 5 Dec 2025, Wander et al., 3 May 2024, Gruich et al., 2023).
References
- "Adsorption energies are necessary but not sufficient to identify good catalysts" (Chatterjee et al., 5 Dec 2025)
- "GemNet-OC: Developing Graph Neural Networks for Large and Diverse Molecular Simulation Datasets" (Gasteiger et al., 2022)
- "Adapting OC20-trained EquiformerV2 Models for High-Entropy Materials" (Clausen et al., 14 Mar 2024)
- "CatTSunami: Accelerating Transition State Energy Calculations with Pre-trained Graph Neural Networks" (Wander et al., 3 May 2024)
- "Clarifying Trust of Materials Property Predictions using Neural Networks with Distribution-Specific Uncertainty Quantification" (Gruich et al., 2023)