Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations (2210.07237v2)

Published 13 Oct 2022 in physics.comp-ph, cs.LG, and physics.chem-ph

Abstract: Molecular dynamics (MD) simulation techniques are widely used for various natural science applications. Increasingly, ML force field (FF) models begin to replace ab-initio simulations by predicting forces directly from atomic structures. Despite significant progress in this area, such techniques are primarily benchmarked by their force/energy prediction errors, even though the practical use case would be to produce realistic MD trajectories. We aim to fill this gap by introducing a novel benchmark suite for learned MD simulation. We curate representative MD systems, including water, organic molecules, a peptide, and materials, and design evaluation metrics corresponding to the scientific objectives of respective systems. We benchmark a collection of state-of-the-art (SOTA) ML FF models and illustrate, in particular, how the commonly benchmarked force accuracy is not well aligned with relevant simulation metrics. We demonstrate when and how selected SOTA methods fail, along with offering directions for further improvement. Specifically, we identify stability as a key metric for ML models to improve. Our benchmark suite comes with a comprehensive open-source codebase for training and simulation with ML FFs to facilitate future work.

Overview of Machine Learning Force Fields in Molecular Simulations

The paper "Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations" evaluates ML force field models by running molecular dynamics (MD) simulations with them rather than only scoring their force predictions. It examines how well force prediction accuracy tracks simulation performance and argues that simulation-based metrics are needed to judge whether an ML force field is useful in practice.

Key Insights and Contributions

The authors introduce a benchmark suite designed specifically for learned MD simulations. It curates a diverse set of representative systems: water, organic molecules, a peptide, and materials. The authors emphasize simulation stability and the recovery of equilibrium statistics and dynamical properties, which say more about a model's capability in real-world MD applications than force prediction accuracy alone.

Some standout points from their findings include:

Force Prediction vs. Simulation Metrics: The paper finds that force prediction accuracy does not necessarily correlate with the quality of the MD trajectories produced. For instance, models like GemNet-dT and DimeNet achieve low force errors but often produce unstable simulations.
Implications for Model Design: The paper suggests that more expressive atomistic representations, such as those that respect E(3) and SE(3) symmetries, can capture interatomic interactions more accurately. NequIP, despite its computational cost, demonstrates superior performance in both accuracy and stability across a variety of systems.
Stability and Training Dataset Size: The research indicates that adding training data improves force prediction but does not guarantee better simulation stability, which is why stability needs to be measured directly during model evaluation.
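To make the idea of a stability metric concrete, here is a minimal sketch of one plausible criterion for a molecular trajectory: count how many leading frames keep every bond within a tolerance of its reference length. The function name `stable_steps`, the array layout, and the default tolerance `max_dev=0.5` are assumptions made for illustration and are not taken from the benchmark's codebase.

```python
import numpy as np

def stable_steps(positions, bonds, ref_lengths, max_dev=0.5):
    """Number of leading frames in which every bond stays near its reference length.

    positions:   (n_frames, n_atoms, 3) trajectory coordinates
    bonds:       (n_bonds, 2) integer atom-index pairs
    ref_lengths: (n_bonds,) reference bond lengths, same units as positions
    max_dev:     allowed absolute deviation (hypothetical default)
    """
    positions = np.asarray(positions, dtype=float)
    bonds = np.asarray(bonds, dtype=int)
    ref_lengths = np.asarray(ref_lengths, dtype=float)

    # Bond vectors for every frame: shape (n_frames, n_bonds, 3)
    vec = positions[:, bonds[:, 0]] - positions[:, bonds[:, 1]]
    lengths = np.linalg.norm(vec, axis=-1)

    # A frame is "broken" if any bond deviates by more than max_dev
    broken = (np.abs(lengths - ref_lengths) > max_dev).any(axis=1)

    # Index of the first broken frame, or the full length if none breaks
    return int(np.argmax(broken)) if broken.any() else len(positions)
```

A metric of this kind is computed from the simulated trajectory itself, so two models with similar force errors can receive very different scores.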

Methodology and Metrics

The paper outlines several physical observables used to evaluate the MD simulations. These include radial distribution functions (RDFs), the distribution of interatomic distances, and diffusivity coefficients. The authors argue that these metrics, which describe a system's structure and dynamics, are better evaluators of force field quality than force mean absolute error (MAE).
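As an illustration of how two of these observables can be computed from a trajectory, the sketch below estimates an RDF for a single-species system in a cubic periodic box and a diffusivity coefficient from the mean squared displacement via the Einstein relation. The function names, the cubic-box assumption, and the choice to measure displacement from the first frame only are simplifications made here, not the paper's implementation.

```python
import numpy as np

def radial_distribution(positions, box_length, r_max, n_bins=100):
    """g(r) averaged over frames, one species, cubic periodic box.

    positions: (n_frames, n_atoms, 3); r_max should not exceed box_length / 2.
    Returns (bin_centers, g).
    """
    positions = np.asarray(positions, dtype=float)
    n_frames, n_atoms, _ = positions.shape
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts = np.zeros(n_bins)
    upper = np.triu_indices(n_atoms, k=1)  # each unordered pair once

    for frame in positions:
        diff = frame[:, None, :] - frame[None, :, :]
        diff -= box_length * np.round(diff / box_length)  # minimum-image convention
        dist = np.linalg.norm(diff, axis=-1)[upper]
        counts += np.histogram(dist, bins=edges)[0]

    # Expected pair count per shell for uncorrelated (ideal-gas) particles
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    n_pairs = n_atoms * (n_atoms - 1) / 2
    ideal = n_frames * n_pairs * shell_vol / box_length ** 3

    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, counts / ideal

def diffusion_coefficient(unwrapped_positions, dt):
    """Einstein estimate in 3D: slope of MSD(t) divided by 6.

    unwrapped_positions: (n_frames, n_atoms, 3), not wrapped into the box.
    dt: time between saved frames.
    """
    pos = np.asarray(unwrapped_positions, dtype=float)
    disp = pos - pos[0]                             # displacement from first frame
    msd = (disp ** 2).sum(axis=-1).mean(axis=-1)    # average over atoms
    t = np.arange(len(msd)) * dt
    slope = np.polyfit(t, msd, 1)[0]                # linear fit MSD = slope * t + c
    return slope / 6.0
```

A model's RDF can then be compared with a reference RDF, for example through the mean absolute difference between the two curves. In practice the linear fit for the diffusivity is usually restricted to the long-time regime where the MSD grows linearly.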

The paper also examines the failure modes of current models. Instability typically arises when a simulation drifts into high-energy configurations that the training data did not cover, where the model's predictions are unreliable. The authors suggest possible remedies such as adding off-equilibrium geometries to the training data and using active learning to improve robustness.
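One common way to drive active learning, shown here only as a hedged sketch and not as the procedure used in the paper, is to train several models and flag frames where their force predictions disagree. The function name `select_uncertain_frames`, the array layout, and the threshold are hypothetical.

```python
import numpy as np

def select_uncertain_frames(ensemble_forces, threshold):
    """Flag frames where an ensemble of force fields disagrees.

    ensemble_forces: (n_models, n_frames, n_atoms, 3) predicted forces
    threshold:       disagreement above which a frame is selected (hypothetical)
    Returns (selected_frame_indices, per_frame_score).
    """
    forces = np.asarray(ensemble_forces, dtype=float)

    # Standard deviation across models for each force component,
    # reduced to one number per atom by taking the norm over x, y, z
    per_atom = np.linalg.norm(forces.std(axis=0), axis=-1)  # (n_frames, n_atoms)

    # Score a frame by its most uncertain atom
    score = per_atom.max(axis=1)
    return np.flatnonzero(score > threshold), score
```

Frames selected this way would then be labeled with the reference method and added to the training set, so the model sees more of the configurations it is least sure about.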

Practical and Theoretical Implications

By highlighting the discrepancies between force accuracy and simulation quality, the research calls for a shift towards simulation-based evaluation to guide the development of ML force fields. The benchmark suite introduced in this paper could make model comparisons more meaningful, drive improvements in MD simulations, and support advances in automated materials discovery.

Future Directions

The paper lays the groundwork for future research in several directions:

Enhanced Sampling and Differentiable Simulations: Techniques like differentiable simulations could be harnessed to train models directly to reproduce key observables. This approach could yield models that are robust against errors arising from limited training data.
Coarse-Graining and Larger Systems: Investigating ML models for coarse-grained MD simulations could unlock more efficient pathways to model large systems with complex interactions at reduced computational costs.
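For readers unfamiliar with coarse-graining, the sketch below shows the basic mapping step: groups of atoms are replaced by beads placed at each group's center of mass. The function name `coarse_grain` and the index-array representation of the mapping are assumptions for illustration; the code also assumes every bead index is used at least once and that molecules are not split across periodic boundaries.

```python
import numpy as np

def coarse_grain(positions, masses, bead_index):
    """Map atom coordinates to bead coordinates by center of mass.

    positions:  (n_frames, n_atoms, 3) atom coordinates
    masses:     (n_atoms,) atomic masses
    bead_index: (n_atoms,) integer bead id in 0..n_beads-1 for each atom
    Returns an array of shape (n_frames, n_beads, 3).
    """
    positions = np.asarray(positions, dtype=float)
    masses = np.asarray(masses, dtype=float)
    bead_index = np.asarray(bead_index, dtype=int)
    n_atoms = len(masses)
    n_beads = int(bead_index.max()) + 1

    # Mass-weighted assignment matrix, one row per bead
    weights = np.zeros((n_beads, n_atoms))
    weights[bead_index, np.arange(n_atoms)] = masses
    weights /= weights.sum(axis=1, keepdims=True)  # rows sum to 1

    # Weighted average of atom positions for each bead and frame
    return np.einsum("ba,fad->fbd", weights, positions)
```

A coarse-grained force field would then be trained on bead coordinates, which reduces the number of particles and typically allows longer time steps.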

In summary, this paper exposes the limitations of current ML force fields and proposes benchmark protocols and metrics that could accelerate the development of more accurate and reliable MD simulation models. Its central message is that force fields should be evaluated by simulation performance and stability, not by force prediction error alone.

Authors (7)

Xiang Fu
Zhenghao Wu
Wujie Wang
Tian Xie
Sinan Keten
Rafael Gomez-Bombarelli
Tommi Jaakkola
