Machine Learning of coarse-grained Molecular Dynamics Force Fields (1812.01736v3)

Published 4 Dec 2018 in physics.comp-ph, cs.LG, and stat.ML

Abstract: Atomistic or ab-initio molecular dynamics simulations are widely used to predict thermodynamics and kinetics and relate them to molecular structure. A common approach to go beyond the time- and length-scales accessible with such computationally expensive simulations is the definition of coarse-grained molecular models. Existing coarse-graining approaches define an effective interaction potential to match defined properties of high-resolution models or experimental data. In this paper, we reformulate coarse-graining as a supervised machine learning problem. We use statistical learning theory to decompose the coarse-graining error and cross-validation to select and compare the performance of different models. We introduce CGnets, a deep learning approach, that learns coarse-grained free energy functions and can be trained by a force matching scheme. CGnets maintain all physically relevant invariances and allow one to incorporate prior physics knowledge to avoid sampling of unphysical structures. We show that CGnets can capture all-atom explicit-solvent free energy surfaces with models using only a few coarse-grained beads and no solvent, while classical coarse-graining methods fail to capture crucial features of the free energy surface. Thus, CGnets are able to capture multi-body terms that emerge from the dimensionality reduction.

Authors (8)
  1. Jiang Wang (50 papers)
  2. Simon Olsson (15 papers)
  3. Christoph Wehmeyer (4 papers)
  4. Nicholas E. Charron (7 papers)
  5. Cecilia Clementi (30 papers)
  6. Gianni De Fabritiis (39 papers)
  7. Adria Perez (1 paper)
  8. Frank Noe (11 papers)
Citations (374)

Summary

  • The paper introduces a supervised learning framework that recasts coarse-graining into a force-matching task using CGnets.
  • CGnets employ deep neural networks with built-in physical invariances and regularization via prior energy to accurately capture free energy surfaces.
  • Empirical validations on systems such as alanine dipeptide and Chignolin demonstrate that CGnets recover free energy features missed by classical coarse-graining methods, at much lower simulation cost than all-atom models.

Machine Learning of Coarse-Grained Molecular Dynamics Force Fields

The paper reformulates molecular dynamics coarse-graining as a supervised machine learning problem. The authors introduce CGnets, deep learning architectures that learn coarse-grained (CG) free energy functions. This work helps extend the time- and length-scales accessible to computationally expensive molecular dynamics by leveraging machine learning to build more accurate CG models.

Main Contributions

  1. Supervised Learning Framework: Coarse-graining has traditionally relied on fitting effective potentials to reproduce properties of higher-resolution models. The authors recast this problem as supervised learning: CG models are trained with a force-matching scheme, statistical learning theory is used to decompose the coarse-graining error, and cross-validation is used to select and compare models.
  2. Introduction of CGnets: The authors propose CGnets, neural networks that preserve essential physical invariances such as rotational and translational invariance and can incorporate prior empirical or theoretical physics knowledge. CGnets offer a more expressive parameterization than classical methods, representing the system's free energy surface in the reduced-dimensional CG space. (A schematic sketch of the architecture and loss follows this list.)
  3. Regularization with Prior Energy: Because a purely data-driven model can make catastrophic errors in regions of configuration space not seen during training, CGnets include a prior energy term. This term acts as a regularizer that prevents unphysical behavior and improves generalization and stability during dynamical simulations.
  4. Empirical Validation: The practical efficacy of CGnets is demonstrated on the coarse-graining of solvated alanine dipeptide and the folding dynamics of Chignolin, a mini-protein. The results show that CGnets capture the free energy landscapes of these systems, whereas classical CG methods miss crucial features because they omit the multi-body terms that emerge from dimensionality reduction.
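
How these pieces fit together can be made concrete with a brief sketch. The PyTorch snippet below is a minimal illustration, not the authors' released implementation: it builds rotation- and translation-invariant features (pairwise bead distances), combines a free-energy network with a simple harmonic prior term, obtains CG forces by differentiating the free energy, and defines a force-matching loss against mapped reference forces. All names (CGnetSketch, pairwise_distances, force_matching_loss) and the specific prior form are illustrative assumptions.

```python
# Minimal CGnet-style sketch (illustrative; not the paper's released code).
import torch
import torch.nn as nn


def pairwise_distances(x):
    """Rotation/translation-invariant features: all pairwise bead distances.

    x: (batch, n_beads, 3) CG coordinates -> (batch, n_pairs) distances.
    """
    n = x.shape[1]
    i, j = torch.triu_indices(n, n, offset=1)
    return (x[:, i, :] - x[:, j, :]).norm(dim=-1)


class CGnetSketch(nn.Module):
    """Free energy U(x) = U_net(features) + U_prior(features); forces = -dU/dx."""

    def __init__(self, n_beads, hidden=128):
        super().__init__()
        n_pairs = n_beads * (n_beads - 1) // 2
        self.net = nn.Sequential(
            nn.Linear(n_pairs, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )
        # Harmonic prior on distances (assumed form): keeps simulations away
        # from unphysical regions the network never saw during training.
        self.register_buffer("k", torch.ones(n_pairs))
        self.register_buffer("r0", torch.ones(n_pairs))

    def forward(self, x):
        x = x.requires_grad_(True)
        feats = pairwise_distances(x)
        u_net = self.net(feats).squeeze(-1)
        u_prior = 0.5 * (self.k * (feats - self.r0) ** 2).sum(-1)
        u = u_net + u_prior
        # Predicted CG forces: negative gradient of the learned free energy.
        forces = -torch.autograd.grad(u.sum(), x, create_graph=True)[0]
        return u, forces


def force_matching_loss(model, x, f_ref):
    """Mean-squared deviation between predicted forces and mapped atomistic forces."""
    _, f_pred = model(x)
    return ((f_pred - f_ref) ** 2).mean()
```

In this picture, training minimizes force_matching_loss over frames of an all-atom trajectory mapped to the CG resolution, while simulation only needs the learned forces.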

Theoretical and Practical Implications

Reformulating coarse-graining as a machine learning task shows how supervised learning techniques can transform traditional computational physics methods. CGnets advance the ability to simulate molecular systems efficiently: they capture all-atom explicit-solvent free energy surfaces without the computational overhead of all-atom simulations.

Theoretical Implications:

  • The force-matching reformulation provides a pathway to integrate broader machine learning frameworks into molecular dynamics, facilitating better cross-model adaptability and offering richer descriptions of molecular interactions.
  • The decomposition of the coarse-graining error into Bias, Variance, and Noise components within the CG framework opens new opportunities for model selection and optimization (a schematic form is sketched below).
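
For orientation, a schematic form of this decomposition is sketched below. The notation (ξ for the CG mapping, F for the atomistic forces, U_PMF for the exact many-body potential of mean force) is chosen here for illustration; the precise statement is in the paper.

```latex
% Schematic force-matching error decomposition (illustrative notation).
\chi^2[U]
  = \big\langle \| \xi(F(r)) + \nabla U(\xi(r)) \|^2 \big\rangle_r
  = \underbrace{\big\langle \| \xi(F(r)) + \nabla U_{\mathrm{PMF}}(\xi(r)) \|^2 \big\rangle_r}_{\text{Noise (irreducible, from the CG mapping)}}
  + \underbrace{\big\langle \| \nabla U(x) - \nabla U_{\mathrm{PMF}}(x) \|^2 \big\rangle_x}_{\text{PMF error: Bias + Variance for a trained estimator}}
```

The noise term stems from the degrees of freedom integrated out by the CG mapping, so even a perfect model cannot drive the loss to zero; model selection therefore targets the bias and variance contributions.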

Practical Implications:

  • CGnets enable the mapping of complex free energy landscapes with greater fidelity, which could lead to more predictive simulations in heterogeneous systems or high-throughput applications, aiding drug design and protein engineering.
  • By substantially lowering the computational cost while preserving accuracy, CGnets could enhance the applicability of molecular dynamics in industrial and research domains.

Future Directions

This paper opens multiple avenues for future investigation. Developing CGnets further to improve model transferability across different systems presents an exciting challenge with broad implications. One promising approach might extend the featurization methodology to better incorporate configurational environments or chemical specificity, potentially increasing the CGnets' flexibility beyond system-specific parameterizations.

Another direction involves improving the integration of machine-learned models with hybrid or multiscale modeling approaches, ensuring that CGnets and traditional fine-grained models can interoperate seamlessly within larger simulation workflows.

Overall, the paper illustrates a compelling intersection of machine learning and physical simulations, showing potential pathways toward more generalized, efficient, and predictive computational molecular science methods.