Graph-Coarsening for Machine Learning Coarse-grained Molecular Dynamics (2507.16531v1)
Abstract: Coarse-grained (CG) molecular dynamics (MD) simulations can simulate large molecular complexes over extended timescales by reducing degrees of freedom. A critical step in CG modeling is the selection of the CG mapping algorithm, which directly influences both accuracy and interpretability of the model. Despite progress, the optimal strategy for coarse-graining remains a challenging task, highlighting the necessity for a comprehensive theoretical framework. In this work, we present a graph-based coarsening approach to develop CG models. Coarse-grained sites are obtained through edge contractions, where nodes are merged based on a local variational cost metric while preserving key spectral properties of the original graph. Furthermore, we illustrate how Message Passing Atomic Cluster Expansion (MACE) can be applied to generate ML-CG potentials that are not only highly efficient but also accurate. Our approach provides a bottom-up, theoretically grounded computational method for the development of systematically improvable CG potentials.
Summary
- The paper introduces a graph-theoretic coarsening framework that preserves spectral properties and enables a deterministic, interpretable mapping for coarse-grained models.
- The paper validates the methodology by training MACE with a force-matching loss on three organic molecules (Aspirin, Azobenzene, and 3BPA), accurately reproducing bond, angle, and dihedral distributions.
- The paper shows that the coarsening step runs on CPU without training or hyperparameter tuning, making it computationally cheaper than learned (neural network) coarsening approaches and offering a reproducible, deterministic alternative for defining CG mappings.
This work introduces a theoretically principled, unsupervised graph-based coarsening framework for constructing coarse-grained (CG) molecular dynamics (MD) models, with a focus on integrating spectral graph theory and machine learning force fields. The approach addresses the persistent challenge of defining optimal CG mappings, which directly impact the accuracy, interpretability, and transferability of CG models. The authors propose a multi-level graph coarsening algorithm based on a local variation cost and demonstrate its integration with the Message Passing Atomic Cluster Expansion (MACE) architecture to generate efficient and accurate ML-CG potentials.
Theoretical Framework and Methodology
The central contribution is a graph-theoretic coarsening protocol that systematically reduces molecular graphs by contracting nodes (atoms) into supernodes (CG beads) using a local variational cost metric. The coarsening process is designed to preserve key spectral properties of the original molecular graph, ensuring that essential topological and chemical information is retained in the CG representation. Two candidate contraction strategies are explored: Local Variation Neighborhood (LVN) and Local Variation Cliques (LVC), which respectively aggregate one-hop neighborhoods and maximal cliques, with the latter particularly suited for preserving ring structures in cyclic molecules.
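To make the difference between the two candidate families concrete, the sketch below enumerates LVN-style candidates (each atom plus its one-hop neighbours) and LVC-style candidates (maximal cliques, found with the Bron-Kerbosch algorithm) on a toy molecular graph. The function names, the toy graph (a six-membered ring carrying a two-atom substituent), and the use of a plain heavy-atom bond graph are illustrative assumptions, not the authors' code; the paper's graph construction may differ.

```python
def build_adjacency(n_nodes, bonds):
    """Undirected adjacency as a dict: node -> set of bonded neighbours."""
    adj = {i: set() for i in range(n_nodes)}
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def neighborhood_candidates(adj):
    """LVN-style candidates: each node together with its one-hop neighbours."""
    return sorted({tuple(sorted(adj[v] | {v})) for v in adj})


def clique_candidates(adj):
    """LVC-style candidates: all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(R, P, X):
        if not P and not X:
            cliques.append(tuple(sorted(R)))  # R cannot be extended further
            return
        pivot = max(P | X, key=lambda u: len(adj[u] & P))
        for v in list(P - adj[pivot]):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical toy graph: ring atoms 0-5, substituent atoms 6 and 7 attached to atom 0
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7)]
adj = build_adjacency(8, bonds)
lvn = neighborhood_candidates(adj)
lvc = clique_candidates(adj)
print("LVN candidates:", lvn)
print("LVC candidates:", lvc)
```

In this toy graph the largest LVN set, centred on the substituted ring carbon, spans three ring atoms plus a substituent atom, whereas every maximal clique is a single bond, because a bond graph of a ring contains no triangles. Clique-based candidates therefore never merge more than two ring atoms in one contraction here, which is one way to read why LVC is gentler on ring structures.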
The mapping from atomistic to CG coordinates is formalized via a coarsening matrix P, which is constructed through a multi-level reduction scheme. The algorithm greedily selects contraction sets that minimize the local spectral distortion, subject to a global spectral similarity constraint. This results in a deterministic, interpretable, and computationally efficient mapping, in contrast to deep learning-based coarsening methods that require training and are non-deterministic.
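A single greedy coarsening level can be sketched as follows. This is a simplified version of the local-variation cost from the spectral graph-coarsening literature, not the authors' implementation: the cost of merging a candidate set C measures how much a spectral basis B = U_k Λ_k^{-1/2} of the graph Laplacian varies inside C, and non-overlapping sets are contracted cheapest-first until a target number of beads is reached. The choice of k, the LVN-style candidates, the omission of the global spectral-similarity constraint, and all function names are assumptions; a multi-level scheme would repeat the step on the coarsened graph.

```python
import numpy as np


def laplacian(W):
    """Combinatorial graph Laplacian L = D - W for a symmetric adjacency W."""
    return np.diag(W.sum(axis=1)) - W


def local_variation_cost(C, W, B):
    """Simplified local-variation cost of merging node set C into one bead."""
    C = list(C)
    # L_C: full node degrees on the diagonal, -w_ij for pairs inside C
    L_C = -W[np.ix_(C, C)]
    np.fill_diagonal(L_C, W.sum(axis=1)[C])
    # Remove the per-set mean: only within-set variation is lost by merging
    X = B[C] - B[C].mean(axis=0)
    # Squared L_C-weighted spectral norm, normalised by the number of nodes removed
    return np.linalg.eigvalsh(X.T @ L_C @ X).max() / (len(C) - 1)


def coarsen_once(W, n_target, k=None):
    """One greedy level on a connected graph; returns coarsening matrix P and groups."""
    W = np.asarray(W, dtype=float)
    N = W.shape[0]
    lam, U = np.linalg.eigh(laplacian(W))
    k = min(k or n_target, N - 1)
    # Spectral basis B = U_k Lambda_k^{-1/2}, skipping the zero mode
    B = U[:, 1:k + 1] / np.sqrt(lam[1:k + 1])
    # LVN-style candidates: each node with its one-hop neighbours
    candidates = {tuple(sorted({i, *map(int, np.flatnonzero(W[i]))})) for i in range(N)}
    ranked = sorted((local_variation_cost(C, W, B), C) for C in candidates)

    marked, groups, n = set(), [], N
    for _, C in ranked:
        # Accept only disjoint sets that do not undershoot the target size
        if marked.isdisjoint(C) and n - (len(C) - 1) >= n_target:
            groups.append(C)
            marked.update(C)
            n -= len(C) - 1
    groups += [(i,) for i in range(N) if i not in marked]

    # Averaging coarsening matrix: bead position = mean of member positions
    P = np.zeros((len(groups), N))
    for r, C in enumerate(groups):
        P[r, list(C)] = 1.0 / len(C)
    return P, groups


def coarse_laplacian(P, W):
    """L_c = (P^+)^T L P^+, where P^+ is the atom-to-bead indicator matrix."""
    P_plus = (P > 0).astype(float).T
    return P_plus.T @ laplacian(np.asarray(W, dtype=float)) @ P_plus


# Usage on a six-membered ring
ring = np.zeros((6, 6))
for i in range(6):
    ring[i, (i + 1) % 6] = ring[(i + 1) % 6, i] = 1.0
P, groups = coarsen_once(ring, n_target=3)
L_c = coarse_laplacian(P, ring)
print(groups)
print(np.linalg.eigvalsh(laplacian(ring))[:4], np.linalg.eigvalsh(L_c))
```

The sort makes the result deterministic for a given graph, and comparing the low eigenvalues of L and L_c is a quick sanity check on how much of the spectrum survives. On the ring, the undershoot guard stops after one three-atom contraction, leaving four beads rather than three.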
Once the CG mapping is defined, the MACE architecture is trained using a force-matching loss, where the CG forces are projected from atomistic reference data. The MACE model, which is equivariant and based on the Atomic Cluster Expansion, is optimized to reproduce the projected instantaneous forces, enabling the construction of many-body CG potentials without reliance on predefined energy terms.
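The data side of force matching can be sketched in a few lines. The sketch assumes a non-overlapping mapping in which bead positions are R = P x (P an averaging matrix whose rows sum to one) and reference CG forces are the sums of the atomic forces within each bead, a common choice for such mappings; the paper's exact projection operator is not specified here. The MACE model itself is not reproduced: `F_pred` is a placeholder for the forces a trained CG potential would predict, and all function names are hypothetical.

```python
import numpy as np


def map_coordinates(P, x):
    """Bead positions R = P x; x has shape (..., N, 3)."""
    return P @ x


def map_forces(P, f):
    """Reference bead forces: sum of atomic forces within each bead.

    Uses the indicator pattern of P, which is consistent with any
    non-overlapping mapping whose rows of P sum to one.
    """
    return (P > 0).astype(float) @ f


def force_matching_loss(F_pred, F_ref):
    """Mean squared force error over frames, beads and Cartesian components."""
    return float(np.mean((F_pred - F_ref) ** 2))


# Toy example: 4 atoms mapped onto 2 beads, 10 frames of random reference data
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4, 3))
f = rng.normal(size=(10, 4, 3))
R = map_coordinates(P, x)   # shape (10, 2, 3)
F_ref = map_forces(P, f)    # shape (10, 2, 3)
# Placeholder: in practice F_pred comes from the CG potential (e.g. a trained MACE model)
F_pred = np.zeros_like(F_ref)
print(force_matching_loss(F_pred, F_ref))
```

Summing rather than averaging the forces keeps the total force on the molecule unchanged by the mapping; a mass-weighted P (center of mass instead of center of geometry) works with the same force map.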
Numerical Results and Model Validation
The methodology is validated on three molecular systems of increasing complexity: Aspirin, Azobenzene, and 3-(benzyloxy)pyridin-2-amine (3BPA), using configurations and forces from the MD17 benchmark dataset. The coarsening ratio and contraction strategy are tailored to each system, with LVN used for Aspirin and LVC for Azobenzene and 3BPA to preserve relevant chemical motifs.
Key findings include:
- Bond Length and Angle Distributions: The ML-CG models accurately reproduce bond length and angle distributions observed in the CG-mapped ground truth data. For example, in Aspirin, the C1–carboxylate and C1–ester bead distances, as well as the angle between carboxylate and ester groups, are matched with high fidelity.
- Radial Distribution Functions (RDFs): The RDFs computed from CG simulations closely align with those from the mapped atomistic data, as quantified by low Jensen–Shannon divergence (JSD) values. This indicates that the structural correlations of the original system are preserved in the CG model.
- Dihedral Angle Distributions: For Azobenzene, the distributions of the N=N bond length, the C1–N distance, and the C1–N–N–C1 dihedral angle are accurately captured, indicating that the model represents the molecule's conformational preferences around the azo linkage.
- Computational Efficiency: The coarsening step runs entirely on CPU and requires neither GPU resources nor hyperparameter tuning, substantially reducing computational overhead compared to learned coarsening approaches.
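The structural comparisons above (bond lengths, angles, dihedrals, and distributions scored by JSD) can be sketched with a few NumPy helpers. Bead trajectories are assumed to be arrays of shape (T, 3) obtained by applying the CG mapping to each frame; the shared histogram bins and the base-2 logarithm in the JSD are assumptions, since the settings used in the paper are not given here, and the function names are hypothetical.

```python
import numpy as np


def bond_lengths(a, b):
    """Distances between bead trajectories a and b, each of shape (T, 3)."""
    return np.linalg.norm(b - a, axis=-1)


def bond_angles(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex."""
    u, v = a - b, c - b
    cos = np.sum(u * v, axis=-1) / (np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def dihedral_angles(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1, axis=-1, keepdims=True)
    # Components of the outer bonds perpendicular to the central bond
    v = b0 - np.sum(b0 * b1, axis=-1, keepdims=True) * b1
    w = b2 - np.sum(b2 * b1, axis=-1, keepdims=True) * b1
    x = np.sum(v * w, axis=-1)
    y = np.sum(np.cross(b1, v) * w, axis=-1)
    return np.degrees(np.arctan2(y, x))


def jensen_shannon(samples_p, samples_q, bins):
    """Base-2 JSD (in [0, 1]) between histograms of two sample sets on shared bins."""
    p, _ = np.histogram(samples_p, bins=bins)
    q, _ = np.histogram(samples_q, bins=bins)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        nz = a > 0  # 0 * log 0 = 0 by convention; b > 0 wherever a > 0
        return np.sum(a[nz] * np.log2(a[nz] / b[nz]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Applying `jensen_shannon` to, for instance, C1–ester bead distances from the mapped reference data and from the CG simulation gives one number per observable: 0 for identical histograms and 1 for non-overlapping ones. For RDFs, the paper may score the normalized RDF curves directly rather than raw distance histograms.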
A notable claim is that the unsupervised, deterministic nature of the coarsening algorithm provides interpretability and reproducibility advantages over deep learning-based mapping schemes, which are often opaque and non-deterministic.
Applicability, Limitations, and Implications
The proposed framework is intended to apply to a broad class of organic molecules, including those with diverse functional groups and ring systems. The spectral-preservation criterion is designed to retain the essential chemical topology, and the integration with MACE enables the construction of systematically improvable, many-body CG potentials.
However, the method does not leverage neural network-based graph condensation or learned coarsening schemes, which may limit adaptability in cases where data-driven mapping could be advantageous. The LVN strategy can over-aggregate in cyclic systems, but this is mitigated by the LVC approach, which preserves ring integrity by contracting maximal cliques.
The deterministic, parameter-free nature of the coarsening algorithm is both a strength and a limitation: it ensures reproducibility and low computational cost, but it may miss subtle, system-specific mapping choices that could be learned from data. The authors acknowledge the need for further ablation studies on larger biomolecular systems, which are currently constrained by the limited conformational coverage of available training data.
Future Directions
The integration of graph-theoretic coarsening with equivariant ML force fields represents a promising direction for scalable, interpretable CG modeling. Potential future developments include:
- Extension to larger biomolecular systems, such as proteins and nucleic acids, where hierarchical and multi-resolution CG representations are critical.
- Hybrid approaches that combine deterministic graph-based coarsening with data-driven refinement, potentially leveraging semi-supervised or active learning strategies.
- Incorporation of experimental observables in the mapping and training process to enhance transferability and physical realism.
- Exploration of alternative spectral preservation criteria and contraction strategies to further optimize the trade-off between accuracy and computational efficiency.
Conclusion
This work establishes a rigorous, unsupervised graph-coarsening framework for ML-based CG molecular dynamics, demonstrating that spectral graph theory can be combined effectively with state-of-the-art equivariant neural architectures to produce accurate, efficient, and interpretable CG models. The approach offers a viable alternative to heuristic or deep learning-based mapping schemes, supported by numerical evidence on three organic molecules that it preserves structural properties such as bond, angle, dihedral, and radial distributions. The methodology may facilitate scalable, automated CG model construction and provides a foundation for further advances in multiscale molecular simulation and ML-driven materials modeling.