Graph-Coarsening for Machine Learning Coarse-grained Molecular Dynamics (2507.16531v1)

Published 22 Jul 2025 in cond-mat.soft

Abstract: Coarse-grained (CG) molecular dynamics (MD) simulations can simulate large molecular complexes over extended timescales by reducing degrees of freedom. A critical step in CG modeling is the selection of the CG mapping algorithm, which directly influences both accuracy and interpretability of the model. Despite progress, the optimal strategy for coarse-graining remains a challenging task, highlighting the necessity for a comprehensive theoretical framework. In this work, we present a graph-based coarsening approach to develop CG models. Coarse-grained sites are obtained through edge contractions, where nodes are merged based on a local variational cost metric while preserving key spectral properties of the original graph. Furthermore, we illustrate how Message Passing Atomic Cluster Expansion (MACE) can be applied to generate ML-CG potentials that are not only highly efficient but also accurate. Our approach provides a bottom-up, theoretically grounded computational method for the development of systematically improvable CG potentials.

Summary

The paper introduces a deterministic, graph-theoretic approach that reformulates coarse-graining as a spectral graph coarsening problem.
It integrates the Message Passing Atomic Cluster Expansion (MACE) framework to accurately parameterize coarse-grained force fields from atomistic simulations.
Numerical benchmarks on small organic molecules demonstrate high fidelity in reproducing structural and thermodynamic properties.

Graph-Coarsening for Machine Learning Coarse-grained Molecular Dynamics: A Technical Overview

This work introduces a graph-theoretic, unsupervised, and deterministic approach to coarse-graining (CG) molecular systems for ML-based molecular dynamics (MD) simulations. The methodology leverages multilevel graph coarsening with spectral guarantees to define CG mapping operators, and integrates this with the Message Passing Atomic Cluster Expansion (MACE) framework to parameterize CG force fields. The approach is benchmarked on small organic molecules, demonstrating high fidelity in reproducing structural and thermodynamic properties at reduced resolution.

Theoretical Framework and Methodology

The central challenge in CG MD is the selection of a mapping operator that reduces the system's degrees of freedom while preserving essential physical and chemical properties. Traditional mapping protocols are often heuristic, lack transferability, and are not theoretically grounded. This work addresses these limitations by formulating the CG mapping as a graph coarsening problem, where atoms are nodes and chemical bonds or spatial proximity define edges.

Graph Coarsening Algorithm

Graph Construction: The molecular system is represented as a weighted graph $G_0 = (V_0, E_0, W_0)$, where $V_0$ is the set of heavy atoms, $E_0$ is the set of edges defined by a distance cutoff, and $W_0$ assigns each edge a weight that is an exponential function of the interatomic distance.
Multilevel Coarsening: The graph is recursively reduced via edge contractions, forming supernodes (CG beads) by merging candidate sets of atoms. Two strategies are used for candidate selection:
- Local Variation Neighborhood (LVN): One-hop neighborhoods of each node.
- Local Variation Cliques (LVC): Maximal cliques, particularly effective for preserving ring structures.
Spectral Preservation: At each coarsening level, the algorithm minimizes a local variation cost that quantifies the spectral distortion induced by contraction, ensuring that the low-frequency eigenstructure of the Laplacian is preserved.
Mapping Operator: The final mapping matrix $P$ is a product of per-level contraction matrices, mapping atomistic coordinates and forces to CG beads via centroid and force averaging.
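The steps above can be sketched on a toy system. This is a minimal illustration, not the paper's implementation: the weight form `exp(-d / sigma)`, the `cutoff` and `sigma` values, the hand-picked partitions, and all function names are assumptions made for the example, and the local variation cost itself is not computed here. The sketch builds the weighted graph, forms an averaging mapping matrix, composes two levels by matrix product, and compares the low Laplacian eigenvalues with the Rayleigh-Ritz values on the coarse subspace as a rough gauge of spectral distortion.

```python
import numpy as np

def build_graph(coords, cutoff=2.0, sigma=1.0):
    """Weighted adjacency with w_ij = exp(-d_ij / sigma) for 0 < d_ij < cutoff.
    The exact weight form is an assumption; the text only says 'exponential'."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.where((d < cutoff) & (d > 0.0), np.exp(-d / sigma), 0.0)

def laplacian(W):
    """Combinatorial graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def mapping_matrix(partition, n_nodes):
    """Averaging matrix: row b holds 1/|S_b| for every node in bead b."""
    P = np.zeros((len(partition), n_nodes))
    for bead, members in enumerate(partition):
        P[bead, members] = 1.0 / len(members)
    return P

# Toy system: six "heavy atoms" on a line, 1.5 apart (hypothetical data).
coords = np.array([[1.5 * i, 0.0, 0.0] for i in range(6)])
W = build_graph(coords)
L = laplacian(W)

# Level 1: contract neighbouring pairs (hand-picked, not cost-optimised).
P = mapping_matrix([[0, 1], [2, 3], [4, 5]], 6)
Q = (P > 0).astype(float).T      # lift matrix; P @ Q = I for disjoint beads
L_c = Q.T @ L @ Q                # coarse Laplacian (inter-bead weights are summed)
cg_coords = P @ coords           # bead positions as centroids

# Level 2: merge the first two beads; the overall map is the product.
P2 = mapping_matrix([[0, 1], [2]], 3)
P_total = P2 @ P

# Rayleigh-Ritz values of L on the piecewise-constant subspace spanned by Q.
# By the Courant-Fischer theorem they bound the eigenvalues of L from above.
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(Q.T @ Q)))
ritz = np.linalg.eigvalsh(D_inv_sqrt @ L_c @ D_inv_sqrt)
lam = np.linalg.eigvalsh(L)[: len(ritz)]
print("lowest eigenvalues of L:", lam)
print("Ritz values after coarsening:", ritz)
```

A small gap between the two printed spectra suggests the partition keeps the low-frequency structure; a cost-driven choice of candidate sets, as described above, aims to keep that gap small at every level.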

ML-CG Force Field Parameterization

Force Matching: The MACE architecture is trained using a force-matching loss, minimizing the discrepancy between CG-mapped atomistic forces and those predicted by the CG model.
Equivariance: The use of MACE ensures that the learned energy is invariant to translations and rotations and that the predicted forces transform equivariantly under rotations, a critical property for molecular systems.
Training Regime: The model is trained on configurations and forces from atomistic MD trajectories, mapped to the CG representation using the derived $P$.
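The force-matching objective can be illustrated with a small sketch. Everything here is an assumption for the sake of the example: a toy harmonic-bond model (`harmonic_forces`) stands in for MACE, forces are mapped with the same averaging matrix as coordinates (following the description above), and the trajectory is synthetic. The loss is the mean squared difference between mapped reference forces and model forces on the beads.

```python
import numpy as np

def harmonic_forces(R, k, r0):
    """Toy CG model (stand-in for MACE): harmonic bonds between consecutive
    beads, U = 0.5 * k * (r - r0)**2. Returns the force on each bead."""
    F = np.zeros_like(R)
    for i in range(len(R) - 1):
        d = R[i + 1] - R[i]
        r = np.linalg.norm(d)
        f = k * (r - r0) * d / r   # force on bead i; bead i+1 gets the opposite
        F[i] += f
        F[i + 1] -= f
    return F

def force_matching_loss(P, frames, k, r0):
    """Mean squared error between CG-mapped atomistic forces and model forces."""
    total, count = 0.0, 0
    for coords, forces in frames:
        R = P @ coords             # mapped bead positions
        F_ref = P @ forces         # mapped reference forces
        F_pred = harmonic_forces(R, k, r0)
        total += np.sum((F_pred - F_ref) ** 2)
        count += F_ref.size
    return total / count

# Hypothetical mapping: six atoms averaged pairwise into three beads.
P = np.zeros((3, 6))
for bead in range(3):
    P[bead, 2 * bead: 2 * bead + 2] = 0.5
Q = (P > 0).astype(float).T        # copies each bead value onto its atoms

# Synthetic "atomistic" frames whose mapped forces come from k=2.0, r0=3.0.
rng = np.random.default_rng(0)
base = np.array([[1.5 * i, 0.0, 0.0] for i in range(6)])
frames = []
for _ in range(5):
    coords = base + 0.1 * rng.normal(size=(6, 3))
    forces = Q @ harmonic_forces(P @ coords, k=2.0, r0=3.0)
    frames.append((coords, forces))

print("loss at generating parameters:", force_matching_loss(P, frames, 2.0, 3.0))
print("loss at a wrong stiffness:   ", force_matching_loss(P, frames, 1.0, 3.0))
```

In practice the model parameters would be optimised by gradient descent on this loss; the sketch only evaluates it at two parameter settings to show that it separates a matching model from a mismatched one.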

Numerical Results and Validation

The methodology is validated on three molecular systems of increasing complexity: aspirin, azobenzene, and 3-(benzyloxy)pyridin-2-amine (3BPA), using the MD17 dataset.

Aspirin: The LVN-based coarsening yields a five-bead CG model. The MACE-CG potential accurately reproduces bond length distributions, angular distributions, and radial distribution functions (RDFs) compared to the CG-mapped ground truth. Jensen-Shannon Divergence (JSD) values for RDFs are low, indicating high structural fidelity.
Azobenzene: The LVC-based coarsening preserves the aromatic ring topology. The CG model captures the N=N bond length, C1-N distances, and dihedral angle distributions with high accuracy. The RDFs and JSD metrics confirm the preservation of equilibrium properties.
3BPA: The LVC approach is used to coarsen the molecule, with each ring mapped to three beads and the -NH2 group to a single bead. The CG model reproduces C-O and C7-N bond length distributions and RDFs, with close agreement to the mapped atomistic data.

Across all systems, the CG models parameterized via MACE exhibit strong agreement with reference data, both in terms of local (bond/angle) and global (RDF) structural metrics. The mapping step is fully deterministic, requires no hyperparameter tuning, and is computationally inexpensive, running on CPU alone.
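As a rough illustration of how a JSD score between two distance distributions can be computed, the sketch below histograms bead-bead distances from two sets of frames and compares them. It makes several simplifying assumptions: the histogram is a plain pair-distance distribution without the shell-volume normalization of a true RDF, the frames are synthetic, and the bin count and `r_max` are arbitrary. With base-2 logarithms the divergence lies between 0 (identical) and 1 (disjoint support).

```python
import numpy as np

def distance_histogram(frames, bins=50, r_max=10.0):
    """Normalized histogram of all bead-bead distances over a set of frames.
    Simplification: no shell-volume normalization as in a true RDF."""
    dists = []
    for R in frames:
        d = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
        dists.append(d[np.triu_indices(len(R), k=1)])
    hist, _ = np.histogram(np.concatenate(dists), bins=bins, range=(0.0, r_max))
    return hist / hist.sum()

def jensen_shannon_divergence(p, q):
    """JSD in bits between two discrete distributions on the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0               # terms with a = 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Synthetic stand-ins for mapped reference frames and CG-simulated frames.
rng = np.random.default_rng(1)
reference_frames = [rng.normal(scale=1.0, size=(5, 3)) for _ in range(200)]
model_frames = [rng.normal(scale=1.1, size=(5, 3)) for _ in range(200)]

p_ref = distance_histogram(reference_frames)
p_model = distance_histogram(model_frames)
jsd_value = jensen_shannon_divergence(p_ref, p_model)
print("JSD between the two distance distributions:", jsd_value)
```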

Implications, Limitations, and Future Directions

The proposed graph-based coarsening framework provides a systematic, interpretable, and scalable solution to the CG mapping problem in molecular simulation. By eschewing heuristic or learned mapping protocols in favor of spectral graph theory, the method ensures reproducibility and transferability across molecular systems. The integration with MACE further enables the construction of accurate, many-body CG potentials without reliance on predefined energy terms.

Key implications and limitations:

Determinism and Interpretability: The mapping is fully deterministic and interpretable, in contrast to deep learning-based coarsening schemes, which are often stochastic and harder to interpret.
Computational Efficiency: The coarsening process runs on CPU and requires neither GPU resources nor training, making it suitable for large-scale or high-throughput applications.
Limitations: The method does not leverage data-driven or adaptive coarsening, potentially limiting its flexibility for highly heterogeneous or non-standard systems. On cyclic graphs, LVN may over-aggregate, but LVC mitigates this by targeting cliques.
Scalability: While demonstrated on small molecules, the approach is theoretically extensible to larger biomolecules and macromolecular assemblies, though the combinatorial complexity of clique detection may become a bottleneck.
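To make the two candidate-set strategies and the cost of clique detection concrete, here is a small pure-Python sketch. It is an illustration under assumptions, not the paper's code: the function names and the toy graph are hypothetical, and maximal cliques are enumerated with the standard Bron-Kerbosch algorithm with pivoting. A graph on n nodes can have up to 3^(n/3) maximal cliques, which is why enumeration may become expensive for large, densely connected systems.

```python
def one_hop_neighborhoods(adj):
    """LVN-style candidates: each node together with its direct neighbours."""
    return [frozenset({v} | adj[v]) for v in sorted(adj)]

def maximal_cliques(adj):
    """LVC-style candidates: all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(R, P, X):
        # R: current clique, P: nodes that can extend it, X: already explored.
        if not P and not X:
            cliques.append(frozenset(R))
            return
        # Pivot on the node with most neighbours in P to prune branches.
        pivot = max(P | X, key=lambda u: len(adj[u] & P))
        for v in list(P - adj[pivot]):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(set(), set(adj), set())
    return cliques

# Toy graph (hypothetical): a triangle 0-1-2 with a tail node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

lvn_candidates = one_hop_neighborhoods(adj)
lvc_candidates = maximal_cliques(adj)
print("one-hop neighbourhoods:", [sorted(c) for c in lvn_candidates])
print("maximal cliques:", sorted(sorted(c) for c in lvc_candidates))
```

On this toy graph the neighbourhood of node 2 spans the whole graph, while the clique-based candidates keep the triangle and the tail separate, which mirrors the over-aggregation concern noted under Limitations.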

Future directions include the extension of this framework to proteins and large biomolecular complexes, integration with adaptive or hybrid coarsening strategies, and exploration of learned coarsening operators that retain interpretability and spectral guarantees. The deterministic, graph-theoretic approach to CG mapping is likely to influence the development of transferable, automated CG workflows in computational chemistry and materials science.

Conclusion

This work establishes a principled, graph-theoretic foundation for CG mapping in ML-based MD, demonstrating that unsupervised, spectral-preserving coarsening can yield CG models with high structural and thermodynamic fidelity. The combination with equivariant ML potentials such as MACE enables accurate, efficient, and interpretable CG simulations, with broad applicability to molecular modeling and simulation. The approach addresses longstanding challenges in CG mapping and paves the way for systematic, automated CG model construction in complex molecular systems.
