A Universal Transformer-Based Coarse-Grained Molecular Dynamics Framework for Protein Dynamics (2502.05909v1)
Abstract: We present a universal, Transformer-based coarse-grained molecular dynamics (CG-MD) framework for simulating protein dynamics. The trained model is designed to generalize to any protein system, regardless of sequence length or number of chains. First, we extend a tree-structured protein representation to accommodate multi-chain proteins, achieving sub-ångström accuracy when reconstructing a 169-amino-acid protein structure. Then, by representing collective variables as language-like sequences, we use a Transformer network as a propagator for stochastic differential equations, generating MD trajectories more than 10,000 times faster than all-atom MD simulations. A single trained model accurately simulates both single-chain and two-chain proteins, and the generated trajectories closely resemble all-atom MD trajectories in their RMSD profiles. With sufficient training data, we anticipate that the model can achieve universality across all proteins, offering roughly a 10,000-fold acceleration of MD simulations while maintaining high accuracy.
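The abstract compares generated and all-atom trajectories through their RMSD profiles. Below is a minimal sketch of one common way to compute such a profile. It assumes a trajectory stored as a NumPy array of shape (frames, atoms, 3), with each frame optimally superposed onto a reference structure (Kabsch algorithm) before the RMSD is taken. This is a standard convention, not necessarily the paper's exact protocol. The function names `kabsch_rmsd` and `rmsd_profile` and the synthetic coordinates are hypothetical and for illustration only.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both structures on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # 3x3 covariance matrix H = sum_i p_i q_i^T and its SVD.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and average the squared per-atom deviations.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def rmsd_profile(traj, ref):
    """Per-frame RMSD of a (T, N, 3) trajectory relative to an (N, 3) reference."""
    return np.array([kabsch_rmsd(frame, ref) for frame in traj])


# Illustrative synthetic data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 3))  # stand-in for 20 C-alpha coordinates
theta = np.pi / 6
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
rigid = ref @ Rz.T + np.array([1.0, -2.0, 0.5])  # rotated and translated copy
noisy = ref + 0.1 * rng.normal(size=ref.shape)  # slightly perturbed copy
traj = np.stack([ref, rigid, noisy])
profile = rmsd_profile(traj, ref)
```

Superposing each frame first makes the profile insensitive to overall rotation and translation. The rigidly moved copy therefore scores essentially zero, and only genuine conformational change (here, the added noise) contributes to the RMSD.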