
A unified framework for coarse grained molecular dynamics of proteins with high-fidelity reconstruction (2403.17513v5)

Published 26 Mar 2024 in physics.chem-ph, physics.bio-ph, and q-bio.BM

Abstract: Simulating large proteins using traditional molecular dynamics (MD) is computationally demanding. To address this challenge, we propose a novel tree-structured coarse-grained model that efficiently captures protein dynamics. By leveraging a hierarchical protein representation, our model accurately reconstructs high-resolution protein structures, with sub-angstrom precision achieved for a 168-amino acid protein. We combine this coarse-grained model with a deep learning framework based on stochastic differential equations (SDEs). A neural network is trained to model the drift force, while a RealNVP-based noise generator approximates the stochastic component. This approach enables a significant speedup of over 20,000 times compared to traditional MD, allowing for the generation of microsecond-long trajectories within a few minutes and providing valuable insights into protein behavior. Our method demonstrates high accuracy, achieving sub-angstrom reconstruction for short (25 ns) trajectories and maintaining statistical consistency across multiple independent simulations.

Authors (1)
  1. Jinzhen Zhu (7 papers)

Summary

The paper presents a unified framework for coarse-grained molecular dynamics (MD) of proteins that achieves high-fidelity reconstruction of all-atom details from a minimal set of internal coordinates. By introducing a tree-structured representation of protein coordinates, the approach decouples the computationally intensive reconstruction step from the simulation of dynamics in a reduced collective variable (CV) space. The method relies on two key ingredients:

  • Hierarchical Coordinate Transformation:

The framework encodes the protein structure as a hierarchy of local reference frames using bond angles, dihedral angles, and side-chain information. A recursive transformation operator $\hat{P}_I = \hat{P}_{I-1}\hat{O}_I$ maps local coordinates (denoted with a superscript “0”) to the global Cartesian space. This procedure, implemented by a series of translation and rotation operators (e.g., the rotation matrix $\hat{R}(\hat{u},\theta)$), allows the reconstruction of protein tertiary structure with sub-angstrom accuracy. Notably, it is demonstrated that including both dihedral angles and bond angles, rather than using only dihedrals, yields significant improvements in the reconstruction quality (with deviations below 0.2 angstroms in key metrics).
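The recursive composition of local frames can be illustrated with a minimal sketch. It is not the paper's implementation: it builds a simple linear chain rather than a full tree with side chains, and the helper names (`rotation_matrix`, `local_operator`, `compose`, `build_chain`) and the convention of placing each bond along the local x-axis are assumptions made for illustration. Each local operator twists about the previous bond by the dihedral angle, bends by the bond angle, and steps along the new bond; the global frame follows the recursion P_I = P_{I-1} O_I.

```python
import math

def rotation_matrix(u, theta):
    """Rodrigues rotation matrix R(u, theta) for a unit axis u."""
    ux, uy, uz = u
    c, s = math.cos(theta), math.sin(theta)
    k = 1.0 - c
    return [
        [c + ux * ux * k, ux * uy * k - uz * s, ux * uz * k + uy * s],
        [uy * ux * k + uz * s, c + uy * uy * k, uy * uz * k - ux * s],
        [uz * ux * k - uy * s, uz * uy * k + ux * s, c + uz * uz * k],
    ]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def local_operator(bond, angle, dihedral):
    """Local operator O_I as (rotation, translation).

    Assumed convention: twist about the local x-axis (previous bond) by the
    dihedral, bend about the local z-axis by (pi - bond angle), then step
    one bond length along the new x-axis.
    """
    rot = matmul(rotation_matrix((1.0, 0.0, 0.0), dihedral),
                 rotation_matrix((0.0, 0.0, 1.0), math.pi - angle))
    return rot, matvec(rot, [bond, 0.0, 0.0])

def compose(parent, op):
    """P_I = P_{I-1} O_I for rigid transforms stored as (R, t)."""
    (rp, tp), (ro, to) = parent, op
    moved = matvec(rp, to)
    return matmul(rp, ro), [moved[i] + tp[i] for i in range(3)]

def build_chain(internal):
    """Global atom positions from a list of (bond, angle, dihedral) triples."""
    frame = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
             [0.0, 0.0, 0.0])
    atoms = [frame[1]]
    for bond, angle, dihedral in internal:
        frame = compose(frame, local_operator(bond, angle, dihedral))
        atoms.append(frame[1])  # the atom sits at the origin of its frame
    return atoms

# Four atoms: the first bond has no defined angle/dihedral, so use (pi, 0).
chain = build_chain([
    (1.5, math.pi, 0.0),
    (1.5, math.radians(110.0), 0.0),
    (1.5, math.radians(110.0), math.radians(60.0)),
])
for atom in chain:
    print([round(c, 3) for c in atom])
```

In a tree-structured variant, a side-chain branch would call `compose` starting from the frame of its branching atom instead of the last atom in the list, which is also why independent branches can be reconstructed in parallel.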

  • Stochastic Differential Equation Framework Coupled with Deep Neural Networks:

The simulation of protein dynamics is expressed in the reduced CV space through a stochastic differential equation (SDE) of the form $\frac{dx(\tau)}{d\tau} = f(x(\tau)) + \sum_\alpha g_\alpha(x(\tau))\xi_\alpha$, where the deterministic drift term, $f(x)$, is modeled by a deep neural network (DNN) propagator $\mathbb{F}$, and the stochastic component is captured by a modified RealNVP-based noise generator $\mathbb{G}$. The network is trained using a loss function based on multi-step prediction, $L_{T_n} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{T-i}\sum_{t=1}^{T-i}\left\Vert \mathbf{S}_{t+i} - \mathbb{F}^i(\mathbf{S}_t)\right\Vert^2$, so that for $n\ge2$, the learning process effectively captures the underlying force-field information. As a consequence, the approach is capable of generating trajectories that are statistically consistent with reference all-atom MD simulations while incurring dramatic computational speedups.
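The multi-step loss can be evaluated directly from its definition. The sketch below is a plain-Python illustration, not the paper's training code: `multi_step_loss` is a hypothetical helper, the trajectory is a list of CV vectors S_1..S_T, and `propagator` stands in for the learned network F as any callable mapping one state to the next. For each horizon i, the propagator is applied i times to S_t and compared with S_{t+i}.

```python
def multi_step_loss(traj, propagator, n):
    """L = (1/n) * sum_{i=1..n} [ 1/(T-i) * sum_t ||S_{t+i} - F^i(S_t)||^2 ].

    traj       : list of states, each a list of floats (S_1 .. S_T)
    propagator : callable state -> next state (stand-in for the DNN F)
    n          : largest prediction horizon, must satisfy 1 <= n < T
    """
    T = len(traj)
    if not 1 <= n < T:
        raise ValueError("n must satisfy 1 <= n < len(traj)")
    total = 0.0
    for i in range(1, n + 1):
        horizon_error = 0.0
        for t in range(T - i):
            state = traj[t]
            for _ in range(i):          # F^i: apply the propagator i times
                state = propagator(state)
            horizon_error += sum((a - b) ** 2
                                 for a, b in zip(traj[t + i], state))
        total += horizon_error / (T - i)
    return total / n

# Toy trajectory that advances by exactly 1.0 per step.
toy = [[0.0], [1.0], [2.0], [3.0]]
exact = lambda s: [s[0] + 1.0]   # reproduces the trajectory
frozen = lambda s: list(s)       # predicts no motion at all
print(multi_step_loss(toy, exact, 2))
print(multi_step_loss(toy, frozen, 2))
```

With n = 1 only one-step errors are penalized; larger n also penalizes errors that accumulate when the propagator is fed its own output, which is the regime used when generating long trajectories.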

Key numerical and algorithmic highlights include:

  • High-Fidelity Reconstruction:

For a 168-amino acid target protein, the complete reconstruction using 1624 CV parameters (812 bond angles and 812 dihedral angles) reproduces side-chain and backbone details with maximum deviations in the CA atom distances of 0.26 angstroms and average errors as small as 0.04 angstroms. Furthermore, comparisons with reconstructions that fix the $sp^3$ bond angles to theoretical values illustrate that small variations in bond angles are critical for preserving secondary structure elements (an example being the loss of an alpha helix when bond angles are artificially constrained).

  • Accelerated Dynamics:

The simulation of MD trajectories using the drift force propagator alone leads to RMSD profiles that closely match the reference structures. The reported speedup is dramatic: the coarse-grained DNN propagator produces a 25 ns trajectory in 0.59 seconds on a standard GPU versus roughly 20 hours on a 256-CPU cluster, corresponding to a speedup factor of over 70,000 times. When the noise generator is coupled with the drift component, the method is able to generate microsecond-scale trajectories within minutes—a speedup of over 20,000 times relative to conventional MD sampling.

  • Advanced Noise Modeling with RealNVP:

The incorporation of a simplified RealNVP-based noise generator allows the method to account for the stochastic terms in the SDE. By mapping the residual errors (i.e., the difference between the projected next step from the drift force and the true CV values) onto a Gaussian space, the noise model provides a statistically accurate account of fluctuations. This design choice streamlines the network architecture by omitting additive transformation terms and directly learning the standard deviation of the noise distribution.
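A scale-only coupling layer of the kind described here can be sketched as follows. This is an assumption-laden illustration rather than the paper's architecture: `log_sigma` is a hypothetical stand-in for a learned network, a single layer is shown (practical flows stack several with alternating halves), and the input is assumed to have even length. The first half of the vector passes through unchanged and conditions a per-component standard deviation for the second half; no additive shift is applied.

```python
import math
import random

def log_sigma(x1):
    """Hypothetical stand-in for a learned network giving log std deviations."""
    return [0.5 * math.tanh(v) for v in x1]

def to_gaussian(residual):
    """Map a residual to latent space: z1 = r1, z2 = r2 / sigma(r1).

    Returns (z, log|det J|); the log-determinant is what a likelihood-based
    training objective would need.
    """
    assert len(residual) % 2 == 0, "sketch assumes an even number of CVs"
    half = len(residual) // 2
    r1, r2 = residual[:half], residual[half:]
    s = log_sigma(r1)
    z2 = [v * math.exp(-si) for v, si in zip(r2, s)]
    return r1 + z2, -sum(s)

def to_residual(z):
    """Inverse map used for sampling: r1 = z1, r2 = z2 * sigma(z1)."""
    assert len(z) % 2 == 0, "sketch assumes an even number of CVs"
    half = len(z) // 2
    z1, z2 = z[:half], z[half:]
    s = log_sigma(z1)
    return z1 + [v * math.exp(si) for v, si in zip(z2, s)]

def sample_noise(dim, rng):
    """Draw standard Gaussian latents and push them through the inverse map."""
    return to_residual([rng.gauss(0.0, 1.0) for _ in range(dim)])

rng = random.Random(0)
noise = sample_noise(4, rng)
latent, log_det = to_gaussian(noise)
print(noise, latent, log_det)
```

A generated step would then add such a noise sample to the drift prediction, i.e. next state = F(state) + noise, under the assumption that the residual model is applied once per propagation step.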

  • Convergence and Force-Field Learning:

A systematic convergence study demonstrates that while one-step propagation may suffice for short trajectories, a multi-step loss function with $n\ge2$ is crucial to correctly capture the physical drift force. This is supported both by numerical RMSD error analyses and by an illustrative derivation showing that the learned drift function converges towards the true force field.

  • Parallelizable Reconstruction:

Although the recursive transformation from CVs to full Cartesian coordinates is computationally intensive, this step is only needed during trajectory analysis rather than during real-time simulation. The authors note that the reconstruction step can be parallelized effectively across CPUs/GPUs, thus mitigating its impact on overall performance.

Overall, the work offers a rigorous approach that combines reduced collective variable dynamics, machine learning-based integration of deterministic and stochastic components, and efficient all-atom reconstruction. The method not only accelerates molecular dynamics simulations for large proteins by orders of magnitude but also maintains the high structural fidelity needed for predicting biological behavior and potential applications in drug design and protein engineering. The emphasis on both statistical consistency (via RMSD and angular distribution comparisons) and the detailed preservation of key structural features underscores the method’s potential as a complementary or even standalone tool in computational biophysics.