
An Information Bottleneck Approach for Markov Model Construction (2404.02856v2)

Published 3 Apr 2024 in physics.bio-ph

Abstract: Markov state models (MSMs) are valuable for studying dynamics of protein conformational changes via statistical analysis of molecular dynamics (MD) simulations. In MSMs, the complex configuration space is coarse-grained into conformational states, with the dynamics modeled by a series of Markovian transitions among these states at discrete lag times. Constructing the Markovian model at a specific lag time requires states defined without significant internal energy barriers, enabling internal dynamics relaxation within the lag time. This process coarse grains time and space, integrating out rapid motions within metastable states. This work introduces a continuous embedding approach for molecular conformations using the state predictive information bottleneck (SPIB), which unifies dimensionality reduction and state space partitioning via a continuous, machine learned basis set. Without explicit optimization of VAMP-based scores, SPIB demonstrates state-of-the-art performance in identifying slow dynamical processes and constructing predictive multi-resolution Markovian models. When applied to mini-protein trajectories, SPIB showcases unique advantages compared to competing methods. It automatically adjusts the number of metastable states based on a specified minimal time resolution, eliminating the need for manual tuning. While maintaining efficacy in dynamical properties, SPIB excels in accurately distinguishing metastable states and capturing numerous well-populated macrostates. Furthermore, SPIB's ability to learn a low-dimensional continuous embedding of the underlying MSMs enhances the interpretation of dynamic pathways. Accordingly, we propose SPIB as an easy-to-implement methodology for end-to-end MSM construction.

Summary

  • The paper applies the state predictive information bottleneck (SPIB), a deep learning method built on the information bottleneck principle, to the construction of Markov state models (MSMs) for molecular systems, performing dimensionality reduction and state partitioning in a single step.
  • SPIB learns dynamical propagators and resolves metastable states according to a specified lag time, so the number of states adapts to the chosen time resolution without intermediate transformations such as tICA or manual state definition.
  • Validated on mini-protein trajectories, SPIB shows state-of-the-art performance in capturing slow dynamics and building kinetically coherent models, with potential applications in drug discovery and materials science.

An Information Bottleneck Approach for Markov Model Construction: A Deep Dive into the SPIB Framework

The paper presents an approach to constructing Markov state models (MSMs) of molecular dynamics, with a focus on protein folding. The method combines the information bottleneck framework with machine learning so that dimensionality reduction and state partitioning happen together rather than as separate steps. The aim is to connect coarse-grained state dynamics directly to the molecular dynamics (MD) trajectories they summarize, a central task in computational chemistry and biophysics.

Overview of MSMs and SPIB

MSMs give a quantitative description of molecular simulations by modeling transitions among discretized states of a system. The traditional workflow for MSM construction consists of featurization, dimensionality reduction, clustering, and estimation of a transition matrix. Each of these steps involves methodological choices that affect the accuracy of the resulting model.
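The last step of that workflow, estimating a transition matrix, can be sketched in a few lines. This is a minimal illustration and not the paper's pipeline: it assumes the trajectory has already been featurized and clustered into integer state labels, and the function name `estimate_transition_matrix` is hypothetical. It uses plain sliding-window counting with row normalization, without the reversibility constraints that production MSM packages usually apply.

```python
import numpy as np

def estimate_transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition matrix from one discrete trajectory.

    Hypothetical helper for illustration: counts i -> j jumps separated
    by `lag` frames and divides each row by its total count.
    """
    if lag < 1:
        raise ValueError("lag must be a positive number of frames")
    dtraj = np.asarray(dtraj)
    counts = np.zeros((n_states, n_states))
    # Sliding-window count of transitions from frame t to frame t + lag.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of never-visited states stay zero instead of dividing by zero.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# Toy two-state trajectory (made-up data, not from the paper).
dtraj = [0, 0, 1, 1, 0, 0, 1, 1, 0]
T = estimate_transition_matrix(dtraj, n_states=2, lag=1)
print(T)
```

Every choice upstream of this function (features, projection, clustering) changes the labels in `dtraj`, which is why the conventional pipeline is sensitive to those choices.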

Against this multi-step process, the state predictive information bottleneck (SPIB) framework offers a streamlined alternative. Unlike approaches such as VAMPnets, which explicitly optimize variational (VAMP-based) scores, SPIB takes a specified lag time as the minimal time resolution and resolves only the metastable states that persist on that timescale. Coarse-graining is therefore automatic: the number of metastable states is learned from the data and depends on the desired temporal resolution.
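Why the choice of lag time limits which states can be resolved can be seen in a toy Markov chain. The sketch below is not SPIB's algorithm; it only uses the Chapman-Kolmogorov property that the transition matrix at lag k·τ is the k-th power of the matrix at lag τ. The three-state matrix is made-up data in which state 2 is short-lived.

```python
import numpy as np

# Made-up transition matrix at lag tau: states 0 and 1 are long-lived,
# state 2 decays quickly into either of them.
T_tau = np.array([
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.30, 0.30, 0.40],
])

# Chapman-Kolmogorov: the propagator at lag 10 * tau is T_tau to the 10th power.
T_10tau = np.linalg.matrix_power(T_tau, 10)

# Self-transition probabilities at the short and the long lag time.
stay_short = np.diag(T_tau)
stay_long = np.diag(T_10tau)
print("P(stay) at lag tau:   ", stay_short)
print("P(stay) at lag 10 tau:", stay_long)
```

At the longer lag the probability of remaining in state 2 collapses toward its small equilibrium population, while states 0 and 1 still retain most of their probability. A state like state 2 is no longer metastable at that resolution, which is the intuition behind letting the lag time determine how many states survive.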

Methodological Insights

SPIB uses a continuous embedding: a deep neural network learns the intrinsic slow modes of the molecular trajectories without requiring the configuration space to be partitioned a priori. The framework combines variational inference with a conceptually simple heuristic for quantifying metastability. This lets SPIB learn dynamical propagators directly from data, without intermediate transformations such as time-lagged independent component analysis (tICA) or principal component analysis (PCA).
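For orientation, the information bottleneck trade-off behind SPIB can be written compactly. The form below follows the original SPIB formulation as commonly stated; the symbols are assumptions of this note rather than notation taken from the text above: $X_t$ is the input configuration at time $t$, $z$ the learned low-dimensional representation, $y_{t+\Delta t}$ the state label a lag time $\Delta t$ later, and $\beta$ a trade-off weight.

```latex
\mathcal{L}_{\mathrm{SPIB}}
  \;=\; I\!\left(z;\, y_{t+\Delta t}\right)
  \;-\; \beta\, I\!\left(X_t;\, z\right)
```

Maximizing the first term makes $z$ predictive of the future state, and the second term penalizes information about the input that is not needed for that prediction. In practice the mutual information terms are not computed exactly but replaced by a variational bound, which is where the variational inference mentioned above enters.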

Evaluated with cross-validation and quantitative metrics such as the generalized matrix Rayleigh quotient (GMRQ) score, metastability, and Shannon entropy, SPIB shows state-of-the-art performance in modeling processes with pronounced slow dynamics. Notably, its models capture a diverse set of well-populated states while remaining kinetically coherent, balancing structural resolution against kinetic accuracy.
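Two of these metrics are easy to compute once a transition matrix and state populations are available. The sketch below uses common textbook definitions, which are assumptions here: metastability as the trace of the transition matrix, and Shannon entropy of the normalized state populations in nats. The paper's exact definitions (for example, any normalization by the number of states) may differ, and the function names are hypothetical.

```python
import numpy as np

def metastability(T):
    """Sum of self-transition probabilities, i.e. the trace of T."""
    return float(np.trace(np.asarray(T, dtype=float)))

def population_entropy(populations):
    """Shannon entropy (nats) of state populations after normalization."""
    p = np.asarray(populations, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # the 0 * log(0) terms are taken as 0
    return float(-np.sum(p * np.log(p)))

# Made-up two-state example.
T = [[0.9, 0.1],
     [0.2, 0.8]]
print(metastability(T))
print(population_entropy([1, 1]))
print(population_entropy([99, 1]))
```

A high trace means trajectories tend to stay in their current state over one lag time, and a high entropy means probability is spread over several states instead of concentrated in one. Read together, the two numbers reflect the balance described above between well-populated states and kinetic coherence.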

Applications and Implications

The paper illustrates SPIB's capabilities on MD trajectories of three mini-proteins: Trp-cage, HP35, and the WW domain. These systems serve as benchmarks because of their distinct folding pathways and well-characterized energy landscapes. The results support SPIB's potential for multi-resolution MSM construction, with models that compare favorably against those built by competing methods.

In practical terms, SPIB narrows the gap between high-resolution simulations and usable kinetic models by making MSM construction more data-driven. In particular, its ability to adjust the number of metastable states automatically reduces manual intervention and streamlines the MSM construction pipeline.

Future Directions

SPIB's adaptability suggests applications beyond protein folding, for example in drug discovery and materials science, where understanding molecular interactions within complex systems is critical. Refining the neural network architecture and exploring different regularization schemes could further improve SPIB's modeling capabilities and efficiency.

In sum, the paper offers a more integrated and efficient alternative to the conventional MSM construction process for studying complex molecular kinetics. With SPIB, the authors contribute both to the theory of modeling dynamical systems and to the practical toolkit of computational researchers working on biomolecular dynamics.