An Information Bottleneck Approach for Markov Model Construction (2404.02856v2)
Abstract: Markov state models (MSMs) are valuable for studying the dynamics of protein conformational changes via statistical analysis of molecular dynamics (MD) simulations. In MSMs, the complex configuration space is coarse-grained into conformational states, with the dynamics modeled by a series of Markovian transitions among these states at discrete lag times. Constructing the Markovian model at a specific lag time requires states defined without significant internal energy barriers, so that the dynamics within each state can relax within the lag time. This process coarse-grains both time and space, integrating out rapid motions within metastable states. This work introduces a continuous embedding approach for molecular conformations using the state predictive information bottleneck (SPIB), which unifies dimensionality reduction and state space partitioning via a continuous, machine-learned basis set. Without explicit optimization of VAMP-based scores, SPIB demonstrates state-of-the-art performance in identifying slow dynamical processes and constructing predictive multi-resolution Markovian models. When applied to mini-protein trajectories, SPIB shows unique advantages over competing methods. It automatically adjusts the number of metastable states based on a specified minimal time resolution, eliminating the need for manual tuning. While maintaining efficacy in dynamical properties, SPIB excels in accurately distinguishing metastable states and capturing numerous well-populated macrostates. Furthermore, SPIB's ability to learn a low-dimensional continuous embedding of the underlying MSMs enhances the interpretation of dynamic pathways. Accordingly, we propose SPIB as an easy-to-implement methodology for end-to-end MSM construction.
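The Markovian modeling step described in the abstract can be illustrated with a minimal sketch. It is not the SPIB method itself: it assumes the trajectory has already been assigned to discrete states (by whatever partitioning scheme), and shows only how transition counts at a chosen lag time yield a row-stochastic transition matrix and implied timescales. The function names (`count_matrix`, `transition_matrix`, `implied_timescales`) and the toy trajectory are hypothetical, chosen for illustration.

```python
import numpy as np


def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j observed between frames t and t + lag."""
    dtraj = np.asarray(dtraj)
    counts = np.zeros((n_states, n_states))
    # np.add.at accumulates correctly when the same (i, j) pair repeats.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts


def transition_matrix(counts):
    """Row-normalize counts into transition probabilities.

    This is the simple, non-reversible estimator; production MSM
    packages typically also offer reversible estimators.
    """
    row_sums = counts.sum(axis=1, keepdims=True)
    T = np.divide(counts, row_sums,
                  out=np.zeros_like(counts), where=row_sums > 0)
    # Assumption: states with no outgoing counts are treated as absorbing.
    empty = row_sums.ravel() == 0
    T[empty, empty] = 1.0
    return T


def implied_timescales(T, lag):
    """Implied timescales t_i = -lag / ln|lambda_i| for the non-stationary modes."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    lam = eigvals[1:]                    # skip the stationary eigenvalue (1)
    lam = lam[(lam > 0) & (lam < 1)]     # keep only decaying modes
    return -lag / np.log(lam)


# Toy discrete trajectory over two states (illustrative data only).
dtraj = [0, 0, 1, 1, 0, 0, 1, 1]
C = count_matrix(dtraj, n_states=2, lag=1)
T = transition_matrix(C)
its = implied_timescales(T, lag=1)
print(C)
print(T)
print(its)
```

Repeating the estimate over several lag times and checking that the implied timescales level off is a common way to judge whether the chosen states are Markovian at a given time resolution.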