Internal-Coordinate Density Modelling of Protein Structure: Covariance Matters (2302.13711v3)
Abstract: After the recent ground-breaking advances in protein structure prediction, one of the remaining challenges in protein machine learning is to reliably predict distributions of structural states. Parametric models of fluctuations are difficult to fit due to complex covariance structures between degrees of freedom in the protein chain, often causing models to violate either local or global structural constraints. In this paper, we present a new strategy for modelling protein densities in internal coordinates, which uses constraints in 3D space to induce covariance structure between the internal degrees of freedom. We illustrate the potential of the procedure by constructing a variational autoencoder with full covariance output induced by the constraints implied by the conditional mean in 3D, and demonstrate that our approach makes it possible to scale density models of internal coordinates to full protein backbones in two settings: 1) a unimodal setting for proteins exhibiting small fluctuations and limited amounts of available data, and 2) a multimodal setting for larger conformational changes in a high-data regime.