Fusing Neural and Physical: Augment Protein Conformation Sampling with Tractable Simulations (2402.10433v2)
Abstract: Protein dynamics are common and important for proteins' biological functions and properties, and studying them usually involves time-consuming molecular dynamics (MD) simulations in silico. Recently, generative models have been leveraged as surrogate samplers to obtain conformation ensembles orders of magnitude faster and without requiring any simulation data (a "zero-shot" inference). However, because such generative models are agnostic of the underlying energy landscape, their accuracy may still be limited. In this work, we explore the few-shot setting of such a pre-trained generative sampler, which incorporates MD simulations in a tractable manner. Specifically, given a target protein of interest, we first acquire seeding conformations from the pre-trained sampler and then run a number of physical simulations in parallel, starting from these seeds. We then fine-tune the generative model on the resulting simulation trajectories so that it becomes a target-specific sampler. Experimental results demonstrate the superior performance of this few-shot conformation sampler at a tractable computational cost.
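The three-stage pipeline described in the abstract (seed, simulate, fine-tune) can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration and is not the paper's implementation: the "protein" is a single coordinate in a 1-D double-well energy, the "pre-trained sampler" is a broad Gaussian, the "MD simulation" is overdamped Langevin dynamics, and "fine-tuning" is replaced by a kernel density sampler over the simulated frames. All function names (`pretrained_sample`, `short_simulation`, `fine_tune`) are hypothetical.

```python
import math
import random


def energy(x):
    """Toy double-well energy U(x) = (x^2 - 1)^2 (assumption, not a force field)."""
    return (x * x - 1.0) ** 2


def energy_grad(x):
    """Analytic gradient dU/dx = 4x(x^2 - 1)."""
    return 4.0 * x * (x * x - 1.0)


def pretrained_sample(rng, n):
    """Stand-in for the zero-shot sampler: a Gaussian that ignores the energy."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def short_simulation(rng, x0, n_steps=200, dt=0.01, kT=0.2):
    """Overdamped Langevin trajectory (Euler-Maruyama) started from one seed."""
    x, traj = x0, []
    sigma = math.sqrt(2.0 * kT * dt)
    for _ in range(n_steps):
        x = x - dt * energy_grad(x) + sigma * rng.gauss(0.0, 1.0)
        traj.append(x)
    return traj


def fine_tune(frames, bandwidth=0.05):
    """Stand-in for fine-tuning: a kernel density sampler over simulation frames."""
    def sampler(rng, n):
        return [rng.choice(frames) + rng.gauss(0.0, bandwidth) for _ in range(n)]
    return sampler


rng = random.Random(0)

# Stage 1: seeding conformations from the (toy) pre-trained sampler.
seeds = pretrained_sample(rng, 64)

# Stage 2: one short simulation per seed; the runs are independent,
# so in practice they could be dispatched in parallel.
trajs = [short_simulation(rng, s) for s in seeds]
frames = [x for traj in trajs for x in traj[100:]]  # drop the first half as relaxation

# Stage 3: build the target-specific sampler from the trajectories.
target_sampler = fine_tune(frames)
tuned = target_sampler(rng, 1000)

mean_seed_energy = sum(map(energy, seeds)) / len(seeds)
mean_tuned_energy = sum(map(energy, tuned)) / len(tuned)
print(f"mean energy: seeds={mean_seed_energy:.3f}, fine-tuned={mean_tuned_energy:.3f}")
```

In this toy setting, the samples drawn after the simulation step should concentrate near the two energy minima at x = ±1, whereas the energy-agnostic seeds do not. That mirrors the motivation given in the abstract, though it says nothing about the quantitative behavior of the actual method.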