Protein Conformation Generation via Force-Guided SE(3) Diffusion Models (2403.14088v2)
Abstract: The conformational landscape of proteins is crucial to understanding their function in complex biological processes. Traditional physics-based computational methods, such as molecular dynamics (MD) simulations, struggle to sample rare events and require long equilibration times, which hinders their application to general protein systems. Recently, deep generative modeling techniques, especially diffusion models, have been employed to generate novel protein conformations. However, existing score-based diffusion methods cannot properly incorporate important physical prior knowledge to guide the generation process, causing the sampled protein conformations to deviate substantially from the equilibrium distribution. To overcome these limitations, we propose ConfDiff, a force-guided SE(3) diffusion model for protein conformation generation. By combining a force-guided network with a mixture of data-based score models, ConfDiff can generate protein conformations with rich diversity while preserving high fidelity. Experiments on a variety of protein conformation prediction tasks, including 12 fast-folding proteins and the Bovine Pancreatic Trypsin Inhibitor (BPTI), demonstrate that our method surpasses the state-of-the-art method.
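The general idea of steering a data-based score with a physical force can be illustrated with a deliberately small toy sketch. It does not reproduce ConfDiff's SE(3) formulation or its learned force-guided network. It assumes a one-dimensional coordinate, an analytic Gaussian-mixture "data" score, a hand-written double-well energy, and plain overdamped Langevin sampling. Every name and parameter here (`data_score`, `force`, `eta`, `tilt`, `kT`) is hypothetical and chosen only for illustration. Under these assumptions, adding `eta * force / kT` to the data score makes the sampler target a density proportional to the data density times `exp(-eta * E / kT)`, so samples shift toward the lower-energy well.

```python
import numpy as np


def data_score(x, centers=(-1.0, 1.0), sigma=0.5):
    """Score (d/dx of log-density) of an equal-weight 1D Gaussian mixture.

    Stands in for a learned, data-based score model (toy assumption).
    """
    x = np.asarray(x, dtype=float)
    c = np.asarray(centers, dtype=float)
    # Posterior weight of each mixture component at every x (numerically stable).
    logw = -0.5 * ((x[..., None] - c) / sigma) ** 2
    w = np.exp(logw - logw.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return (w * (c - x[..., None])).sum(axis=-1) / sigma**2


def force(x, tilt=1.0):
    """Force -dE/dx for the toy energy E(x) = (x^2 - 1)^2 - tilt * x.

    A positive tilt lowers the energy of the right-hand well.
    """
    x = np.asarray(x, dtype=float)
    return -4.0 * x * (x**2 - 1.0) + tilt


def guided_score(x, eta, kT=1.0):
    """Data score plus a force term scaled by the guidance strength eta."""
    return data_score(x) + eta * force(x) / kT


def langevin_sample(eta, n=2000, steps=500, step=1e-2, seed=0):
    """Run n independent overdamped Langevin chains with the guided score."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    for _ in range(steps):
        noise = rng.normal(size=n)
        x = x + step * guided_score(x, eta) + np.sqrt(2.0 * step) * noise
    return x


if __name__ == "__main__":
    for eta in (0.0, 1.0):
        samples = langevin_sample(eta)
        print(f"eta={eta}: fraction in right well = {(samples > 0).mean():.2f}")
```

With `eta=0.0` the chains follow the data score alone and split roughly evenly between the two modes. With `eta=1.0` the force term favors the lower-energy right-hand well while the data score still keeps samples near the modes seen in the "data".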