
Protein Conformation Generation via Force-Guided SE(3) Diffusion Models (2403.14088v2)

Published 21 Mar 2024 in q-bio.BM and cs.LG

Abstract: The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes. Traditional physics-based computational methods, such as molecular dynamics (MD) simulations, suffer from rare event sampling and long equilibration time problems, hindering their applications in general protein systems. Recently, deep generative modeling techniques, especially diffusion models, have been employed to generate novel protein conformations. However, existing score-based diffusion methods cannot properly incorporate important physical prior knowledge to guide the generation process, causing large deviations in the sampled protein conformations from the equilibrium distribution. In this paper, to overcome these limitations, we propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation. By incorporating a force-guided network with a mixture of data-based score models, ConfDiff can generate protein conformations with rich diversity while preserving high fidelity. Experiments on a variety of protein conformation prediction tasks, including 12 fast-folding proteins and the Bovine Pancreatic Trypsin Inhibitor (BPTI), demonstrate that our method surpasses the state-of-the-art method.


Summary

  • The paper introduces ConfDiff, a force-guided SE(3) diffusion model that generates realistic protein conformations without relying on MD training data.
  • It employs an intermediate force-guidance strategy, integrating MD force fields to favor low-energy, physically plausible structures.
  • Experimental results show ConfDiff achieves higher conformation diversity and quality compared to state-of-the-art methods.

Enhanced Protein Conformation Generation with Force-Guided SE(3) Diffusion Models

Introduction

Protein dynamics play a crucial role in most biological processes, and conformational changes are a pivotal aspect of that dynamics. Traditional methods for protein conformation sampling, such as molecular dynamics (MD) simulations, provide atomistic detail but suffer from limited sampling efficiency and difficulty capturing rare events. Emerging deep generative models, particularly diffusion models, offer a promising alternative for generating novel protein conformations. These models, however, often fail to incorporate crucial physical priors, which leads to deviations from realistic protein dynamics. To address this, we propose a force-guided SE(3) diffusion model, termed ConfDiff, which aims to generate protein conformations with high fidelity and diversity, consistent with the equilibrium Boltzmann distribution.

Methodology

Baseline Model Construction

We establish a baseline diffusion model that combines a sequence-conditional model with an unconditional model using classifier-free guidance on SE(3). This strategy balances conformation quality against diversity. Unlike existing models that rely heavily on MD data for training, ConfDiff does not require such data, broadening its applicability.
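To make the classifier-free guidance concrete, here is a minimal sketch (not the authors' code): the guided score extrapolates from an unconditional score toward a sequence-conditional one. The function names, the guidance weight `w`, and the call signatures are assumptions for illustration.

```python
import torch

def cfg_score(score_cond, score_uncond, x_t, t, seq_emb, w=0.5):
    """Classifier-free-guided score on a noised structure x_t at time t.

    Blends a sequence-conditional score model with an unconditional one;
    larger w pushes samples toward sequence-consistent (higher-quality)
    conformations, smaller w preserves diversity. The exact weighting used
    by ConfDiff may differ from this standard CFG form.
    """
    s_cond = score_cond(x_t, t, seq_emb)      # score conditioned on the protein sequence
    s_uncond = score_uncond(x_t, t)           # sequence-agnostic prior score
    return (1.0 + w) * s_cond - w * s_uncond  # standard CFG extrapolation
```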

Incorporation of Force-Guided Sampling

A novel addition to our method is a force-guided approach applied during the diffusion sampling phase, realized by a force-guidance network working alongside a mixture of score models. By using MD force fields as a physics-based preference function, we bias generation toward conformations with lower potential energy, substantially increasing the likelihood of sampling physically plausible protein conformations. Notably, ConfDiff introduces an intermediate force-guidance strategy into the reverse-time diffusion process, making it the first force-guided network for protein conformation generation.
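As a rough illustration of how an MD force field can bias a reverse-diffusion step toward low-energy conformations, consider the simplified update below. The names (`energy_fn`, `guidance_scale`, `kT`) and the use of a raw force-field gradient are assumptions; ConfDiff's intermediate force guidance relies on a learned guidance network, so this is a sketch of the idea, not the paper's method.

```python
import torch

def force_guided_step(x_t, t, dt, score_fn, energy_fn, kT=2.494, guidance_scale=1.0):
    """One simplified reverse-diffusion update with an energy-based bias.

    The learned score is augmented with forces / kT (forces = -dE/dx) so that
    low-energy, high-Boltzmann-weight conformations become more likely.
    Drift terms and the exact SDE discretization are omitted for brevity.
    """
    x_t = x_t.detach().requires_grad_(True)
    energy = energy_fn(x_t)                               # potential energy from an MD force field (assumed differentiable)
    forces = -torch.autograd.grad(energy.sum(), x_t)[0]   # forces are the negative energy gradient
    guided_score = score_fn(x_t, t) + guidance_scale * forces / kT
    g2 = 1.0                                              # placeholder for the diffusion coefficient g(t)^2
    noise = torch.randn_like(x_t)
    return (x_t + g2 * guided_score * dt + (g2 * dt) ** 0.5 * noise).detach()
```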

SE(3) Diffusion Process

The SE(3) diffusion process, designed for protein backbone generation, treats translations and rotations separately, enabling a more nuanced sampling process. It adopts distinct noise schedules for the translational and rotational components, reflecting their different geometries.
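A minimal forward-noising sketch of this separation is shown below, assuming backbone frames represented as per-residue translations plus rotations. The schedules and the axis-angle rotation noise are placeholders (a crude stand-in for the IGSO(3) diffusion typically used on SO(3)), not the paper's exact parameterization.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def noise_frames(trans, rots, t, trans_beta=10.0, rot_sigma_max=1.5):
    """Forward-noise backbone frames with separate translation/rotation schedules.

    trans: (N, 3) array of residue translations; rots: list of N scipy Rotations.
    """
    # Translations: VP-style Gaussian noising, x_t = sqrt(a) x_0 + sqrt(1 - a) eps
    alpha_bar = np.exp(-trans_beta * t)
    trans_t = np.sqrt(alpha_bar) * trans + np.sqrt(1.0 - alpha_bar) * np.random.randn(*trans.shape)

    # Rotations: compose each frame with a small random rotation whose angle
    # scale grows with t (assumed schedule, approximating IGSO(3) noise).
    sigma = rot_sigma_max * t
    noise_rotvecs = sigma * np.random.randn(len(rots), 3)
    rots_t = [Rotation.from_rotvec(v) * r for v, r in zip(noise_rotvecs, rots)]
    return trans_t, rots_t
```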

Experimental Insights

The efficacy of ConfDiff is evaluated on benchmarks including 12 fast-folding proteins and BPTI, where it consistently outperforms contemporary state-of-the-art models. In particular, the method generates more diverse conformational ensembles without compromising their quality, as reflected in improved scores on standard evaluation metrics. This underscores the benefit of integrating physical priors via force-guided diffusion for generating biologically plausible protein conformations.
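This summary does not list the paper's exact metrics, but a common way to quantify ensemble diversity is the mean pairwise RMSD after optimal superposition; the helper below (plain NumPy with Kabsch alignment) is an illustrative assumption, not the paper's evaluation code.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P, Q = P - P.mean(axis=0), Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    return float(np.sqrt(np.mean(np.sum((P @ R.T - Q) ** 2, axis=1))))

def ensemble_diversity(confs):
    """Mean pairwise RMSD over a list of (N, 3) conformations (higher = more diverse)."""
    pairs = [(i, j) for i in range(len(confs)) for j in range(i + 1, len(confs))]
    return float(np.mean([kabsch_rmsd(confs[i], confs[j]) for i, j in pairs]))
```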

Theoretical Underpinning

Critical to our approach is the theoretical grounding provided by adapting the contrastive energy prediction (CEP) framework, which allows physical priors to be integrated in a principled way. Using the MD energy function to inform the diffusion process puts this theory into practice and gives the model an edge in generating energetically favorable protein conformations.
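For context, the standard identity behind exact energy guidance (as in CEP-style methods) can be written as follows; the notation is ours, and the specific way ConfDiff approximates the intractable expectation with a force-guided network is detailed in the paper.

```latex
\tilde{p}_0(x_0) \;\propto\; p_0(x_0)\, e^{-E(x_0)/k_B T}
\quad\Longrightarrow\quad
\nabla_{x_t} \log \tilde{p}_t(x_t)
  \;=\; \nabla_{x_t} \log p_t(x_t)
  \;+\; \nabla_{x_t} \log \mathbb{E}_{p(x_0 \mid x_t)}\!\left[ e^{-E(x_0)/k_B T} \right]
```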

Future Directions

While ConfDiff lays a promising foundation for protein conformation generation through diffusion models, future research could explore enhancing its efficiency, especially concerning the computational demands of full-atom energy evaluations. Furthermore, refining the force-guided diffusion process to facilitate even more accurate sampling of conformational states remains an enticing prospect.

Conclusion

ConfDiff represents a significant step forward in protein conformation generation with diffusion models. By combining sequence-conditional modeling with force-guided diffusion informed by physical priors, the method opens new avenues for accurately predicting protein dynamics, with potential benefits for a range of biological and pharmaceutical research.