
DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing (2306.01794v2)

Published 1 Jun 2023 in q-bio.QM and cs.LG

Abstract: Proteins play a critical role in carrying out biological functions, and their 3D structures are essential in determining their functions. Accurately predicting the conformation of protein side-chains given their backbones is important for applications in protein structure prediction, design and protein-protein interactions. Traditional methods are computationally intensive and have limited accuracy, while existing machine learning methods treat the problem as a regression task and overlook the restrictions imposed by the constant covalent bond lengths and angles. In this work, we present DiffPack, a torsional diffusion model that learns the joint distribution of side-chain torsional angles, the only degrees of freedom in side-chain packing, by diffusing and denoising on the torsional space. To avoid issues arising from simultaneous perturbation of all four torsional angles, we propose autoregressively generating the four torsional angles from $\chi_1$ to $\chi_4$ and training diffusion models for each torsional angle. We evaluate the method on several benchmarks for protein side-chain packing and show that our method achieves improvements of $11.9\%$ and $13.5\%$ in angle accuracy on CASP13 and CASP14, respectively, with a significantly smaller model size ($60\times$ fewer parameters). Additionally, we show the effectiveness of our method in enhancing side-chain predictions in the AlphaFold2 model. Code is available at https://github.com/DeepGraphLearning/DiffPack.


Summary

  • The paper introduces DiffPack, an autoregressive torsional diffusion model for predicting protein side-chain conformations, which improves angle accuracy by 11.9% on CASP13 and 13.5% on CASP14.
  • It uses a separate SE(3)-invariant score network for each torsion angle and avoids overparameterization by modeling only the torsional degrees of freedom.
  • On the CASP benchmarks DiffPack achieves these gains with 60x fewer parameters than prior models, and it also improves the side-chain predictions of AlphaFold2.

DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing

The paper introduces DiffPack, an autoregressive torsional diffusion model that predicts protein side-chain conformations from a given backbone structure. Protein side-chain packing (PSCP) is a key step in protein structure prediction and matters for applications such as enzyme design and drug discovery. Traditional methods are computationally intensive and limited in accuracy, while earlier machine learning methods treat the problem as regression in Cartesian coordinate space, which overlooks the constraints imposed by fixed covalent bond lengths and angles and makes it hard to capture complex energy landscapes.

Methodological Advances

DiffPack shifts the focus from Cartesian coordinates to torsional space, efficiently modeling the degrees of freedom that are intrinsic to side-chain packing. The model addresses overparameterization issues present in prior methods by using the minimal representation of torsional angles. Specifically, DiffPack generates the four torsion angles ($\chi_1$ to $\chi_4$) autoregressively, using separate diffusion models for each, thereby accommodating rotational dependencies and mitigating the accumulation of coordinate displacement common in joint angle diffusion processes.
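The autoregressive ordering can be illustrated with a minimal sketch. It is not the DiffPack implementation: the `denoisers` callables are hypothetical stand-ins for the four trained diffusion models, and the toy denoisers below exist only to make the example runnable. The sketch shows the control flow only: each angle starts from noise on the circle, is generated conditioned on the angles already fixed, and is wrapped back into $[-\pi, \pi)$.

```python
import math
import random


def wrap_angle(theta):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def sample_side_chain(denoisers, seed=0):
    """Generate chi_1..chi_n one at a time.

    denoisers[i] is a HYPOTHETICAL callable standing in for the i-th
    trained diffusion model. It takes (noisy_angle, previous_angles, rng)
    and returns a denoised angle. It is not part of any real DiffPack API.
    """
    rng = random.Random(seed)
    chis = []
    for denoise in denoisers:
        # Start each angle from uniform noise on the circle.
        noisy = rng.uniform(-math.pi, math.pi)
        # Condition on the angles that are already fixed.
        chi = denoise(noisy, tuple(chis), rng)
        chis.append(wrap_angle(chi))
    return chis


def make_toy_denoiser(offset):
    """Toy stand-in: ignore the noise, return previous angle plus an offset."""
    def denoise(noisy, previous, rng):
        base = previous[-1] if previous else 0.0
        return base + offset
    return denoise


toy_denoisers = [make_toy_denoiser(2.0) for _ in range(4)]
angles = sample_side_chain(toy_denoisers)
```

In a real model each denoiser would run a full reverse diffusion trajectory; here a single call replaces that loop so the dependence of each angle on its predecessors stays visible.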

The model employs SE(3)-invariant networks to learn the gradient fields (scores) in torsional space, capturing the atom-level rotation information that accurate side-chain prediction requires. A separate score network is trained for each torsional angle, and a teacher-forcing strategy is used during training to limit the effect of cumulative perturbations.
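For intuition about what a score on torsional space looks like, the sketch below computes the score of a zero-mean wrapped normal distribution on the circle. The wrapped normal is an assumption taken from the general torsional diffusion literature, not something this summary states about DiffPack, and the function name and truncation parameter are illustrative. The density is $p(\delta) \propto \sum_k \exp\left(-(\delta + 2\pi k)^2 / (2\sigma^2)\right)$, and the score is the derivative of $\log p$ with respect to $\delta$.

```python
import math


def wrapped_normal_score(delta, sigma, num_wraps=10):
    """Return d/d(delta) log p(delta) for a zero-mean wrapped normal.

    The infinite sum over windings k is truncated to
    k in [-num_wraps, num_wraps] (an illustrative choice).
    """
    shifts = [delta + 2 * math.pi * k for k in range(-num_wraps, num_wraps + 1)]
    exponents = [-(s ** 2) / (2 * sigma ** 2) for s in shifts]
    # Subtract the largest exponent so the weights do not all underflow.
    largest = max(exponents)
    weights = [math.exp(e - largest) for e in exponents]
    density = sum(weights)
    derivative = sum(-s / sigma ** 2 * w for s, w in zip(shifts, weights))
    return derivative / density


# For small sigma the wrapped normal is close to an ordinary Gaussian,
# whose score is -delta / sigma**2.
example_score = wrapped_normal_score(0.1, 0.2)
```

Unlike the Gaussian score, this function is periodic in `delta` with period $2\pi$, which is the property that makes it suitable for angles.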

To bolster the model's performance, DiffPack implements several sampling strategies: multi-round sampling, annealed temperature sampling, and confidence models. These techniques notably improve angle prediction accuracy and inference stability.
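One generic way a confidence model can be combined with repeated sampling is best-of-N selection: draw several candidates and keep the one the confidence model scores highest. The sketch below shows only that generic pattern. Whether DiffPack combines its strategies in exactly this way is not stated here, and `sample_fn` and `confidence_fn` are hypothetical placeholders for a trained sampler and confidence model.

```python
import math
import random


def best_of_n(sample_fn, confidence_fn, num_candidates=8, seed=0):
    """Draw several candidates and return the highest-confidence one.

    sample_fn(rng) and confidence_fn(candidate) are HYPOTHETICAL callables,
    not functions from any real library.
    """
    rng = random.Random(seed)
    candidates = [sample_fn(rng) for _ in range(num_candidates)]
    scores = [confidence_fn(c) for c in candidates]
    best_index = max(range(num_candidates), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


def toy_sample(rng):
    """Toy sampler: a uniformly random angle in radians."""
    return rng.uniform(-math.pi, math.pi)


def toy_confidence(angle):
    """Toy confidence: higher when the angle is closer to 1.0 rad."""
    return -abs(angle - 1.0)


best, score = best_of_n(toy_sample, toy_confidence)
```

The cost of this pattern grows linearly with the number of candidates, so the candidate count is a trade-off between accuracy and inference time.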

Empirical Performance

Empirical results show DiffPack's advantages over existing models such as AttnPacker and DLPacker. On the CASP13 and CASP14 benchmarks, DiffPack improves angle accuracy by 11.9% and 13.5%, respectively, while using 60 times fewer parameters. It also improves the side-chain predictions of AlphaFold2, which suggests the two models are complementary. DiffPack continues to outperform other methods when applied to non-native backbones generated by AlphaFold2, indicating that it is robust to imperfect backbone inputs.
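The summary does not define angle accuracy. A common convention in side-chain packing benchmarks counts a predicted torsion angle as correct when it lies within 20° of the native value. Assuming that convention (the threshold and function names below are illustrative, not taken from the paper), the metric can be sketched as follows. The key detail is that the difference must be measured on the circle, so 175° and -175° are 10° apart rather than 350°.

```python
def angular_difference_deg(a, b):
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def angle_accuracy(predicted, native, threshold_deg=20.0):
    """Fraction of predicted angles within threshold_deg of the native angle.

    The 20-degree default is an ASSUMED convention, not a value stated
    in the text above.
    """
    if len(predicted) != len(native):
        raise ValueError("predicted and native must have the same length")
    if not predicted:
        raise ValueError("at least one angle is required")
    correct = sum(
        angular_difference_deg(p, n) <= threshold_deg
        for p, n in zip(predicted, native)
    )
    return correct / len(predicted)


predicted = [175.0, -60.0, 65.0, 10.0]
native = [-175.0, -62.0, 180.0, 45.0]
accuracy = angle_accuracy(predicted, native)  # 2 of 4 within 20 degrees -> 0.5
```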

Implications and Future Directions

The introduction of DiffPack marks a significant step in applying diffusion models to protein conformational prediction tasks. The ability to efficiently predict protein side-chains can facilitate advancements in drug design and protein engineering by providing more accurate structural models, which are crucial for understanding protein interactions at a molecular level.

Future research could extend DiffPack to handle backbone flexibility and apply it to the co-generation of sequences and side-chain conformations. Tighter integration with other structure prediction models could improve their side-chain quality and broaden DiffPack's applicability in computational biology. The gains in both efficiency and accuracy make DiffPack a promising tool for protein science research and applications.
