DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing (2306.01794v2)
Abstract: Proteins play a critical role in carrying out biological functions, and their 3D structures are essential in determining their functions. Accurately predicting the conformation of protein side-chains given their backbones is important for applications in protein structure prediction, design and protein-protein interactions. Traditional methods are computationally intensive and have limited accuracy, while existing machine learning methods treat the problem as a regression task and overlook the restrictions imposed by the constant covalent bond lengths and angles. In this work, we present DiffPack, a torsional diffusion model that learns the joint distribution of side-chain torsional angles, the only degrees of freedom in side-chain packing, by diffusing and denoising on the torsional space. To avoid issues arising from simultaneous perturbation of all four torsional angles, we propose autoregressively generating the four torsional angles from $\chi_1$ to $\chi_4$ and training diffusion models for each torsional angle. We evaluate the method on several benchmarks for protein side-chain packing and show that our method achieves improvements of $11.9\%$ and $13.5\%$ in angle accuracy on CASP13 and CASP14, respectively, with a significantly smaller model size ($60\times$ fewer parameters). Additionally, we show the effectiveness of our method in enhancing side-chain predictions in the AlphaFold2 model. Code is available at https://github.com/DeepGraphLearning/DiffPack.
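The two ideas in the abstract, noising angles on the torsional space and generating $\chi_1$ to $\chi_4$ autoregressively, can be illustrated with a minimal sketch. This is not the DiffPack implementation. The function names, the uniform starting distribution, the noise schedule, and the `denoisers` callables (stand-ins for one trained model per torsional angle) are all assumptions made for illustration.

```python
import numpy as np

def wrap_angle(theta):
    """Map angles in radians onto the interval [-pi, pi)."""
    return (theta + np.pi) % (2 * np.pi) - np.pi

def perturb_torsion(chi, sigma, rng):
    """Noise a torsional angle on the circle: add Gaussian noise, then wrap."""
    return wrap_angle(chi + sigma * rng.standard_normal(np.shape(chi)))

def sample_autoregressive(denoisers, num_residues, sigmas, rng):
    """Generate chi_1..chi_4 one angle at a time.

    denoisers[k] is a hypothetical callable (noisy_chi, previous_chis, sigma)
    returning an angular update; it stands in for a trained model.
    """
    chis = np.zeros((num_residues, 0))  # angles generated so far
    for denoise in denoisers:
        # Assumed starting point: uniform distribution on the circle.
        chi_k = rng.uniform(-np.pi, np.pi, size=num_residues)
        for sigma in sigmas:  # assumed decreasing noise levels
            chi_k = wrap_angle(chi_k + denoise(chi_k, chis, sigma))
        # Later angles are conditioned on the ones already generated.
        chis = np.column_stack([chis, chi_k])
    return chis

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy denoiser that pulls every angle halfway toward 1.0 rad each step.
    toy = [lambda chi, prev, sigma: 0.5 * wrap_angle(1.0 - chi)] * 4
    angles = sample_autoregressive(toy, num_residues=5, sigmas=[1.0] * 30, rng=rng)
    print(angles.shape)
```

Wrapping after every update keeps each angle on the circle, so bond lengths and bond angles are never touched; only the torsional degrees of freedom change. Generating one angle per stage means each stage sees fixed values for the preceding angles rather than four simultaneously perturbed ones.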