Learning to design protein-protein interactions with enhanced generalization (2310.18515v3)
Abstract: Discovering mutations that enhance protein-protein interactions (PPIs) is critical for advancing biomedical research and developing improved therapeutics. While machine learning approaches have substantially advanced the field, they often struggle to generalize beyond their training data in practical scenarios. The contributions of this work are threefold. First, we construct PPIRef, the largest and non-redundant dataset of 3D protein-protein interactions, enabling effective large-scale learning. Second, we leverage the PPIRef dataset to pre-train PPIformer, a new SE(3)-equivariant model that generalizes across diverse protein-binder variants. We fine-tune PPIformer to predict the effects of mutations on protein-protein interactions via a thermodynamically motivated adjustment of the pre-training loss function. Finally, we demonstrate the enhanced generalization of PPIformer by outperforming other state-of-the-art methods on new, non-leaking splits of standard labeled PPI mutational data and on independent case studies that optimize a human antibody against SARS-CoV-2 and increase the thrombolytic activity of staphylokinase.
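The fine-tuning idea can be illustrated with a minimal sketch; it is not taken from the authors' code. It assumes that the pre-trained model outputs, for each masked interface position, logits over the 20 standard amino acids, and that a mutation is scored by the log-odds between its wild-type and mutant residues, in the spirit of masked-marginal scoring for protein language models. The sign convention is also an assumption: ΔΔG = ΔG(mutant) - ΔG(wild type), so negative values indicate stronger binding. The names `predict_ddg` and `mse_loss`, the dictionary-of-logits input, and the independent summation over mutated positions are all hypothetical, and MSE merely stands in for whatever regression loss is used against labeled ΔΔG data.

```python
import math
from typing import Dict, List, Sequence, Tuple

# The 20 standard amino acids; this index order is an assumption of the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's amino-acid logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def predict_ddg(
    masked_logits: Dict[int, Sequence[float]],
    mutations: List[Tuple[int, str, str]],
) -> float:
    """Hypothetical log-odds ddG score for a single- or multi-point mutation.

    masked_logits[pos]: 20 logits predicted at interface position `pos`
    with all mutated positions masked (assumed model output).
    mutations: (pos, wild_type, mutant) triples, e.g. (10, "W", "A").
    Returns sum_i [log p(wt_i) - log p(mut_i)]; a negative value means the
    model prefers the mutant, read here as stronger binding (ddG < 0).
    """
    ddg = 0.0
    for pos, wt, mut in mutations:
        logp = log_softmax(masked_logits[pos])
        ddg += logp[AMINO_ACIDS.index(wt)] - logp[AMINO_ACIDS.index(mut)]
    return ddg


def mse_loss(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Stand-in regression loss between predicted and measured ddG values."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    logits = [0.0] * 20
    logits[AMINO_ACIDS.index("A")] = 2.0  # model favors alanine at this position
    logits[AMINO_ACIDS.index("W")] = 1.0
    pred = predict_ddg({10: logits}, [(10, "W", "A")])
    print(round(pred, 6))  # -1.0: W->A is predicted to strengthen binding
    print(mse_loss([pred], [-0.5]))  # squared error against an illustrative label
```

Because the softmax normalizer cancels in each difference, the score depends only on the gap between the wild-type and mutant logits. In an actual fine-tuning loop the loss would be backpropagated through the model's logits; that step is omitted here because it depends on the deep learning framework.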
Authors:
- Anton Bushuiev
- Roman Bushuiev
- Petr Kouba
- Anatolii Filkin
- Marketa Gabrielova
- Michal Gabriel
- Jiri Sedlar
- Tomas Pluskal
- Jiri Damborsky
- Stanislav Mazurenko
- Josef Sivic