Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning (2405.10348v1)

Published 16 May 2024 in q-bio.QM, cs.AI, and cs.LG

Abstract: Protein-protein binding plays a key role in a variety of fundamental biological processes, so predicting the effects of amino acid mutations on protein-protein binding is crucial. To tackle the scarcity of annotated mutation data, pre-training on massive unlabeled data has emerged as a promising solution. However, this process faces a series of challenges: (1) complex higher-order dependencies among multiple (more than paired) structural scales have not yet been fully captured; (2) it is rarely explored how mutations alter the local conformation of the surrounding microenvironment; (3) pre-training is costly, both in data size and computational burden. In this paper, we first construct a hierarchical prompt codebook to record common microenvironmental patterns at different structural scales independently. Then, we develop a novel codebook pre-training task, namely masked microenvironment modeling, to model the joint distribution of each mutation with its residue type, angular statistics, and local conformational changes in the microenvironment. With the constructed prompt codebook, we encode the microenvironment around each mutation into multiple hierarchical prompts and combine them to flexibly inform wild-type and mutated protein complexes about their microenvironmental differences. This hierarchical prompt learning framework demonstrates superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction and in a case study of optimizing human antibodies against SARS-CoV-2.
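The codebook mechanism the abstract describes can be illustrated with a toy sketch. Everything below is an assumption for illustration only: the scale names, codebook sizes, and embedding dimension are invented, and the real model learns its codebooks during masked microenvironment modeling rather than sampling them at random. The sketch shows only the retrieval step, where a microenvironment embedding at each structural scale is mapped to its nearest codebook entry (as in vector quantization) and the retrieved entries are concatenated into one hierarchical prompt.

```python
import numpy as np

rng = np.random.default_rng(0)


class HierarchicalPromptCodebook:
    """Toy sketch: one codebook per structural scale (names/sizes are
    illustrative assumptions, not the paper's actual configuration)."""

    def __init__(self, scales=("atom", "residue", "complex"), n_codes=8, dim=4):
        # In the paper the codebooks are learned via masked
        # microenvironment modeling; here they are random placeholders.
        self.books = {s: rng.normal(size=(n_codes, dim)) for s in scales}

    def quantize(self, scale, embedding):
        """Return (index, entry) of the nearest code in Euclidean distance."""
        book = self.books[scale]
        idx = int(np.argmin(((book - embedding) ** 2).sum(axis=1)))
        return idx, book[idx]

    def prompt(self, embeddings):
        """Concatenate the quantized entries across all scales into one prompt."""
        return np.concatenate(
            [self.quantize(s, e)[1] for s, e in embeddings.items()]
        )


cb = HierarchicalPromptCodebook()
# A fake microenvironment embedding per scale.
env = {scale: rng.normal(size=4) for scale in cb.books}
prompt = cb.prompt(env)
print(prompt.shape)  # 3 scales x 4-dim codes -> (12,)
```

In the paper, prompts built this way for the wild-type and mutated complexes are compared to expose their microenvironmental differences; this sketch stops at prompt construction.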
