
Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models (2312.04019v1)

Published 7 Dec 2023 in q-bio.BM and cs.AI

Abstract: Predicting protein stability changes induced by single-point mutations has been a persistent challenge over the years, attracting immense interest from numerous researchers. The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry, including drug development, protein evolution analysis, and enzyme synthesis. Although multiple methods have been proposed to address this problem, few achieve strong predictive performance together with high computational efficiency. Two principal hurdles contribute to the existing challenges in this domain. The first is the complexity of extracting and aggregating sufficiently representative features from proteins. The second is the limited availability of experimental data for protein mutation analysis, which further complicates the comprehensive evaluation of model performance on unseen data samples. With the advent of large language models (LLMs), such as the ESM models in protein research, rich interpretations of protein features learned from enormous training data are now readily accessible. LLMs are therefore well placed to facilitate a wide range of protein research. In our study, we introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in proteins upon single-point mutations. Furthermore, we have curated a dataset meticulously designed to preclude data leakage, corresponding to two extensively employed test datasets, to facilitate a more equitable model comparison.
