PTransIPs: Identification of phosphorylation sites enhanced by protein PLM embeddings (2308.05115v3)

Published 8 Aug 2023 in q-bio.QM and cs.LG

Abstract: Phosphorylation is pivotal in numerous fundamental cellular processes and plays a significant role in the onset and progression of various diseases. Accurate identification of phosphorylation sites is crucial for unraveling the molecular mechanisms within cells and during viral infections, and may lead to the discovery of novel therapeutic targets. In this study, we develop PTransIPs, a new deep learning framework for the identification of phosphorylation sites. Independent testing results demonstrate that PTransIPs outperforms existing state-of-the-art (SOTA) methods, achieving AUCs of 0.9232 and 0.9660 for the identification of phosphorylated S/T and Y sites, respectively. PTransIPs makes three contributions. 1) PTransIPs is the first to apply protein pre-trained language model (PLM) embeddings to this task. It uses ProtTrans and EMBER2 to extract sequence and structure embeddings, respectively, as additional inputs to the model, which addresses issues of dataset size and overfitting and thereby improves model performance; 2) PTransIPs is built on a Transformer architecture, optimized through the integration of convolutional neural networks and the TIM loss function, providing practical insights for model design and training; 3) The encoding of amino acids in PTransIPs enables it to serve as a universal framework for other peptide bioactivity tasks, and its strong performance on such tasks is shown in the extended experiments of this paper. Our code, data and models are publicly available at https://github.com/StatXzy7/PTransIPs.
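
The abstract names the TIM loss but does not spell it out. As a rough illustration only, the sketch below computes an information-maximization style objective in the spirit of TIM: a cross-entropy term, minus the entropy of the averaged predictions, plus the average per-sample prediction entropy. The function names, the weights `lam` and `alpha`, and the choice to apply every term to the same batch are assumptions made for this example; they are not taken from the PTransIPs code, which may combine the terms differently.

```python
import math

EPS = 1e-12  # guards log(0); an illustrative choice, not from the paper


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(max(p[y], EPS)) for p, y in zip(probs, labels)) / len(labels)


def conditional_entropy(probs):
    """Mean entropy of each per-sample prediction (low = confident)."""
    return -sum(q * math.log(max(q, EPS)) for p in probs for q in p) / len(probs)


def marginal_entropy(probs):
    """Entropy of the class distribution averaged over the batch (high = balanced)."""
    n, k = len(probs), len(probs[0])
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    return -sum(m * math.log(max(m, EPS)) for m in marginal)


def tim_style_loss(probs, labels, lam=1.0, alpha=1.0):
    """Hypothetical TIM-style objective: lam * CE - H(Y) + alpha * H(Y|X)."""
    return (lam * cross_entropy(probs, labels)
            - marginal_entropy(probs)
            + alpha * conditional_entropy(probs))


if __name__ == "__main__":
    # Two sites, two classes (non-phosphorylated = 0, phosphorylated = 1).
    confident = [[1.0, 0.0], [0.0, 1.0]]
    uncertain = [[0.5, 0.5], [0.5, 0.5]]
    print(tim_style_loss(confident, [0, 1]))
    print(tim_style_loss(uncertain, [0, 1]))
```

Under these assumptions, confident and class-balanced predictions give a lower loss than uniform ones, which is the behaviour the entropy terms are meant to encourage.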

