
Improving PTM Site Prediction by Coupling of Multi-Granularity Structure and Multi-Scale Sequence Representation (2401.10211v1)

Published 4 Jan 2024 in q-bio.QM, cs.AI, and cs.LG

Abstract: Protein post-translational modification (PTM) site prediction is a fundamental task in bioinformatics. Several computational methods have been developed to predict PTM sites. However, existing methods ignore structure information and rely only on protein sequences. Furthermore, a more fine-grained structure representation learning method is urgently needed, as PTM is a biological event that occurs at the atom granularity. In this paper, we propose a PTM site prediction method by Coupling of Multi-Granularity structure and Multi-Scale sequence representation, PTM-CMGMS for brevity. Specifically, multi-granularity structure-aware representation learning is designed to learn neighborhood structure representations at the amino acid, atom, and whole-protein granularity from AlphaFold-predicted structures, followed by contrastive learning to optimize the structure representations. Additionally, multi-scale sequence representation learning is used to extract context sequence information, and a motif generated by aligning all context sequences of PTM sites assists the prediction. Extensive experiments on three datasets show that PTM-CMGMS outperforms the state-of-the-art methods.
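The abstract does not spell out how context sequences are extracted or how the motif is built, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It assumes fixed-width windows centered on a candidate site, padding with a placeholder residue `X` at the termini, and a motif represented as per-position residue frequencies over the already-centered windows. The function names, the padding symbol, and the window widths are all hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed placeholder for positions beyond the sequence termini


def context_window(sequence, site_index, half_width):
    """Return the 2*half_width+1 residues centered on site_index, padded with PAD."""
    left = max(0, site_index - half_width)
    right = min(len(sequence), site_index + half_width + 1)
    left_pad = PAD * (half_width - (site_index - left))
    right_pad = PAD * (half_width - (right - 1 - site_index))
    return left_pad + sequence[left:right] + right_pad


def multi_scale_windows(sequence, site_index, half_widths):
    """Extract one context window per scale (illustrative notion of 'multi-scale')."""
    return {hw: context_window(sequence, site_index, hw) for hw in half_widths}


def position_frequency_motif(windows):
    """Column-wise residue frequencies over equal-length, site-centered windows."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    total = len(windows)
    motif = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows)
        motif.append({aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS + PAD})
    return motif


# Toy usage: two lysine sites in a made-up sequence.
seq = "MKTAYIAKQR"
site_windows = [context_window(seq, i, 3) for i in (1, 7)]
motif = position_frequency_motif(site_windows)
scales = multi_scale_windows(seq, 7, (1, 2, 3))
```

Because every window is centered on the modified residue, the windows are already aligned position by position, and the central column of the toy motif is entirely lysine. A real pipeline would likely weight or normalize these frequencies against background residue composition.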
