Predicting ATP binding sites in protein sequences using Deep Learning and Natural Language Processing (2402.01829v1)

Published 2 Feb 2024 in q-bio.BM, cs.CL, and cs.LG

Abstract: Predicting ATP-protein binding sites in genes is of great significance in biology and medicine. Most research in this field has relied on time- and resource-intensive "wet experiments" in laboratories. Over the years, researchers have investigated computational methods to accomplish the same goals, drawing on advanced deep learning and NLP algorithms. In this paper, we propose methods to classify ATP-protein binding sites. We conducted various experiments, mainly using PSSMs and several word embeddings as features. We used 2D CNNs and LightGBM classifiers as our main learning algorithms. We also tested the MP3Vec and BERT models. The outcomes of our experiments demonstrated improvement over the state-of-the-art benchmarks.

Authors (2)