When SMILES have Language: Drug Classification using Text Classification Methods on Drug SMILES Strings

Published 3 Mar 2024 in q-bio.BM, cs.CL, cs.IR, cs.LG, and stat.ML | (2403.12984v2)

Abstract: Complex chemical structures, such as drugs, are usually represented by SMILES strings, which encode a molecule as a sequence of atoms and bonds. These strings are widely used in machine learning-based drug research and molecular representation learning. Setting aside complex representations, in this work we pose a single question: what if we treat drug SMILES as conventional sentences and apply text classification to classify drugs? Our experiments affirm that this is possible, with very competitive scores. The study treats each atom and bond as a sentence component and employs basic NLP methods to categorize drug types, showing that complex problems can also be approached from simpler perspectives. The data and code are available here: https://github.com/azminewasi/Drug-Classification-NLP.
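
The idea of reading a SMILES string as a sentence can be illustrated with a small, standard-library-only sketch. This is not the authors' pipeline (their code is in the linked repository); the tokenizer regex, the `NaiveBayes` class, the bigram features, and the toy "aromatic" versus "aliphatic" labels are all assumptions chosen for illustration, and the labels are not drug categories from the paper.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: bracket atoms, two-letter halogens, else one character.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into atom, bond, and ring-closure tokens."""
    return TOKEN_RE.findall(smiles)


def ngrams(tokens, n_max=2):
    """Return all token n-grams of length 1..n_max, joined into strings."""
    return [
        "".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


class NaiveBayes:
    """Multinomial Naive Bayes over SMILES n-grams with add-one smoothing."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.priors = Counter(labels)       # label -> number of examples
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(tokenize(text)))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        feats = ngrams(tokenize(text))
        total = sum(self.priors.values())

        def score(label):
            denom = sum(self.counts[label].values()) + len(self.vocab)
            s = math.log(self.priors[label] / total)
            for g in feats:
                s += math.log((self.counts[label][g] + 1) / denom)
            return s

        return max(self.priors, key=score)


# Toy training set: well-known small molecules with illustrative labels.
train = [
    ("c1ccccc1", "aromatic"),    # benzene
    ("Cc1ccccc1", "aromatic"),   # toluene
    ("Oc1ccccc1", "aromatic"),   # phenol
    ("CCO", "aliphatic"),        # ethanol
    ("CCCC", "aliphatic"),       # butane
    ("CC(C)O", "aliphatic"),     # isopropanol
]
model = NaiveBayes().fit([s for s, _ in train], [y for _, y in train])

print(tokenize("CC(=O)Cl"))        # acetyl chloride, 'Cl' kept as one token
print(model.predict("Nc1ccccc1"))  # aniline
print(model.predict("CCCO"))       # 1-propanol
```

The design mirrors ordinary text classification: tokens play the role of words, n-grams capture short local context such as ring fragments, and any standard classifier can sit on top of the resulting counts.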
