
Machine Learning Approach for Cancer Entities Association and Classification (2306.00013v2)

Published 30 May 2023 in cs.CL and cs.LG

Abstract: According to the World Health Organization (WHO), cancer is the second leading cause of death globally. Research on different types of cancer is growing at an ever-increasing rate, producing large volumes of articles every year. Knowledge of the drugs, diagnostics, risks, symptoms, treatments, and other factors related to genes helps advance cancer research. Manually screening such a large volume of articles to formulate a hypothesis is laborious and time-consuming. This study uses two non-trivial Natural Language Processing (NLP) functions, Named Entity Recognition (NER) and text classification, to discover knowledge from biomedical literature. NER recognizes and extracts predefined cancer-related entities from unstructured text with the support of a user-friendly interface and built-in dictionaries. Text classification helps explore insights in the text and simplifies data categorization, querying, and article screening. Machine learning classifiers are used to build the classification model, and Structured Query Language (SQL) is used to identify hidden relations that may lead to significant predictions.
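
The dictionary-based entity extraction and SQL relation querying described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary contents, the sample sentences, the `mention` table, and the function names are all hypothetical, and simple whole-word matching stands in for whatever NER method the paper actually uses.

```python
import re
import sqlite3

# Hypothetical toy dictionary (entity type -> known terms); not from the paper.
DICTIONARY = {
    "gene": ["BRCA1", "TP53"],
    "drug": ["tamoxifen", "cisplatin"],
    "cancer": ["breast cancer", "lung cancer"],
}

# Hypothetical sample sentences standing in for article abstracts.
ABSTRACTS = {
    1: "BRCA1 mutations influence tamoxifen response in breast cancer.",
    2: "TP53 status predicts cisplatin sensitivity in lung cancer.",
    3: "Tamoxifen remains standard in BRCA1-associated breast cancer.",
}


def extract_entities(text):
    """Return a sorted list of (entity_type, term) pairs found in text.

    Matching is case-insensitive and restricted to whole words.
    """
    found = set()
    for etype, terms in DICTIONARY.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add((etype, term))
    return sorted(found)


def build_db(abstracts):
    """Store one row per (document, entity) in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mention (doc_id INTEGER, etype TEXT, term TEXT)")
    for doc_id, text in abstracts.items():
        rows = [(doc_id, etype, term) for etype, term in extract_entities(text)]
        conn.executemany("INSERT INTO mention VALUES (?, ?, ?)", rows)
    return conn


def gene_drug_pairs(conn):
    """Count documents in which a gene and a drug are mentioned together."""
    query = """
        SELECT g.term, d.term, COUNT(*) AS n_docs
        FROM mention AS g
        JOIN mention AS d ON g.doc_id = d.doc_id
        WHERE g.etype = 'gene' AND d.etype = 'drug'
        GROUP BY g.term, d.term
        ORDER BY n_docs DESC, g.term, d.term
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for gene, drug, n_docs in gene_drug_pairs(build_db(ABSTRACTS)):
        print(gene, drug, n_docs)
```

Co-occurrence within a document is only a crude proxy for a relation; the self-join simply surfaces gene and drug pairs that appear together often enough to merit a closer look.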

