Machine Learning Approach for Cancer Entities Association and Classification (2306.00013v2)
Abstract: According to the World Health Organization (WHO), cancer is the second leading cause of death globally. Research on different types of cancer grows at an ever-increasing rate, producing a large volume of articles every year. Insight into the drugs, diagnostics, risks, symptoms, treatments, etc., related to genes is a significant factor in advancing cancer research. Manually screening such a large volume of articles to formulate a hypothesis is laborious and time-consuming. The study uses two non-trivial Natural Language Processing (NLP) functions, Named Entity Recognition (NER) and text classification, to discover knowledge from biomedical literature. NER recognizes and extracts predefined cancer-related entities from unstructured text with the support of a user-friendly interface and built-in dictionaries. Text classification helps to explore insights in the text and simplifies data categorization, querying, and article screening. Machine learning classifiers are used to build the classification model, and Structured Query Language (SQL) is used to identify hidden relations that may lead to significant predictions.
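As a rough illustration of the pipeline the abstract describes, the sketch below pairs dictionary-based entity matching with an SQL co-occurrence query. It is a minimal sketch under stated assumptions, not the authors' implementation: the dictionaries, the example sentences, the `entity` table layout, and the `find_entities` helper are all hypothetical, and SQLite stands in for whichever SQL engine the study used.

```python
import re
import sqlite3

# Hypothetical mini-dictionaries; the paper's built-in dictionaries are not reproduced here.
DICTIONARIES = {
    "gene": ["BRCA1", "TP53", "EGFR"],
    "drug": ["tamoxifen", "gefitinib"],
    "cancer": ["breast cancer", "lung cancer"],
}

def find_entities(text):
    """Return sorted (start, entity_type, matched_text) tuples from a
    case-insensitive, whole-word dictionary lookup."""
    hits = []
    for entity_type, terms in DICTIONARIES.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((m.start(), entity_type, m.group(0)))
    return sorted(hits)

# Illustrative sentences, not taken from the paper's corpus.
abstracts = {
    1: "Tamoxifen is used in BRCA1-associated breast cancer.",
    2: "Gefitinib targets EGFR in lung cancer.",
    3: "EGFR mutations are frequent in lung cancer.",
}

# Store one row per recognized entity mention (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (doc_id INTEGER, type TEXT, name TEXT)")
for doc_id, text in abstracts.items():
    for _, entity_type, name in find_entities(text):
        conn.execute(
            "INSERT INTO entity VALUES (?, ?, ?)",
            (doc_id, entity_type, name.lower()),
        )

# Count the documents in which a gene and a cancer type co-occur.
rows = conn.execute(
    """
    SELECT g.name, c.name, COUNT(DISTINCT g.doc_id) AS n_docs
    FROM entity g JOIN entity c ON g.doc_id = c.doc_id
    WHERE g.type = 'gene' AND c.type = 'cancer'
    GROUP BY g.name, c.name
    ORDER BY n_docs DESC, g.name
    """
).fetchall()

for gene, cancer, n_docs in rows:
    print(gene, cancer, n_docs)
```

Document-level co-occurrence is only a crude proxy for a relation; a real screening workflow would likely add normalization of synonyms and a threshold on the counts before treating a pair as a candidate association.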