Advancements in eHealth Data Analytics through Natural Language Processing and Deep Learning (2401.10850v1)
Abstract: The healthcare environment is commonly described as "information-rich" yet "knowledge-poor". Healthcare systems collect huge amounts of data from various sources: lab reports, medical letters, logs of medical tools or programs, medical prescriptions, etc. These massive data sets can yield knowledge that improves medical services and the healthcare domain overall, for example disease prediction through analysis of a patient's symptoms, or disease prevention through the discovery of behavioral factors for diseases. Unfortunately, only a relatively small volume of textual eHealth data is processed and interpreted, an important factor being the difficulty of performing Big Data operations efficiently. In the medical field, detecting domain-specific multi-word terms is a crucial task, as such terms can define an entire concept in a few words. A term can be defined as a linguistic structure or a concept; it is composed of one or more words with a meaning specific to a domain. All the terms of a domain together form its terminology. This chapter offers a critical study of the current best-performing solutions for analyzing unstructured (image and textual) eHealth data. The study also compares current Natural Language Processing and Deep Learning techniques in the eHealth context. Finally, we examine and discuss some of the current issues and define a set of research directions in this area.