High Throughput Phenotyping of Physician Notes with Large Language and Hybrid NLP Models (2403.05920v1)
Published 9 Mar 2024 in cs.CL and cs.AI
Abstract: Deep phenotyping is the detailed description of patient signs and symptoms using concepts from an ontology. The deep phenotyping of the numerous physician notes in electronic health records requires high throughput methods. Over the past thirty years, progress has been made toward making high throughput phenotyping feasible. In this study, we demonstrate that an LLM and a hybrid NLP model (combining word vectors with a machine learning classifier) can perform high throughput phenotyping on physician notes with high accuracy. LLMs will likely emerge as the preferred method for high throughput deep phenotyping of physician notes.
Authors: Syed I. Munzir, Daniel B. Hier, Michael D. Carrithers