
Identifying and Extracting Rare Disease Phenotypes with Large Language Models (2306.12656v1)

Published 22 Jun 2023 in cs.CL and cs.AI

Abstract: Rare diseases (RDs) are collectively common and affect 300 million people worldwide. Accurate phenotyping is critical for informing diagnosis and treatment, but RD phenotypes are often embedded in unstructured text and time-consuming to extract manually. While NLP models can perform named entity recognition (NER) to automate extraction, a major bottleneck is the development of a large, annotated corpus for model training. Recently, prompt learning has emerged as an NLP paradigm that can lead to more generalizable results with no labeled samples (zero-shot) or only a few (few-shot). Despite growing interest in ChatGPT, a revolutionary LLM capable of following complex human prompts and generating high-quality responses, no prior work has studied its NER performance for RDs in the zero- and few-shot settings. To this end, we engineered novel prompts aimed at extracting RD phenotypes and, to the best of our knowledge, are the first to establish a benchmark for evaluating ChatGPT's performance in these settings. We compared its performance to the traditional fine-tuning approach and conducted an in-depth error analysis. Overall, fine-tuning BioClinicalBERT resulted in higher performance (F1 of 0.689) than ChatGPT (F1 of 0.472 and 0.591 in the zero- and few-shot settings, respectively). Despite this, ChatGPT achieved similar or higher accuracy for certain entities (i.e., rare diseases and signs) in the one-shot setting (F1 of 0.776 and 0.725). This suggests that with appropriate prompt engineering, ChatGPT has the potential to match or outperform fine-tuned LLMs for certain entity types with just one labeled sample. While the proliferation of LLMs may provide opportunities for supporting RD diagnosis and treatment, researchers and clinicians should critically evaluate model outputs and be well-informed of their limitations.
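The core setup described in the abstract is zero- and few-shot prompting of ChatGPT to perform NER over clinical text. The sketch below illustrates what a zero-shot call of this kind might look like; it assumes the OpenAI Python client, and the prompt wording, entity types, model name, and output format are hypothetical stand-ins rather than the authors' engineered prompts.

```python
# Minimal illustrative sketch of zero-shot rare-disease phenotype NER via a
# chat model. Assumptions: the OpenAI Python client (>=1.0), an API key in
# the OPENAI_API_KEY environment variable, and made-up prompt/entity choices.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Entity categories here are illustrative, loosely following the rare
# disease / disease / sign / symptom distinctions mentioned in the paper.
ENTITY_TYPES = ["rare disease", "disease", "sign", "symptom"]

def extract_phenotypes(text: str) -> str:
    """Ask the model to list entity mentions in a clinical text passage."""
    prompt = (
        "Extract all entity mentions from the text below and label each with "
        f"one of these types: {', '.join(ENTITY_TYPES)}. "
        "Return one mention per line in the format <mention> | <type>.\n\n"
        f"Text: {text}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # a ChatGPT-class model; the exact model is an assumption
        temperature=0,          # deterministic output, easier to score against gold spans
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(extract_phenotypes(
    "The patient presented with ataxia and was later diagnosed with "
    "Niemann-Pick disease type C."
))
```

A few-shot variant would simply prepend one or more annotated example passages to the prompt; the extracted mentions can then be aligned with gold-standard spans to compute the precision, recall, and F1 figures reported above.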

Authors (4)
  1. Cathy Shyr (3 papers)
  2. Yan Hu (75 papers)
  3. Paul A. Harris (2 papers)
  4. Hua Xu (78 papers)
Citations (17)