Prompt-based Extraction of Social Determinants of Health Using Few-shot Learning (2306.07170v1)
Abstract: Social determinants of health (SDOH) documented in the electronic health record through unstructured text are increasingly being studied to understand how SDOH affect patient health outcomes. In this work, we utilize the Social History Annotation Corpus (SHAC), a multi-institutional corpus of de-identified social history sections annotated for SDOH, including substance use, employment, and living status information. We explore the automatic extraction of SDOH information with SHAC in both standoff and inline annotation formats using GPT-4 in a one-shot prompting setting. We compare GPT-4 extraction performance with a high-performing supervised approach and perform thorough error analyses. Our prompt-based GPT-4 method achieved an overall F1 of 0.652 on the SHAC test set, comparable to the seventh best-performing system among all teams in the n2c2 challenge with SHAC.
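The abstract contrasts standoff and inline annotation formats and reports results as F1. As a rough illustration only, the sketch below converts character-offset (standoff-style) spans into inline tags and computes an exact-match F1 over span tuples. The tag syntax, the label names (`Tobacco`, `LivingStatus`), the example sentence, and the exact-match criterion are all assumptions made for this example; they are not taken from the paper, and the n2c2 scoring is more involved than this.

```python
from typing import List, Set, Tuple

# A standoff span: (start offset, end offset, label).
# Offsets index into the raw text. Label names here are hypothetical.
Span = Tuple[int, int, str]


def standoff_to_inline(text: str, spans: List[Span]) -> str:
    """Wrap each non-overlapping span in <label>...</label> tags (assumed syntax)."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported in this sketch")
        pieces.append(text[cursor:start])                      # untouched text
        pieces.append(f"<{label}>{text[start:end]}</{label}>")  # tagged span
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def exact_match_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """F1 where a prediction counts only if offsets and label all match."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented example sentence; not drawn from SHAC.
text = "Patient smokes daily and lives alone."
gold = [(8, 14, "Tobacco"), (25, 36, "LivingStatus")]
predicted = [(8, 14, "Tobacco"), (15, 20, "Tobacco")]

inline = standoff_to_inline(text, gold)
score = exact_match_f1(set(gold), set(predicted))
```

Here one of the two predicted spans matches one of the two gold spans, so precision and recall are both 0.5. An inline rendering like this can be shown to a language model as a single example in a one-shot prompt, while the standoff tuples are the natural form for scoring.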
Authors:
- Giridhar Kaushik Ramachandran
- Yujuan Fu
- Bin Han
- Kevin Lybarger
- Nicholas J Dobbins
- Özlem Uzuner
- Meliha Yetisgen