Surpassing GPT-4 Medical Coding with a Two-Stage Approach (2311.13735v1)
Abstract: Recent advances in large language models (LLMs) show potential for clinical applications such as clinical decision support and trial recommendations. However, GPT-4 predicts an excessive number of ICD codes for medical coding tasks, leading to high recall but low precision. To tackle this challenge, we introduce LLM-codex, a two-stage approach to ICD code prediction that first generates evidence proposals using an LLM and then applies an LSTM-based verification stage. The LSTM learns from both the LLM's high recall and human experts' high precision, using a custom loss function. Experiments on the MIMIC dataset show that our model is the only approach that simultaneously achieves state-of-the-art results in medical coding accuracy, accuracy on rare codes, and sentence-level evidence identification to support coding decisions, without training on human-annotated evidence.
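The core idea, that a high-recall proposal stage followed by a verification stage can recover precision, can be illustrated with a toy sketch. Everything below is hypothetical and is not the authors' implementation: the `Proposal` type, the example ICD-9 codes and evidence sentences, and especially `toy_score`, a keyword rule standing in for the trained LSTM verifier (whose custom loss is not reproduced here).

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Proposal:
    """A stage-1 candidate: one ICD code and the sentence offered as evidence."""
    code: str
    evidence: str


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall of predicted codes against gold codes."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


def verify(proposals: List[Proposal],
           score: Callable[[Proposal], float],
           threshold: float = 0.5) -> Set[str]:
    """Stage 2: keep a code if any of its evidence sentences scores >= threshold."""
    return {p.code for p in proposals if score(p) >= threshold}


# Hypothetical stand-in for the trained LSTM verifier: reject evidence
# sentences that contain simple negation or non-patient cues.
NEGATIVE_CUES = ("denies", "ruled out", "family history")


def toy_score(p: Proposal) -> float:
    text = p.evidence.lower()
    return 0.0 if any(cue in text for cue in NEGATIVE_CUES) else 1.0


# Illustrative stage-1 output: an over-eager LLM proposes four ICD-9 codes.
proposals = [
    Proposal("428.0", "Admitted with acute exacerbation of congestive heart failure."),
    Proposal("401.9", "History of hypertension, on lisinopril."),
    Proposal("250.00", "Family history of diabetes."),
    Proposal("486", "Chest X-ray ruled out pneumonia."),
]
gold = {"428.0", "401.9"}  # illustrative human-assigned codes

stage1_codes = {p.code for p in proposals}
stage2_codes = verify(proposals, toy_score)

print("stage 1 (P, R):", precision_recall(stage1_codes, gold))  # (0.5, 1.0)
print("stage 2 (P, R):", precision_recall(stage2_codes, gold))  # (1.0, 1.0)
```

The sketch only shows the shape of the pipeline: stage 1 can afford false positives because stage 2 filters them, and each keep-or-reject decision is made on a specific evidence sentence. In the paper, that filter is a learned LSTM rather than a keyword rule.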