
Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: a data-driven approach for improved classification (2403.04024v2)

Published 6 Mar 2024 in eess.IV and cs.CV

Abstract: In chest X-ray (CXR) image analysis, rule-based systems are usually employed to extract labels from reports for dataset releases. However, there is still room for improvement in label quality. These labelers typically output only presence labels, sometimes with binary uncertainty indicators, which limits their usefulness. Supervised deep learning models have also been developed for report labeling but lack adaptability, similar to rule-based systems. In this work, we present MAPLEZ (Medical report Annotations with Privacy-preserving LLM using Expeditious Zero shot answers), a novel approach leveraging a locally executable LLM to extract and enhance findings labels on CXR reports. MAPLEZ extracts not only binary labels indicating the presence or absence of a finding but also the location, severity, and radiologists' uncertainty about the finding. Over eight abnormalities from five test sets, we show that our method can extract these annotations with an increase of 3.6 percentage points (pp) in macro F1 score for categorical presence annotations and more than 20 pp increase in F1 score for the location annotations over competing labelers. Additionally, using the combination of improved annotations and multi-type annotations in classification supervision, we demonstrate substantial advancements in model quality, with an increase of 1.1 pp in AUROC over models trained with annotations from the best alternative approach. We share code and annotations.
