DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 (2303.11032v2)

Published 20 Mar 2023 in cs.CL and cs.CY

Abstract: The digitization of healthcare has facilitated the sharing and re-using of medical data but has also raised concerns about confidentiality and privacy. HIPAA (Health Insurance Portability and Accountability Act) mandates removing re-identifying information before the dissemination of medical records. Thus, effective and efficient solutions for de-identifying medical data, especially those in free-text forms, are highly needed. While various computer-assisted de-identification methods, including both rule-based and learning-based, have been developed and used in prior practice, such solutions still lack generalizability or need to be fine-tuned according to different scenarios, significantly imposing restrictions in wider use. The advancement of large language models (LLMs), such as ChatGPT and GPT-4, has shown great potential in processing text data in the medical domain with zero-shot in-context learning, especially in the task of privacy protection, as these models can identify confidential information by their powerful named entity recognition (NER) capability. In this work, we developed a novel GPT4-enabled de-identification framework ("DeID-GPT") to automatically identify and remove the identifying information. Compared to existing commonly used medical text data de-identification methods, our developed DeID-GPT showed the highest accuracy and remarkable reliability in masking private information from the unstructured medical text while preserving the original structure and meaning of the text. This study is one of the earliest to utilize ChatGPT and GPT-4 for medical text data processing and de-identification, which provides insights for further research and solution development on the use of LLMs such as ChatGPT/GPT-4 in healthcare. Codes and benchmarking data information are available at https://github.com/yhydhx/ChatGPT-API.

DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4

The paper "DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4" addresses the pressing need for effective de-identification techniques in the digitized healthcare domain, especially pertaining to free-form clinical text. In light of HIPAA requirements, which stipulate the removal of identifiable patient information from medical records, this research explores leveraging LLMs, specifically GPT-4, for zero-shot de-identification tasks.

Context and Innovation

The advent of electronic health records (EHR) has facilitated significant advancements in the medical field through data sharing and application of data-driven solutions. However, this digital transformation also brings heightened concerns over patient privacy and confidentiality. Previous efforts have primarily focused on rule-based and learning-based methods for de-identification, but these approaches often lack generalizability across different datasets and require extensive fine-tuning.

Recent developments in LLMs, notably GPT-4, open a new avenue for medical text processing through zero-shot and in-context learning. With their strong named entity recognition (NER) capabilities, these models can potentially identify and redact sensitive information efficiently, without requiring large-scale labeled data or manual intervention.

Methodology

The paper introduces the DeID-GPT framework, which is built upon GPT-4's capabilities to automatically identify and remove identifying information in clinical text. The approach centers around prompt engineering—designing tailored prompts embedded with HIPAA identifiers to guide the model in recognizing and redacting protected health information (PHI).

The methodology comprises two key steps:

  1. Prompt Design: Tailoring prompts that incorporate HIPAA guidelines, enabling the LLM to understand the specific information requiring redaction.
  2. Processing through GPT-4: Utilizing GPT-4 for the actual de-identification process, wherein both the prompt and the original clinical text are provided as input to generate an anonymized output (see the sketch after this list).
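
The paper does not reproduce its exact prompts, but the two steps can be illustrated with a minimal sketch along the following lines, assuming the OpenAI Chat Completions API; the prompt wording, masking token, and model identifier here are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of the two-step DeID-GPT flow (illustrative, not the authors' code).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1: Prompt design -- embed the HIPAA identifier categories in the instruction.
DEID_PROMPT = (
    "You are a medical de-identification assistant. Replace every piece of "
    "protected health information (PHI) defined by HIPAA -- names, dates, "
    "geographic locations, contact numbers, medical record numbers, and other "
    "identifiers -- with the token [REDACTED], and leave all other text unchanged."
)

def deidentify(clinical_note: str, model: str = "gpt-4") -> str:
    """Step 2: send the prompt plus the original note to the model and return the masked text."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output is preferable for redaction
        messages=[
            {"role": "system", "content": DEID_PROMPT},
            {"role": "user", "content": clinical_note},
        ],
    )
    return response.choices[0].message.content

# Example with a synthetic note:
# print(deidentify("John Smith, MRN 123456, was admitted to Boston General on 03/20/2023."))
```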

Experimental Results

The authors conducted an extensive evaluation of DeID-GPT using the i2b2/UTHealth de-identification challenge dataset. The findings show that GPT-4 outperforms existing de-identification methods, including BERT, RoBERTa, and ClinicalBERT, in redacting sensitive information: in the zero-shot prompting setting it reached an accuracy above 99%, demonstrating strong de-identification capability without any task-specific training or fine-tuning.
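
The paper's exact evaluation script is not reproduced here, but a simplified proxy for its masking accuracy can be sketched as follows: given the gold-standard PHI strings annotated in the i2b2/UTHealth corpus, count how many of them no longer appear in the model output. The span-matching granularity is an assumption made for illustration.

```python
def phi_masking_rate(deidentified_text: str, gold_phi: list[str]) -> float:
    """Fraction of annotated PHI strings that no longer appear verbatim in the output.

    Simplified proxy for the paper's accuracy metric: a PHI item counts as handled
    if it has been removed or replaced in the de-identified text.
    """
    if not gold_phi:
        return 1.0
    masked = sum(1 for span in gold_phi if span not in deidentified_text)
    return masked / len(gold_phi)

# Example: both identifiers masked -> 1.0
# phi_masking_rate("[REDACTED], MRN [REDACTED], admitted ...", ["John Smith", "123456"])
```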

Implications and Future Directions

The implications of this research are substantial for both theoretical exploration and practical applications in AI and healthcare. The introduction of GPT-4 for de-identification tasks has potential advantages, such as:

  • Scale and Efficiency: Rapid processing of large datasets, reducing time and resources compared to manual and rule-based methods.
  • Adaptability: Seamless application across varied medical text datasets without necessitating changes in the workflow.
  • Reduction of Annotation Efforts: Minimizing the need for large-scale annotated data, which is often a bottleneck in clinical NLP tasks.

Moving forward, several avenues present opportunities for further exploration:

  • Locally-Deployed Models: Developing open-source LLMs suitable for local deployment within healthcare institutions to ensure data privacy and compliance with regulations (a minimal local sketch follows this list).
  • Domain-Specific Enhancement: Refining LLMs with domain-specific data (such as clinical notes) could augment model performance and adaptability.
  • Fine-Tuning and Integration: Investigating fine-tuning techniques for LLMs, especially GPT-4, to optimize their capabilities for specific healthcare domains.
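
As a rough illustration of what a locally deployed de-identification component could look like, the sketch below runs a small open NER model entirely on-premise; the model name is a general-domain stand-in (not a clinical PHI model, and not something proposed by the paper), chosen only because it is publicly available.

```python
# Sketch of local, on-premise de-identification with an open NER model.
# "dslim/bert-base-NER" is a general-domain model used as a stand-in; a clinical,
# PHI-specific model (or a locally hosted LLM) would be needed in practice.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple",
)

def deidentify_locally(note: str) -> str:
    """Replace every detected entity span with [REDACTED], working right to left."""
    entities = sorted(ner(note), key=lambda e: e["start"], reverse=True)
    for ent in entities:
        note = note[: ent["start"]] + "[REDACTED]" + note[ent["end"] :]
    return note

# print(deidentify_locally("John Smith was admitted to Boston General on 03/20/2023."))
```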

In summary, the DeID-GPT framework signifies promising progress in leveraging LLMs for automated medical text de-identification, contributing valuable insights into the broader application of AI models in safeguarding patient privacy within the healthcare sector.

Authors (18)
  1. Zhengliang Liu
  2. Yue Huang
  3. Xiaowei Yu
  4. Lu Zhang
  5. Zihao Wu
  6. Chao Cao
  7. Haixing Dai
  8. Lin Zhao
  9. Yiwei Li
  10. Peng Shu
  11. Fang Zeng
  12. Lichao Sun
  13. Wei Liu
  14. Dinggang Shen
  15. Quanzheng Li
  16. Tianming Liu
  17. Dajiang Zhu
  18. Xiang Li
Citations (147)