LLMs Accelerate Annotation for Medical Information Extraction (2312.02296v1)

Published 4 Dec 2023 in cs.CL, cs.AI, and cs.LG

Abstract: The unstructured nature of clinical notes within electronic health records often conceals vital patient-related information, making it challenging to access or interpret. To uncover this hidden information, specialized NLP models are required. However, training these models necessitates large amounts of labeled data, a process that is both time-consuming and costly when relying solely on human experts for annotation. In this paper, we propose an approach that combines LLMs with human expertise to create an efficient method for generating ground truth labels for medical text annotation. By utilizing LLMs in conjunction with human annotators, we significantly reduce the human annotation burden, enabling the rapid creation of labeled datasets. We rigorously evaluate our method on a medical information extraction task, demonstrating that our approach not only substantially cuts down on human intervention but also maintains high accuracy. The results highlight the potential of using LLMs to improve the utilization of unstructured clinical data, allowing for the swift deployment of tailored NLP solutions in healthcare.

In the field of healthcare, the efficient extraction of information from unstructured clinical data, such as electronic health records (EHRs), plays a pivotal role in improving patient care. Specialized NLP models are deployed for such tasks; however, developing these models requires a significant amount of accurately labeled data, which is both resource-intensive and cost-prohibitive due to the need for manual annotation by medical experts.

This paper introduces an innovative method that leverages LLMs in combination with human expertise to expedite the annotation process for medical text data while maintaining high levels of accuracy. By pairing LLMs with human annotators, the process of creating labeled datasets is significantly accelerated, thus reducing the burden on human experts.

The research evaluates the effectiveness of this method on a medical information extraction task centered on identifying medication mentions and their associated attributes within clinical text. Two phases of annotation are considered: Base Annotation, where initial labels are applied, and Refinement Annotation, where those labels are corrected to ensure accuracy. The authors compare the LLM-assisted annotation workflow with a fully manual process in which human experts perform both annotation and refinement.
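The two-phase workflow can be sketched as follows. This is a minimal, hypothetical illustration of the idea, not the paper's implementation: the function names, the span format, and the stubbed keyword-matching "model" are all assumptions, with the real Base Annotation phase performed by an LLM.

```python
# Hypothetical sketch of the two-phase annotation workflow.
# Names and data shapes are illustrative, not from the paper.

def base_annotation(note: str) -> list[dict]:
    """Phase 1 (Base Annotation): propose candidate medication spans.
    A trivial keyword match stands in for the LLM call here."""
    candidates = []
    for drug in ("aspirin", "metformin"):
        start = note.lower().find(drug)
        if start != -1:
            candidates.append({"text": note[start:start + len(drug)],
                               "start": start, "label": "MEDICATION"})
    return candidates

def refinement_annotation(candidates: list[dict],
                          corrections: dict) -> list[dict]:
    """Phase 2 (Refinement Annotation): a human annotator reviews each
    proposed span. `corrections` maps a span's start offset to None
    (reject) or a replacement dict (edit); untouched spans are accepted."""
    refined = []
    for span in candidates:
        fix = corrections.get(span["start"], span)
        if fix is not None:
            refined.append(fix)
    return refined

note = "Patient started on aspirin 81 mg daily."
pre = base_annotation(note)            # LLM pre-labels the note
gold = refinement_annotation(pre, {})  # annotator accepts all spans
```

The point of the design is that the expensive human step shifts from labeling from scratch to reviewing pre-filled labels, which is where the reported time savings come from.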

Evaluation on the medication extraction task demonstrates that LLM-assisted annotation achieves quality comparable to the fully manual process while reducing annotation time by 58%. A subgroup analysis shows time savings of 26% even for expert annotators, suggesting that highly skilled annotators also benefit from LLM assistance.

Moreover, the paper contributes a set of labels for medication extraction using a public medical dataset (MIMIC-IV-Note) and discusses future directions for integrating LLMs to enhance medical NLP.

In summary, this research presents a compelling use case for incorporating LLMs into the data annotation workflow, a method that shows promise in overcoming the labeling bottleneck in medical NLP tasks. As LLMs continue to improve, they may profoundly impact the field by providing efficient ways to organize and access critical information locked within unstructured clinical data.

Authors (13)
  1. Akshay Goel
  2. Almog Gueta
  3. Omry Gilon
  4. Chang Liu
  5. Sofia Erell
  6. Lan Huong Nguyen
  7. Xiaohong Hao
  8. Bolous Jaber
  9. Shashir Reddy
  10. Rupesh Kartha
  11. Jean Steiner
  12. Itay Laish
  13. Amir Feder
Citations (72)