Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection (2405.06093v2)
Abstract: LLMs have demonstrated their efficacy across a broad spectrum of tasks in healthcare applications. However, LLMs often need to be fine-tuned on task-specific, expert-annotated data to achieve optimal performance, which can be expensive and time-consuming. In this study, we fine-tune PaLM-2 with parameter-efficient fine-tuning (PEFT) using noisy labels obtained from gemini-pro 1.0 for the detection of Schedule-of-Event (SoE) tables, which specify the care plan in clinical trial protocols. We introduce a filtering mechanism to select high-confidence labels for this table-classification task, thereby reducing the noise in the auto-generated labels. We show that PaLM-2 fine-tuned on those labels exceeds the performance of gemini-pro 1.0 and other LLMs. Furthermore, its performance is close to that of a PaLM-2 fine-tuned on labels obtained from non-expert annotators. Our results show that leveraging labels generated by powerful models like gemini-pro can be a viable strategy for improving LLM performance through fine-tuning on specialized tasks, particularly in domains where expert annotations are scarce, expensive, or time-consuming to obtain.
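The abstract's confidence-based label filtering could be realized in several ways; one common approach is self-consistency, keeping only examples where repeated LLM labelings agree. The sketch below is a hypothetical illustration of that idea, not the paper's actual mechanism: `label_fn`, `n_samples`, and `threshold` are all assumed names and values.

```python
from collections import Counter

def filter_high_confidence_labels(tables, label_fn, n_samples=5, threshold=0.8):
    """Keep only tables whose LLM-assigned label is consistent across
    repeated sampling.

    `label_fn(table)` stands in for a call to the labeling LLM (e.g. a
    gemini-pro classification prompt); the agreement threshold is
    illustrative, not taken from the paper.
    """
    kept = []
    for table in tables:
        # Sample the labeler several times and tally the votes.
        votes = Counter(label_fn(table) for _ in range(n_samples))
        label, count = votes.most_common(1)[0]
        # Retain the example only if the majority label is dominant enough.
        if count / n_samples >= threshold:
            kept.append((table, label))
    return kept
```

The filtered `(table, label)` pairs would then serve as the training set for PEFT fine-tuning of the student model.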
Authors: Bhawesh Kumar, Jonathan Amar, Eric Yang, Nan Li, Yugang Jia