Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection (2405.06093v2)
Abstract: LLMs have demonstrated their efficacy across a broad spectrum of tasks in healthcare applications. However, LLMs often need to be fine-tuned on task-specific, expert-annotated data to achieve optimal performance, which can be expensive and time-consuming. In this study, we fine-tune PaLM-2 with parameter-efficient fine-tuning (PEFT) using noisy labels obtained from gemini-pro 1.0 for the detection of Schedule-of-Event (SoE) tables, which specify the care plan in clinical trial protocols. We introduce a filtering mechanism to select high-confidence labels for this table-classification task, thereby reducing the noise in the auto-generated labels. We show that PaLM-2 fine-tuned on those labels exceeds the performance of gemini-pro 1.0 and other LLMs. Furthermore, its performance is close to that of a PaLM-2 fine-tuned on labels obtained from non-expert annotators. Our results show that leveraging labels generated by powerful models like gemini-pro can be a viable strategy for improving LLM performance through fine-tuning on specialized tasks, particularly in domains where expert annotations are scarce, expensive, or time-consuming to obtain.
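The abstract's confidence-based label filtering could be realized in several ways; one common approach is self-consistency, keeping only examples where repeated LLM labelings agree. The sketch below is a hypothetical illustration of that idea, not the paper's actual mechanism: `label_fn`, `n_samples`, and `threshold` are all assumed names and values.

```python
from collections import Counter

def filter_high_confidence_labels(tables, label_fn, n_samples=5, threshold=0.8):
    """Keep only tables whose LLM-assigned label is consistent across
    repeated sampling.

    `label_fn(table)` stands in for a call to the labeling LLM (e.g. a
    gemini-pro classification prompt); the agreement threshold is
    illustrative, not taken from the paper.
    """
    kept = []
    for table in tables:
        # Sample the labeler several times and tally the votes.
        votes = Counter(label_fn(table) for _ in range(n_samples))
        label, count = votes.most_common(1)[0]
        # Retain the example only if the majority label is dominant enough.
        if count / n_samples >= threshold:
            kept.append((table, label))
    return kept
```

The filtered `(table, label)` pairs would then serve as the training set for PEFT fine-tuning of the student model.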
Authors: Bhawesh Kumar, Jonathan Amar, Eric Yang, Nan Li, Yugang Jia