
Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection (2405.06093v2)

Published 9 May 2024 in cs.LG and cs.CL

Abstract: LLMs have demonstrated their efficacy across a broad spectrum of tasks in healthcare applications. However, LLMs often need to be fine-tuned on task-specific, expert-annotated data to achieve optimal performance, which can be expensive and time-consuming. In this study, we fine-tune PaLM-2 with parameter-efficient fine-tuning (PEFT) using noisy labels obtained from gemini-pro 1.0 for the detection of Schedule-of-Event (SoE) tables, which specify the care plan in clinical trial protocols. We introduce a filtering mechanism to select high-confidence labels for this table-classification task, thereby reducing the noise in the auto-generated labels. We show that PaLM-2 fine-tuned on these labels outperforms gemini-pro 1.0 and other LLMs, and that its performance approaches that of a PaLM-2 fine-tuned on labels obtained from non-expert annotators. Our results suggest that leveraging labels generated by powerful LLMs such as gemini-pro can serve as a viable strategy for improving LLM performance through fine-tuning on specialized tasks, particularly in domains where expert annotations are scarce, expensive, or time-consuming to obtain.
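
The recipe described in the abstract — auto-label with a strong model, filter to high-confidence labels, then fine-tune a student model with PEFT — can be sketched as follows. This is a minimal illustration in Python, assuming confidence is estimated by sampling the labeler several times and keeping only majority labels that clear an agreement threshold; the `label_table` function, the sample count, and the threshold are hypothetical stand-ins, not the paper's actual filtering mechanism.

```python
# Minimal sketch: filter LLM-generated labels by agreement before
# PEFT fine-tuning. All names and thresholds are illustrative
# assumptions; the paper's exact filtering mechanism is not shown.
from collections import Counter
from typing import Callable, Optional

def confident_label(
    table_text: str,
    label_table: Callable[[str], str],  # hypothetical labeler (e.g., a gemini-pro call) returning "SoE" or "not_SoE"
    n_samples: int = 5,
    min_agreement: float = 0.8,
) -> Optional[str]:
    """Query the labeler several times; keep the majority label only
    if its share of the votes clears the agreement threshold."""
    votes = Counter(label_table(table_text) for _ in range(n_samples))
    label, count = votes.most_common(1)[0]
    return label if count / n_samples >= min_agreement else None

def build_finetuning_set(tables, label_table):
    """Assemble (input, target) pairs from high-confidence auto-labels;
    low-agreement tables are dropped, reducing label noise."""
    dataset = []
    for table_text in tables:
        label = confident_label(table_text, label_table)
        if label is not None:
            dataset.append({"input": table_text, "target": label})
    return dataset  # then used for PEFT fine-tuning of the student model (PaLM-2)
```

Dropping low-agreement examples trades dataset size for label quality, which is the trade-off the filtering step exploits.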

Authors (5)
  1. Bhawesh Kumar
  2. Jonathan Amar
  3. Eric Yang
  4. Nan Li
  5. Yugang Jia