MedLM: Exploring Language Models for Medical Question Answering Systems (2401.11389v2)

Published 21 Jan 2024 in cs.CL, cs.AI, and cs.LG

Abstract: In the face of rapidly expanding online medical literature, automated systems for aggregating and summarizing information are becoming increasingly crucial for healthcare professionals and patients. LLMs, with their advanced generative capabilities, have shown promise in various NLP tasks, and their potential in the healthcare domain, particularly for Closed-Book Generative QnA, is significant. However, the performance of these models in domain-specific tasks such as medical Q&A remains largely unexplored. This study aims to fill this gap by comparing the performance of general and medical-specific distilled LMs for medical Q&A. We aim to evaluate the effectiveness of fine-tuning domain-specific LMs and compare the performance of different families of LLMs. The study will address critical questions about these models' reliability, comparative performance, and effectiveness in the context of medical Q&A. The findings will provide valuable insights into the suitability of different LMs for specific applications in the medical domain.
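The paper frames the task as closed-book generative QA: the model must produce an answer from its parameters alone, with no retrieved supporting passage. As a rough illustration of that setup (not the authors' code: the base model, prompt format, toy data, and hyperparameters below are all assumptions), a fine-tuning loop with Hugging Face transformers might look like this:

```python
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Assumption: t5-small stands in for whichever distilled LM is being compared.
model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy question/answer pairs; a real run would load a medical QA corpus.
raw = {
    "question": ["What causes type 2 diabetes?"],
    "answer": ["Insulin resistance combined with impaired insulin secretion."],
}

def preprocess(batch):
    # Closed-book setup: the model conditions on the question alone,
    # with no retrieved or supporting context.
    model_inputs = tokenizer(
        ["answer the medical question: " + q for q in batch["question"]],
        truncation=True, max_length=128,
    )
    labels = tokenizer(text_target=batch["answer"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_ds = Dataset.from_dict(raw).map(
    preprocess, batched=True, remove_columns=["question", "answer"]
)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="medlm-sketch",  # hypothetical output path
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Generate an answer from the fine-tuned model.
query = tokenizer(
    "answer the medical question: What is hypertension?", return_tensors="pt"
).input_ids.to(model.device)
print(tokenizer.decode(model.generate(query, max_new_tokens=64)[0],
                       skip_special_tokens=True))
```

Swapping `model_name` between a general-purpose checkpoint and a domain-adapted one would give the general-vs-medical comparison the abstract describes.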

Authors (4)
  1. Niraj Yagnik (2 papers)
  2. Jay Jhaveri (1 paper)
  3. Vivek Sharma (54 papers)
  4. Gabriel Pila (1 paper)