MedLM: Exploring Language Models for Medical Question Answering Systems (2401.11389v2)
Abstract: As online medical literature expands rapidly, automated systems for aggregating and summarizing information are becoming increasingly important for healthcare professionals and patients. Large language models (LLMs), with their advanced generative capabilities, have shown promise in various NLP tasks, and their potential in the healthcare domain, particularly for closed-book generative Q&A, is significant. However, the performance of these models on domain-specific tasks such as medical Q&A remains largely unexplored. This study aims to fill that gap by comparing general and medical-specific distilled language models (LMs) on medical Q&A. We aim to evaluate the effectiveness of fine-tuning domain-specific LMs and to compare different families of LLMs. The study will address critical questions about these models' reliability, comparative performance, and effectiveness in medical Q&A. The findings will offer insight into the suitability of different LMs for specific applications in the medical domain.
- Niraj Yagnik
- Jay Jhaveri
- Vivek Sharma
- Gabriel Pila