MEDITRON-70B: Scaling Medical Pretraining for Large Language Models (2311.16079v1)

Published 27 Nov 2023 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer), and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally-recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and after task-specific finetuning. Overall, MEDITRON achieves a 6% absolute performance gain over the best public baseline in its parameter class and 3% over the strongest baseline we finetuned from Llama-2. Compared to closed-source LLMs, MEDITRON-70B outperforms GPT-3.5 and Med-PaLM and is within 5% of GPT-4 and 10% of Med-PaLM-2. We release our code for curating the medical pretraining corpus and the MEDITRON model weights to drive open-source development of more capable medical LLMs.

Among recent developments at the intersection of medicine and AI is the release of MEDITRON, a suite of LLMs specifically tuned for tasks in the medical domain. MEDITRON comes in two variants, with 7 billion and 70 billion parameters, and represents a significant contribution to the field: both models are adaptively pretrained on a substantial corpus of high-quality medical literature, including selected PubMed articles and curated clinical practice guidelines.

MEDITRON's inception lies in the recognition that LLMs hold transformative potential for democratizing medical knowledge. Against this backdrop, its development sought to harness the emergent properties of LLMs, such as coherent communication and contextual interpretation, and tailor them to the medical sector. MEDITRON builds on the foundations of Llama-2 and extends its pretraining with a comprehensive, carefully curated medical corpus. As a result, MEDITRON demonstrates marked performance improvements over several state-of-the-art baselines on medical reasoning benchmarks.
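
To make the adaptation step concrete, the sketch below shows domain-adaptive continued pretraining in miniature. Note that the paper itself uses an adaptation of Nvidia's Megatron-LM distributed trainer; this standalone Hugging Face version, the corpus file name, and the hyperparameters are illustrative assumptions, not the authors' setup.

```python
# A minimal sketch of domain-adaptive continued pretraining with the Hugging
# Face Trainer. MEDITRON itself was trained with an adaptation of Nvidia's
# Megatron-LM distributed trainer; this standalone version, the file name
# "medical_corpus.jsonl", and the hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # gated on the Hub; requires access approval
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token  # Llama-2's tokenizer ships without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# One JSON object per line with a "text" field of curated medical prose.
raw = load_dataset("json", data_files="medical_corpus.jsonl", split="train")

def tokenize(batch):
    return tok(batch["text"], truncation=True, max_length=2048)

ds = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="meditron-sketch",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=3e-5,
        bf16=True,
    ),
    train_dataset=ds,
    # mlm=False selects the causal (next-token) language-modeling objective.
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```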

MEDITRON's capabilities have been assessed on established medical benchmarks, with the tests indicating notable proficiency at answering complex, US Medical Licensing Examination (USMLE)-style questions and an overall performance gain over its base model, Llama-2. MEDITRON's results are particularly commendable given its open-source nature: it challenges even proprietary models with substantially higher parameter counts, outperforming GPT-3.5 and Med-PaLM and coming within a few points of GPT-4 and Med-PaLM-2.
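
For a sense of how such a benchmark probe might look in practice, here is a hedged sketch that scores a USMLE-style multiple-choice question by comparing per-option log-likelihoods under the released weights. The Hub ID "epfl-llm/meditron-7b" refers to the public release; the prompt format and scoring heuristic are simplified assumptions rather than the paper's evaluation harness.

```python
# Hedged sketch of a USMLE-style multiple-choice probe: pick the option whose
# continuation the model assigns the highest average log-likelihood. The Hub
# ID "epfl-llm/meditron-7b" refers to the public release; the prompt format
# and scoring heuristic are simplified assumptions, not the paper's harness.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "epfl-llm/meditron-7b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

question = ("A 55-year-old man presents with crushing substernal chest pain. "
            "Which serum marker rises first after myocardial infarction?")
options = {"A": "Troponin I", "B": "CK-MB", "C": "Myoglobin", "D": "LDH"}

def avg_logprob(answer_text):
    # model(ids, labels=ids) returns the mean cross-entropy over the sequence;
    # its negation serves as an average per-token log-likelihood score.
    prompt = f"Question: {question}\nAnswer: {answer_text}"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return -out.loss.item()

best = max(options, key=lambda k: avg_logprob(options[k]))
print("Predicted answer:", best, "-", options[best])
```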

It is important to highlight, however, that MEDITRON is not recommended for deployment in medical applications without additional alignment and safety testing. This precaution underscores the model's current status as a cutting-edge research tool intended for further development rather than direct clinical use.

The creators of MEDITRON have set a compelling precedent by publicly releasing the model weights alongside the corpus-curation code and the distributed training library used during development. This move not only enables further advances but also fosters transparency and collaboration across the AI and medical research communities.

In conclusion, MEDITRON stands as a valuable resource that may pave the way for future research and stronger AI capabilities in medical decision-making and evidence-based medicine. Challenges remain, particularly around ethical deployment and safety, but open releases like this one point toward a more informed and equitable, AI-assisted medical practice.

Authors (20)
  1. Zeming Chen (18 papers)
  2. Alejandro Hernández Cano (1 paper)
  3. Angelika Romanou (11 papers)
  4. Antoine Bonnet (2 papers)
  5. Kyle Matoba (6 papers)
  6. Francesco Salvi (5 papers)
  7. Matteo Pagliardini (15 papers)
  8. Simin Fan (12 papers)
  9. Andreas Köpf (5 papers)
  10. Amirkeivan Mohtashami (12 papers)
  11. Alexandre Sallinen (1 paper)
  12. Alireza Sakhaeirad (2 papers)
  13. Vinitra Swamy (15 papers)
  14. Igor Krawczuk (9 papers)
  15. Deniz Bayazit (5 papers)
  16. Axel Marmet (2 papers)
  17. Syrielle Montariol (22 papers)
  18. Mary-Anne Hartley (12 papers)
  19. Martin Jaggi (155 papers)
  20. Antoine Bosselut (85 papers)
Citations (127)