Me LLaMA: Foundation Large Language Models for Medical Applications (2402.12749v5)
Abstract: Recent advancements in LLMs such as ChatGPT and LLaMA show promise in medical applications, yet challenges remain in medical language comprehension. This study presents Me-LLaMA, a new family of medical LLMs built on open-source LLaMA models and optimized for medical text analysis and diagnosis by leveraging large-scale, domain-specific datasets. The Me-LLaMA family, comprising the foundation models Me-LLaMA 13B/70B and their chat-enhanced versions, was developed through continued pre-training and instruction tuning with 129B tokens and 214K samples from biomedical and clinical sources. Training the 70B models required over 100,000 A100 GPU hours. Me-LLaMA was evaluated on six medical text analysis tasks across 12 benchmark datasets and on complex clinical case diagnosis, using both automatic and human evaluations. Results indicate that Me-LLaMA outperforms LLaMA and other open-source medical LLMs in both zero-shot and supervised settings. Task-specific tuning further boosts performance, surpassing ChatGPT on 7 of 8 datasets and GPT-4 on 5 of 8. For complex clinical cases, Me-LLaMA achieves performance comparable to ChatGPT and GPT-4. This work underscores the importance of domain-specific data in developing medical LLMs, addresses the high computational cost of training, and highlights the balance between pre-training and fine-tuning strategies. The Me-LLaMA models are accessible under user agreements, providing a valuable resource for advancing medical AI.
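The abstract describes a two-stage recipe (continued pre-training on domain text, then instruction tuning) but gives no implementation detail. Below is a minimal sketch of the continued pre-training stage using the Hugging Face Transformers Trainer with a standard causal language-modeling objective; the base checkpoint name, corpus path, and hyperparameters are illustrative assumptions, not the authors' released configuration.

```python
# Sketch: continued pre-training of an open-source LLaMA checkpoint on medical text.
# All names, paths, and hyperparameters below are placeholders, not the paper's setup.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "meta-llama/Llama-2-13b-hf"    # hypothetical starting checkpoint
CORPUS = "path/to/medical_corpus.jsonl"     # placeholder biomedical/clinical text

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Tokenize raw medical text for next-token prediction.
raw = load_dataset("json", data_files=CORPUS, split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

# mlm=False -> causal language modeling, i.e. continued pre-training.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="me-llama-13b-cpt",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,
    learning_rate=1e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=100,
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

The chat-enhanced versions would then be produced by a second, instruction-tuning pass over the 214K instruction samples; at the reported scale (129B tokens, 70B parameters) this workflow is run with multi-node distributed training rather than the single-process setup sketched here.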
- Qianqian Xie
- Qingyu Chen
- Aokun Chen
- Cheng Peng
- Yan Hu
- Fongci Lin
- Xueqing Peng
- Jimin Huang
- Jeffrey Zhang
- Vipina Keloth
- Huan He
- Yonghui Wu
- Hua Xu
- Jiang Bian
- Xinyu Zhou
- Lucila Ohno-Machado
- Lingfei Qian
- Dennis Shung