BiMediX: Bilingual Medical Mixture of Experts LLM (2402.13253v2)
Abstract: In this paper, we introduce BiMediX, the first bilingual medical mixture of experts LLM designed for seamless interaction in both English and Arabic. Our model facilitates a wide range of medical interactions in English and Arabic, including multi-turn chats to inquire about additional details such as patient symptoms and medical history, multiple-choice question answering, and open-ended question answering. We propose a semi-automated English-to-Arabic translation pipeline with human refinement to ensure high-quality translations. We also introduce a comprehensive evaluation benchmark for Arabic medical LLMs. Furthermore, we introduce BiMed1.3M, an extensive Arabic-English bilingual instruction set covering 1.3 million diverse medical interactions, resulting in over 632 million healthcare-specialized tokens for instruction tuning. Our BiMed1.3M dataset includes 250k synthesized multi-turn doctor-patient chats and maintains a 1:2 Arabic-to-English ratio. Our model outperforms state-of-the-art Med42 and Meditron by average absolute gains of 2.5% and 4.1%, respectively, computed across multiple medical evaluation benchmarks in English, while operating at 8 times faster inference. Moreover, our BiMediX outperforms the generic Arabic-English bilingual LLM, Jais-30B, by average absolute gains of 10% on our Arabic medical benchmark and 15% on bilingual evaluations across multiple datasets. Our project page with source code and trained model is available at https://github.com/mbzuai-oryx/BiMediX.
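The abstract mentions a semi-automated English-to-Arabic translation pipeline with human refinement but does not spell out its mechanics at this level. The following Python sketch illustrates one way such a quality-gated routing step could be organized; `machine_translate`, `quality_score`, and the 0.85 threshold are hypothetical placeholders for illustration, not the authors' actual tooling.

```python
from dataclasses import dataclass

# Illustrative quality threshold: translations scoring below it go to human review.
QUALITY_THRESHOLD = 0.85  # assumed value, not taken from the paper


@dataclass
class TranslationItem:
    english: str
    arabic: str
    score: float
    needs_human_review: bool


def machine_translate(text: str) -> str:
    """Placeholder for an English-to-Arabic machine translation call."""
    raise NotImplementedError


def quality_score(source: str, translation: str) -> float:
    """Placeholder for an automatic translation quality estimate in [0, 1]."""
    raise NotImplementedError


def semi_automated_translate(samples: list[str]) -> list[TranslationItem]:
    """Translate English medical instructions to Arabic and flag weak
    translations for human refinement, mirroring the pipeline the abstract
    describes at a high level."""
    items = []
    for english in samples:
        arabic = machine_translate(english)
        score = quality_score(english, arabic)
        items.append(
            TranslationItem(
                english=english,
                arabic=arabic,
                score=score,
                needs_human_review=score < QUALITY_THRESHOLD,
            )
        )
    return items
```

Under a scheme like this, only translations scoring below the threshold are queued for human refinement, which is what makes the pipeline semi-automated rather than fully manual or fully automatic.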
References:
- Overview of the medical question answering task at TREC 2017 LiveQA. In TREC, pages 1–12.
- Bridging the gap between consumers’ medication questions and trusted answers. In MedInfo, pages 25–29.
- GPT-4 technical report. arXiv preprint arXiv:2303.08774.
- Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.
- SciBERT: A pretrained language model for scientific text. arXiv preprint arXiv:1903.10676.
- BioMedLM: A domain-specific large language model for biomedical text. https://crfm.stanford.edu/2022/12/15/biomedlm.html.
- Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174.
- Meditron-70B: Scaling medical pretraining for large language models. arXiv preprint arXiv:2311.16079.
- Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality. See https://vicuna.lmsys.org (accessed 14 April 2023).
- Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30.
- Med42: A clinical large language model.
- Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416.
- QLoRA: Efficient finetuning of quantized LLMs. arXiv preprint arXiv:2305.14314.
- AraBART: A pretrained Arabic sequence-to-sequence model for abstractive summarization. arXiv preprint arXiv:2203.10945.
- Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare (HEALTH), 3(1):1–23.
- MedAlpaca: An open-source collection of medical conversational AI models and training data. arXiv preprint arXiv:2304.08247.
- LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
- ClinicalBERT: Modeling clinical notes and predicting hospital readmission. arXiv preprint arXiv:1904.05342.
- Mistral 7B. arXiv preprint arXiv:2310.06825.
- Mixtral of experts. arXiv preprint arXiv:2401.04088.
- What disease does this patient have? A large-scale open domain question answering dataset from medical exams. Applied Sciences, 11(14):6421.
- PubMedQA: A dataset for biomedical research question answering. arXiv preprint arXiv:1909.06146.
- BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240.
- Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190.
- llm-analysis: Latency and memory analysis of transformer models for training and inference. Available at https://github.com/cli99/llm-analysis.
- BioGPT: Generative pre-trained transformer for biomedical text generation and mining. Briefings in Bioinformatics, 23(6):bbac409.
- Crosslingual generalization through multitask finetuning. arXiv preprint arXiv:2211.01786.
- AraT5: Text-to-text transformers for Arabic language generation. arXiv preprint arXiv:2109.12068.
- MedMCQA: A large-scale multi-subject multi-choice dataset for medical domain question answering. In Conference on Health, Inference, and Learning, pages 248–260. PMLR.
- PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32.
- Direct preference optimization: Your language model is secretly a reward model. arXiv preprint arXiv:2305.18290.
- ZeRO: Memory optimizations toward training trillion parameter models. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–16. IEEE.
- DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3505–3506.
- BLOOM: A 176B-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100.
- Jais and Jais-chat: Arabic-centric foundation and instruction-tuned open generative large language models. arXiv preprint arXiv:2308.16149.
- BioMegatron: Larger biomedical domain language model. arXiv preprint arXiv:2010.06060.
- Large language models encode clinical knowledge. arXiv preprint arXiv:2212.13138.
- Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617.
- Galactica: A large language model for science. arXiv preprint arXiv:2211.09085.
- Clinical Camel: An open-source expert-level medical language model with dialogue-based knowledge encoding. arXiv preprint arXiv:2305.12031.
- LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- PMC-LLaMA: Further finetuning LLaMA on medical papers. arXiv preprint arXiv:2304.14454.
- Deep bidirectional language-knowledge graph pretraining. Advances in Neural Information Processing Systems, 35:37309–37323.
- LinkBERT: Pretraining language models with document links. arXiv preprint arXiv:2203.15827.
- ChatDoctor: A medical chat model fine-tuned on LLaMA model using medical domain knowledge. arXiv preprint arXiv:2303.14070.
Authors:
- Sara Pieri
- Sahal Shaji Mullappilly
- Fahad Shahbaz Khan
- Rao Muhammad Anwer
- Salman Khan
- Timothy Baldwin
- Hisham Cholakkal