Exploring the Effectiveness of Instruction Tuning in Biomedical Language Processing (2401.00579v1)
Abstract: Large Language Models (LLMs), particularly those similar to ChatGPT, have significantly influenced the field of Natural Language Processing (NLP). While these models excel at general language tasks, their performance in domain-specific downstream tasks such as biomedical and clinical Named Entity Recognition (NER), Relation Extraction (RE), and Medical Natural Language Inference (NLI) is still evolving. In this context, our study investigates the potential of instruction tuning for biomedical language processing, applying this technique to two general LLMs of substantial scale. We present a comprehensive instruction-based model trained on a dataset of approximately 200,000 instruction-focused samples. This dataset is a carefully curated compilation of existing data, meticulously adapted and reformatted to align with the specific requirements of our instruction-based tasks. This initiative represents an important step in utilising such models to achieve results on par with specialised encoder-only models like BioBERT and BioClinicalBERT on various classical biomedical NLP tasks. Our work includes an analysis of the dataset's composition and its impact on model performance, providing insights into the intricacies of instruction tuning. By sharing our code, models, and the distinctively assembled instruction-based dataset, we seek to encourage ongoing research and development in this area.
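The abstract describes reformatting existing biomedical datasets (NER, RE, NLI) into roughly 200,000 instruction-focused samples. As a rough illustration of what such a conversion might look like, the Python sketch below turns a BIO-tagged NER example into an instruction–response pair; the template wording, field names, and helper function are assumptions for illustration, not the authors' actual pipeline.

```python
# Illustrative sketch only: converting a BIO-tagged NER example into an
# instruction-tuning sample. The prompt template and output fields are
# assumptions, not the paper's released code.

def ner_to_instruction(tokens, tags):
    """Convert a token/BIO-tag pair into one instruction-response sample."""
    sentence = " ".join(tokens)

    # Collect entity spans from BIO tags (assumed scheme: B-X / I-X / O).
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))

    return {
        "instruction": "Extract all disease mentions from the following sentence.",
        "input": sentence,
        "output": ", ".join(entities) if entities else "None",
    }

# Toy NCBI-Disease-style example.
sample = ner_to_instruction(
    ["Mutations", "in", "BRCA1", "cause", "breast", "cancer", "."],
    ["O", "O", "O", "O", "B-Disease", "I-Disease", "O"],
)
print(sample)
# {'instruction': 'Extract all disease mentions ...', 'input': 'Mutations in BRCA1 cause breast cancer .', 'output': 'breast cancer'}
```

Analogous templates could be written for RE (e.g. asking whether a gene-disease relation holds) and NLI (asking whether a hypothesis follows from a premise), which is how disparate existing corpora can be unified into a single instruction-tuning dataset.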
Authors: Omid Rohanian, Mohammadmahdi Nouriborji, David A. Clifton