LLMs in Biomedicine: A study on clinical Named Entity Recognition (2404.07376v2)
Abstract: LLMs demonstrate remarkable versatility across NLP tasks but face distinct challenges in the biomedical domain due to the complexity of its language and data scarcity. This paper investigates the application of LLMs in the biomedical domain by exploring strategies to improve their performance on the named entity recognition (NER) task. Our study reveals the importance of meticulously designed prompts in biomedical settings. Strategic selection of in-context examples yields a marked improvement, with a ~15-20% increase in F1 score across all benchmark datasets for biomedical few-shot NER. Additionally, our results indicate that integrating external biomedical knowledge via prompting strategies can enhance the proficiency of general-purpose LLMs for the specialized needs of biomedical NER. Leveraging a medical knowledge base, our proposed method, DiRAG, inspired by Retrieval-Augmented Generation (RAG), boosts the zero-shot F1 score of LLMs for biomedical NER. Code is released at https://github.com/masoud-monajati/LLM_Bio_NER
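The abstract does not spell out how in-context examples are "strategically selected," so the following is a minimal sketch of one common reading: retrieving the training sentences most similar to the query via sentence embeddings and formatting them as few-shot demonstrations. The encoder choice (`all-mpnet-base-v2`), the prompt template, and the `train_pool` record format (`{"text": ..., "entities": [...]}`) are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch: similarity-based in-context example selection for few-shot
# biomedical NER. Not the paper's exact pipeline; model name, prompt template,
# and data format are assumptions for illustration.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-mpnet-base-v2")  # one plausible encoder choice

def select_demonstrations(query_sentence, train_pool, k=5):
    """Return the k training examples most similar to the query sentence."""
    query_emb = encoder.encode(query_sentence, convert_to_tensor=True)
    pool_embs = encoder.encode([ex["text"] for ex in train_pool],
                               convert_to_tensor=True)
    scores = util.cos_sim(query_emb, pool_embs)[0]          # cosine similarity to each pool sentence
    top_k = scores.topk(min(k, len(train_pool))).indices.tolist()
    return [train_pool[i] for i in top_k]

def build_few_shot_prompt(query_sentence, demonstrations, entity_type="Disease"):
    """Assemble a few-shot NER prompt from the retrieved demonstrations."""
    lines = [f"Extract all {entity_type} entities from the sentence."]
    for ex in demonstrations:
        lines.append(f"Sentence: {ex['text']}\nEntities: {', '.join(ex['entities'])}")
    lines.append(f"Sentence: {query_sentence}\nEntities:")
    return "\n\n".join(lines)
```

The assembled prompt would then be sent to the LLM; similarity-based retrieval is one plausible instantiation of "strategic selection," consistent with the in-context-learning literature the paper builds on.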
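DiRAG's retrieval mechanism and prompt format are likewise not specified in the abstract. The sketch below illustrates only the general RAG-inspired idea it describes: look up background knowledge for terms appearing in the sentence from a medical knowledge base (e.g., a UMLS-derived term-to-definition dictionary) and prepend it to a zero-shot NER prompt. `knowledge_base`, `retrieve_definitions`, and the prompt wording are hypothetical stand-ins, not the paper's method.

```python
# Hedged sketch of a RAG-style zero-shot NER prompt that injects definitions
# retrieved from a medical knowledge base. All names and the naive lexical
# retrieval below are illustrative assumptions.
def retrieve_definitions(sentence, knowledge_base, max_terms=3):
    """Naive lexical retrieval: return (term, definition) pairs for KB terms found in the sentence."""
    hits = [(term, definition) for term, definition in knowledge_base.items()
            if term.lower() in sentence.lower()]
    return hits[:max_terms]

def build_zero_shot_prompt(sentence, knowledge_base, entity_type="Disease"):
    """Prepend retrieved medical background to a zero-shot NER instruction."""
    context = "\n".join(f"- {term}: {definition}"
                        for term, definition in retrieve_definitions(sentence, knowledge_base))
    return (
        f"Relevant medical background:\n{context}\n\n"
        f"Using the background above, list all {entity_type} entities "
        f"in the sentence.\nSentence: {sentence}\nEntities:"
    )
```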
Authors: Masoud Monajatipoor, Jiaxin Yang, Joel Stremmel, Melika Emami, Fazlolah Mohaghegh, Mozhdeh Rouhsedaghat, Kai-Wei Chang