
How Important is Domain Specificity in Language Models and Instruction Finetuning for Biomedical Relation Extraction? (2402.13470v1)

Published 21 Feb 2024 in cs.CL

Abstract: Cutting-edge techniques developed in the general NLP domain are often subsequently applied to the high-value, data-rich biomedical domain. The past few years have seen generative language models (LMs), instruction finetuning, and few-shot learning become foci of NLP research. As such, generative LMs pretrained on biomedical corpora have proliferated and biomedical instruction finetuning has been attempted as well, all with the hope that domain specificity improves performance on downstream tasks. Given the nontrivial effort in training such models, we investigate what, if any, benefits they have in the key biomedical NLP task of relation extraction. Specifically, we address two questions: (1) Do LMs trained on biomedical corpora outperform those trained on general domain corpora? (2) Do models instruction finetuned on biomedical datasets outperform those finetuned on assorted datasets or those simply pretrained? We tackle these questions using existing LMs, testing across four datasets. In a surprising result, general-domain models typically outperformed biomedical-domain models. However, biomedical instruction finetuning improved performance to a similar degree as general instruction finetuning, despite having orders of magnitude fewer instructions. Our findings suggest it may be more fruitful to focus research effort on larger-scale biomedical instruction finetuning of general LMs over building domain-specific biomedical LMs.

Evaluating the Impact of Domain Specificity in LLMs for Biomedical Relation Extraction

Introduction to the Study

The intersection of generative language models (LMs) and the biomedical domain is fertile ground for improving tasks such as relation extraction (RE), a core component of biomedical knowledge discovery. To investigate the necessity and effectiveness of domain specificity in LMs and instruction finetuning (IFT) for biomedical RE, the paper addresses two questions. First, do LMs pretrained on biomedical corpora outperform those pretrained on general-domain corpora? Second, do models instruction-finetuned on biomedical datasets outperform those finetuned on more diverse datasets, or those that were merely pretrained? These questions are investigated using several existing LMs, evaluated across four biomedical RE datasets.

Biomedical Relation Extraction and LLMs

Relation extraction involves identifying semantic relationships between entities in text, a process critical for constructing knowledge graphs and supporting a range of biomedical applications. Traditionally, RE and Named Entity Recognition (NER) have been tackled with encoder models; however, generative models have shown promise in handling these tasks more flexibly through natural language prompts, particularly in few-shot scenarios (a prompt of this kind is sketched below). Concurrently, instruction finetuning has emerged as a way to align generative LMs with specific task objectives, potentially improving their performance across a variety of datasets.
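To make this concrete, below is a minimal sketch of how such a few-shot prompt might be assembled for a chemical-disease relation instance. The instruction wording, demonstration sentences, and relation labels are illustrative assumptions, not the prompts used in the paper.

```python
# Illustrative few-shot prompt construction for generative relation extraction.
# The instruction text, demonstrations, and relation labels are hypothetical
# examples, not the actual prompts from the paper.

def build_fewshot_prompt(demonstrations, query):
    """Concatenate labeled demonstrations with one unlabeled query instance."""
    instruction = (
        "Given a sentence and two entities, state the relation between them.\n\n"
    )
    blocks = [
        f"Sentence: {sent}\nEntity 1: {head}\nEntity 2: {tail}\nRelation: {label}"
        for sent, head, tail, label in demonstrations
    ]
    sent, head, tail = query
    blocks.append(f"Sentence: {sent}\nEntity 1: {head}\nEntity 2: {tail}\nRelation:")
    return instruction + "\n\n".join(blocks)

demos = [
    ("Cisplatin treatment caused severe nephrotoxicity.",
     "cisplatin", "nephrotoxicity", "chemical-induced disease"),
]
query = ("Tamoxifen use was associated with endometrial cancer.",
         "tamoxifen", "endometrial cancer")
print(build_fewshot_prompt(demos, query))
```

The model's continuation after the final "Relation:" marker is then mapped back to a label, which is how few-shot evaluation of generative LMs on RE typically proceeds.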

Investigation and Methodology

The paper evaluates a selection of biomedical and general-domain LMs, including variants of BART, T5, GPT-2, and BioGPT, alongside instruction-finetuned models such as Flan-T5 and In-BoXBART. These models were tested in both full-finetuning and few-shot settings across four datasets, including CDR and ChemProt, which cover diverse biomedical relations. To make the task amenable to generative LMs, RE instances were converted into natural language input and target sequences, as illustrated in the sketch below.
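The paper's exact linearization format is not reproduced here, but the following sketch illustrates the general idea under assumed templates: an RE instance is serialized into an input string and a textual target, which can then drive standard seq2seq finetuning and generation. The checkpoint name (Flan-T5 is used as a stand-in), prompt wording, and target format are assumptions for illustration.

```python
# Minimal sketch of casting relation extraction as sequence-to-sequence
# generation. The linearization template and target string are assumptions
# for illustration; they are not the paper's exact format.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "google/flan-t5-base"  # stand-in for any general or biomedical seq2seq LM
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# One CDR-style RE instance, linearized into natural language text.
source = (
    "Extract the chemical-disease relation: "
    "Carbamazepine-induced cardiac dysfunction was observed in the patient. "
    "Entity 1: carbamazepine. Entity 2: cardiac dysfunction."
)
target = "carbamazepine induces cardiac dysfunction"  # gold relation as text

# During finetuning, (source, target) pairs are tokenized and the model is
# trained with the standard seq2seq cross-entropy loss on the target tokens.
batch = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids
loss = model(**batch, labels=labels).loss

# At inference time, the relation is generated and parsed back into a triple.
generated = model.generate(**batch, max_new_tokens=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

In this framing, evaluation reduces to parsing generated strings back into entity-relation predictions and scoring them against the gold annotations.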

Key Findings

Surprisingly, general-domain models typically outperformed their biomedical-domain counterparts across most datasets and settings. However, models that underwent biomedical IFT improved to a degree comparable to those that received general-domain IFT, despite being trained on orders of magnitude fewer instructions. These findings prompt a reconsideration of the prevailing assumption that domain-specific pretraining universally yields better models for specialized tasks like biomedical RE.

Theoretical and Practical Implications

The results suggest that, for RE, the advantages of domain-specific pretraining may be outweighed by the broader, more diverse linguistic representations captured by general-domain LMs. Notably, the fact that IFT is effective even with a limited set of biomedical instructions underscores the promise of tailored tuning over building domain-specific models from scratch. These insights argue for leveraging and refining existing general-domain LMs through targeted instruction finetuning, balancing model performance against the resource-intensive process of developing new domain-specific models.

Future Directions

This research opens avenues for further exploration beyond biomedical RE, encouraging the examination of domain specificity and IFT's impact across different fields and tasks. Moreover, expanding the scale and scope of biomedical IFT, potentially harnessing larger biomedical metadatasets, could unearth further enhancements in model performance. While the findings predominantly pertain to RE tasks, their implications could inform broader strategies in AI application development within and beyond the biomedical domain.

Conclusion

The nuanced approach of this paper, exploring the intricate dynamics between domain-specific pretraining, IFT, and RE performance, provides a foundational understanding for future AI research and development strategies. As the field evolves, continuous reassessment of these methodologies will be essential in harnessing the full potential of LMs across diverse knowledge domains.

Authors (2)
  1. Aviv Brokman (3 papers)
  2. Ramakanth Kavuluru (23 papers)