BioBERT-based Deep Learning and Merged ChemProt-DrugProt for Enhanced Biomedical Relation Extraction (2405.18605v1)
Published 28 May 2024 in cs.CL, cs.IR, and q-bio.MN
Abstract: This paper presents a methodology for enhancing relation extraction from biomedical texts, focusing specifically on chemical-gene interactions. Leveraging the BioBERT model and a multi-layer fully connected network architecture, our approach integrates the ChemProt and DrugProt datasets using a novel merging strategy. Through extensive experimentation, we demonstrate significant performance improvements, particularly in CPR groups shared between the datasets. The findings underscore the importance of dataset merging in augmenting sample counts and improving model accuracy. Moreover, the study highlights the potential of automated information extraction in biomedical research and clinical practice.
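The abstract does not spell out the merging strategy, so the following is only an illustrative sketch of what combining two relation-extraction datasets on their shared CPR groups could look like. The label mapping `DRUGPROT_TO_CPR`, the function `merge_datasets`, and the toy `(sentence, label)` tuple format are all assumptions made for this example, not the authors' method or the official dataset schema.

```python
from collections import Counter

# Hypothetical mapping from DrugProt relation labels to ChemProt CPR groups.
# This is an assumption for illustration; the paper's real mapping may differ.
DRUGPROT_TO_CPR = {
    "ACTIVATOR": "CPR:3",
    "INHIBITOR": "CPR:4",
    "AGONIST": "CPR:5",
    "ANTAGONIST": "CPR:6",
    "SUBSTRATE": "CPR:9",
}


def merge_datasets(chemprot, drugprot, mapping=DRUGPROT_TO_CPR):
    """Merge two lists of (sentence, label) pairs into one CPR-labelled list.

    DrugProt samples whose label has no entry in `mapping` are skipped,
    and exact duplicate (sentence, label) pairs are kept only once.
    """
    merged = []
    seen = set()

    def add(sentence, label):
        key = (sentence, label)
        if key not in seen:
            seen.add(key)
            merged.append(key)

    # ChemProt samples already carry CPR labels.
    for sentence, label in chemprot:
        add(sentence, label)

    # DrugProt samples are relabelled into the shared CPR groups.
    for sentence, label in drugprot:
        cpr_label = mapping.get(label)
        if cpr_label is not None:
            add(sentence, cpr_label)

    return merged


# Toy data, invented for this example only.
chemprot_samples = [
    ("Drug A inhibits protein X.", "CPR:4"),
    ("Drug B activates protein Y.", "CPR:3"),
]
drugprot_samples = [
    ("Drug A inhibits protein X.", "INHIBITOR"),  # duplicate after mapping
    ("Drug C inhibits protein Z.", "INHIBITOR"),
    ("Drug D is part of complex W.", "PART-OF"),  # no shared group, skipped
]

merged_samples = merge_datasets(chemprot_samples, drugprot_samples)
label_counts = Counter(label for _, label in merged_samples)
print(label_counts)
```

On this toy input, the shared inhibitor group gains one extra sample from the second dataset, which is the kind of sample-count increase the abstract attributes to merging.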
Authors: Bridget T. McInnes, Jiawei Tang, Darshini Mahendran, Mai H. Nguyen