DMNER: Biomedical Entity Recognition by Detection and Matching (2306.15736v2)
Abstract: Biomedical named entity recognition (BNER) serves as the foundation for numerous biomedical text mining tasks. Unlike general NER, BNER requires a comprehensive grasp of the domain, and incorporating external knowledge beyond the training data poses a significant challenge. In this study, we propose a novel BNER framework called DMNER. By leveraging the existing entity representation model SAPBERT, we tackle BNER as a two-step process: entity boundary detection and biomedical entity matching. DMNER is applicable across multiple NER scenarios: 1) In supervised NER, we observe that DMNER effectively rectifies the output of baseline NER models, thereby further enhancing performance. 2) In distantly supervised NER, combining MRC and AutoNER as span boundary detectors enables DMNER to achieve satisfactory results. 3) For training NER by merging multiple datasets, we adopt a framework similar to DS-NER but additionally leverage ChatGPT to obtain high-quality phrases during training. Through extensive experiments conducted on 10 benchmark datasets, we demonstrate the versatility and effectiveness of DMNER.
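The two-step idea in the abstract (detect candidate spans, then match each span against known biomedical entities) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the toy `DICTIONARY`, the character-trigram `embed` function standing in for an entity encoder such as SAPBERT, the exhaustive n-gram `detect_spans` standing in for a trained boundary detector, and the arbitrary `threshold` of 0.8.

```python
import math
from collections import Counter

# Hypothetical toy dictionary: surface form -> entity type (illustrative only).
DICTIONARY = {
    "breast cancer": "Disease",
    "aspirin": "Chemical",
    "tp53": "Gene",
}


def embed(text):
    """Stand-in for an entity encoder: a bag of character trigrams."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def detect_spans(tokens, max_len=3):
    """Step 1 (toy): propose every n-gram of up to max_len tokens as a span."""
    return [
        (start, end)
        for start in range(len(tokens))
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1)
    ]


def match_span(span_text, threshold=0.8):
    """Step 2 (toy): label a span with its nearest dictionary entry, if close enough."""
    query = embed(span_text)
    best_name, best_score = max(
        ((name, cosine(query, embed(name))) for name in DICTIONARY),
        key=lambda pair: pair[1],
    )
    label = DICTIONARY[best_name] if best_score >= threshold else None
    return label, best_score


def recognize(sentence):
    """Run detection then matching, keeping the best non-overlapping spans."""
    tokens = sentence.split()
    scored = []
    for start, end in detect_spans(tokens):
        label, score = match_span(" ".join(tokens[start:end]))
        if label is not None:
            scored.append((score, start, end, label))
    scored.sort(key=lambda item: (-item[0], item[1]))
    taken, results = set(), []
    for score, start, end, label in scored:
        if not taken.intersection(range(start, end)):
            taken.update(range(start, end))
            results.append((" ".join(tokens[start:end]), label))
    return results
```

Calling `recognize("Aspirin may reduce breast cancer risk")` runs both steps on a single sentence. Separating detection from matching means the matcher can draw on an external entity vocabulary without retraining the span detector, which is the kind of decoupling the abstract describes.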
Authors: Junyi Bian, Rongze Jiang, Weiqi Zhai, Tianyang Huang, Hong Zhou, Shanfeng Zhu