Improving Biomedical Entity Linking with Retrieval-enhanced Learning (2312.09806v1)
Abstract: Biomedical entity linking (BioEL) has made remarkable progress with the help of pre-trained language models. However, existing BioEL methods usually struggle with rare and difficult entities because entities follow a long-tailed distribution. To address this limitation, we introduce a new scheme, $k$NN-BioEL, which lets a BioEL model reference similar instances from the entire training corpus as clues for prediction, thereby improving its generalization. Moreover, we design a contrastive learning objective with dynamic hard negative sampling (DHNS) that improves the quality of the neighbors retrieved during inference. Extensive experiments show that $k$NN-BioEL outperforms state-of-the-art baselines on several datasets.
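The abstract names two mechanisms but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that (1) mentions and entities are already encoded as vectors by some encoder, (2) the retrieved training neighbors are turned into a distribution over entity IDs and mixed with the model's own distribution, in the style of nearest-neighbor machine translation (kNN-MT), and (3) "dynamic hard negative sampling" means re-selecting the highest-scoring wrong entities under the current encoder at every training step and using them as negatives in an InfoNCE-style contrastive loss. All names and parameters (`knn_entity_distribution`, `interpolate`, `dhns_contrastive_loss`, `k`, `lam`, `temperature`, `num_hard`) are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def knn_entity_distribution(query: np.ndarray, datastore_keys: np.ndarray,
                            datastore_labels: np.ndarray, num_entities: int,
                            k: int = 4, temperature: float = 0.1) -> np.ndarray:
    """Turn the k most similar training mentions into a distribution over entity IDs.

    Hypothetical datastore: row i of `datastore_keys` is an encoded training
    mention and `datastore_labels[i]` is its gold entity ID.
    """
    sims = l2_normalize(datastore_keys) @ l2_normalize(query)
    top = np.argsort(-sims)[:k]                  # indices of the k nearest neighbors
    weights = np.exp(sims[top] / temperature)    # softmax over neighbor similarities
    weights /= weights.sum()
    probs = np.zeros(num_entities)
    # Neighbors that share a gold entity pool their weight on that entity.
    np.add.at(probs, datastore_labels[top], weights)
    return probs


def interpolate(p_model: np.ndarray, p_knn: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Assumed kNN-MT-style mixture: lam * p_knn + (1 - lam) * p_model."""
    return lam * p_knn + (1.0 - lam) * p_model


def dhns_contrastive_loss(mention_emb: np.ndarray, entity_embs: np.ndarray,
                          gold: int, num_hard: int = 3,
                          temperature: float = 0.05) -> float:
    """InfoNCE loss whose negatives are the wrong entities the current encoder
    ranks highest. Because the ranking is recomputed from the current embeddings
    on every call, the negative set changes as training proceeds (the assumed
    meaning of "dynamic")."""
    sims = l2_normalize(entity_embs) @ l2_normalize(mention_emb) / temperature
    ranked = np.argsort(-sims)
    hard_negs = [int(i) for i in ranked if i != gold][:num_hard]
    logits = np.concatenate(([sims[gold]], sims[hard_negs]))
    # Numerically stable -log softmax of the positive, which sits at index 0.
    m = logits.max()
    return float(-(logits[0] - m - np.log(np.exp(logits - m).sum())))
```

Selecting negatives from the current ranking, rather than sampling them uniformly, concentrates the training signal on confusable entities. That matches the abstract's stated goal of improving the neighbors retrieved at inference time, though the paper's actual sampling schedule may differ.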