
Efficient Biomedical Entity Linking: Clinical Text Standardization with Low-Resource Techniques (2405.15134v2)

Published 24 May 2024 in cs.CL

Abstract: Clinical text is rich in information, with mentions of treatments, medications and anatomy among many other clinical terms. Multiple terms can refer to the same core concept, which can be referred to as a clinical entity. Ontologies like the Unified Medical Language System (UMLS) are developed and maintained to store millions of clinical entities, including their definitions, relations and other corresponding information. These ontologies are used to standardize clinical text by normalizing the varying surface forms of a clinical term through biomedical entity linking. With the introduction of transformer-based LLMs, there has been significant progress in biomedical entity linking. In this work, we focus on learning through the synonym pairs associated with the entities. Compared to existing approaches, our approach significantly reduces training data and resource consumption. Moreover, we propose a suite of context-based and context-less reranking techniques for performing entity disambiguation. Overall, we achieve performance comparable to state-of-the-art zero-shot and distantly supervised entity linking techniques on the MedMentions dataset, the largest annotated dataset on UMLS, without any domain-based training. Finally, we show that retrieval performance alone may not be a sufficient evaluation metric, and we introduce an article-level quantitative and qualitative analysis to reveal further insights into the performance of entity linking methods.
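The synonym-based retrieval step the abstract describes can be sketched as: embed a mention and each ontology synonym, rank synonyms by similarity, and map the best match to its concept identifier (CUI). Below is a minimal, self-contained sketch of that idea using character n-gram vectors in place of the paper's learned encoder; the synonym table and CUI values are illustrative stand-ins for UMLS, not the authors' actual method or data.

```python
import math
from collections import Counter

# Toy synonym -> concept (CUI) table standing in for UMLS synonym pairs.
# Entries are illustrative only.
SYNONYM_TO_CUI = {
    "myocardial infarction": "C0027051",
    "heart attack": "C0027051",
    "hypertension": "C0020538",
    "high blood pressure": "C0020538",
    "diabetes mellitus": "C0011849",
}

def char_ngrams(text, n=3):
    """Character trigram counts, with padding so word boundaries contribute."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link_mention(mention, top_k=3):
    """Rank candidate synonyms by similarity to the mention.

    Returns a list of (cui, synonym, score), best first; the top CUI is
    the linked entity, and the ranked list is the input a reranker
    (context-based or context-less) would reorder.
    """
    query = char_ngrams(mention)
    scored = [(cosine(query, char_ngrams(syn)), syn, cui)
              for syn, cui in SYNONYM_TO_CUI.items()]
    scored.sort(reverse=True)
    return [(cui, syn, round(score, 3)) for score, syn, cui in scored[:top_k]]

# Varying surface forms normalize to the same concept:
print(link_mention("heart attacks")[0])  # top candidate maps to C0027051
print(link_mention("hypertension")[0])   # exact match, score 1.0
```

The paper's actual pipeline replaces the n-gram vectors with transformer embeddings trained on synonym pairs, and disambiguates the ranked candidates with its reranking suite; this sketch only shows the retrieve-then-rank shape of the approach.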

Authors (3)
  1. Akshit Achara (5 papers)
  2. Sanand Sasidharan (2 papers)
  3. Gagan N (1 paper)