
Exploring the In-context Learning Ability of Large Language Model for Biomedical Concept Linking (2307.01137v1)

Published 3 Jul 2023 in cs.CL and cs.AI

Abstract: The biomedical field relies heavily on concept linking in areas such as literature mining, graph alignment, information retrieval, question answering, and data and knowledge integration. Although large language models (LLMs) have made significant strides in many natural language processing tasks, their effectiveness in biomedical concept mapping is yet to be fully explored. This research investigates a method that exploits the in-context learning (ICL) capabilities of large models for biomedical concept linking. The proposed approach adopts a two-stage retrieve-and-rank framework. First, biomedical concepts are embedded using an LLM, and embedding similarity is used to retrieve the top candidates. The candidates' contextual information is then incorporated into the prompt and processed by an LLM to re-rank the concepts. This approach achieved an accuracy of 90.% in BC5CDR disease entity normalization and 94.7% in chemical entity normalization, a competitive performance relative to supervised learning methods. It also showed a significant improvement on an oncology matching dataset, with an absolute increase of over 20 points in F1 score. Extensive qualitative assessments were conducted, and the benefits and potential shortcomings of using LLMs in the biomedical domain are discussed.
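The two-stage retrieve-and-rank pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the embed and complete functions below are hypothetical placeholders for an LLM embedding endpoint and a text-generation endpoint, and the candidate-selection prompt format is an assumption.

```python
import numpy as np

# Hypothetical placeholders: the paper does not specify a particular API,
# so swap in whatever embedding and completion model you use.
def embed(texts: list[str]) -> np.ndarray:
    """Return one embedding vector per input string (placeholder)."""
    raise NotImplementedError

def complete(prompt: str) -> str:
    """Return the LLM's text response to a prompt (placeholder)."""
    raise NotImplementedError

def retrieve_candidates(mention: str, ontology_terms: list[str],
                        term_vecs: np.ndarray, k: int = 10) -> list[str]:
    """Stage 1: rank ontology terms by cosine similarity to the mention."""
    q = embed([mention])[0]
    sims = (term_vecs @ q) / (
        np.linalg.norm(term_vecs, axis=1) * np.linalg.norm(q))
    top = np.argsort(-sims)[:k]
    return [ontology_terms[i] for i in top]

def rerank(mention: str, candidates: list[str],
           context: dict[str, str]) -> str:
    """Stage 2: ask the LLM to pick the best candidate, given each
    candidate's contextual information (e.g. its ontology definition)."""
    options = "\n".join(
        f"{i + 1}. {c} - {context.get(c, 'no definition available')}"
        for i, c in enumerate(candidates))
    prompt = (f"Which concept does the mention '{mention}' refer to?\n"
              f"{options}\n"
              f"Answer with the number of the best match.")
    answer = complete(prompt)
    # Parse the first numeric answer, clamped to the candidate range.
    idx = int("".join(ch for ch in answer if ch.isdigit()) or "1") - 1
    return candidates[min(max(idx, 0), len(candidates) - 1)]
```

In practice, stage 1 would precompute embeddings for the full ontology once, so that linking each new mention costs only a single embedding call followed by one re-ranking prompt.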

Authors (3)
  1. Qinyong Wang
  2. Zhenxiang Gao
  3. Rong Xu
Citations (11)
