Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study (2106.09700v2)

Published 17 Jun 2021 in cs.CL and cs.LG

Abstract: Biomedical knowledge graphs (KGs) hold rich information on entities such as diseases, drugs, and genes. Predicting missing links in these graphs can boost many important applications, such as drug design and repurposing. Recent work has shown that general-domain language models (LMs) can serve as "soft" KGs, and that they can be fine-tuned for the task of KG completion. In this work, we study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction. We evaluate several domain-specific LMs, fine-tuning them on datasets centered on drugs and diseases that we represent as KGs and enrich with textual entity descriptions. We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance. Finally, we demonstrate the advantage of LM models in the inductive setting with novel scientific entities. Our datasets and code are made publicly available.
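
To make the router-based integration concrete, here is a minimal sketch in PyTorch. It is a hypothetical illustration only, not the authors' released code: the class names (KGEScorer, LMScorer, Router), the DistMult-style embedding scorer, and the simple per-query feature classifier are all assumptions chosen to mirror the abstract's description of routing each query to either an LM-based or a KG-embedding model.

```python
# Hypothetical sketch of the router idea described in the abstract:
# two scorers for (head, relation, tail) queries, plus a small classifier
# that learns which scorer to trust for each query.

import torch
import torch.nn as nn


class KGEScorer(nn.Module):
    """Toy DistMult-style knowledge-graph embedding scorer (illustrative)."""

    def __init__(self, n_entities: int, n_relations: int, dim: int = 128):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)

    def forward(self, h, r, t):
        # DistMult plausibility score: sum over element-wise products.
        return (self.ent(h) * self.rel(r) * self.ent(t)).sum(dim=-1)


class LMScorer(nn.Module):
    """Placeholder for a fine-tuned scientific LM that scores a triple from
    the textual descriptions of its entities (assumes a HuggingFace-style
    encoder that returns last_hidden_state)."""

    def __init__(self, encoder: nn.Module, hidden: int = 768):
        super().__init__()
        self.encoder = encoder            # e.g. a BERT-style text encoder
        self.head = nn.Linear(hidden, 1)  # maps [CLS] vector to a score

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] representation
        return self.head(cls).squeeze(-1)


class Router(nn.Module):
    """Learns, from simple per-query features, whether the KGE or the LM
    scorer should handle a given input example."""

    def __init__(self, n_features: int):
        super().__init__()
        self.clf = nn.Linear(n_features, 2)  # logits over {KGE, LM}

    def forward(self, features):
        return self.clf(features)  # argmax selects the model per query
```

Under this reading, the router acts as a learned selector rather than a score averager: at inference time each query is dispatched to exactly one of the two models, which is consistent with the abstract's phrasing of assigning each input example to either type of model.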

Authors (6)
  1. Rahul Nadkarni (4 papers)
  2. David Wadden (24 papers)
  3. Iz Beltagy (39 papers)
  4. Noah A. Smith (224 papers)
  5. Hannaneh Hajishirzi (176 papers)
  6. Tom Hope (41 papers)
Citations (23)