Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings (2401.07977v3)

Published 15 Jan 2024 in cs.CL

Abstract: In NLP, Machine Reading Comprehension (MRC) is the task of answering a question based on a given context. To handle questions in the medical domain, modern language models such as BioBERT, SciBERT and even ChatGPT are trained on vast amounts of in-domain medical corpora. However, in-domain pre-training is expensive in terms of time and resources. In this paper, we propose a resource-efficient approach for injecting domain knowledge into a model without relying on such domain-specific pre-training. Knowledge graphs are powerful resources for accessing medical information. Building on existing work, we introduce a method using Multi-Layer Perceptrons (MLPs) for aligning and integrating embeddings extracted from medical knowledge graphs with the embedding spaces of pre-trained language models (LMs). The aligned embeddings are fused with the open-domain LMs BERT and RoBERTa, which are fine-tuned for two MRC tasks: span detection (COVID-QA) and multiple-choice questions (PubMedQA). We compare our method to prior techniques that rely on a vocabulary overlap for embedding alignment and show how our method circumvents this requirement to deliver better performance. On both datasets, our method allows BERT/RoBERTa either to perform on par with (and occasionally exceed) stronger domain-specific models or to show general improvements over prior techniques. With the proposed approach, we signal an alternative to in-domain pre-training for achieving domain proficiency. Our code is available here.
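The alignment-and-fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the MLP architecture, the training objective, or the fusion operator, so the two-layer ReLU network, the additive fusion over entity spans, the toy dimensions, and every function name below (`make_mlp`, `project`, `fuse`) are assumptions chosen for illustration. The weights are random placeholders; in practice they would presumably be learned.

```python
import random

def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Build a two-layer MLP with random (untrained) placeholder weights."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        scale = (1.0 / n_in) ** 0.5
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        bias = [0.0] * n_out
        return weights, bias

    return layer(in_dim, hidden_dim), layer(hidden_dim, out_dim)

def linear(weights, bias, x):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def project(mlp, kg_vec):
    """Map a KG embedding into the LM embedding dimension."""
    (w1, b1), (w2, b2) = mlp
    hidden = [max(0.0, h) for h in linear(w1, b1, kg_vec)]  # ReLU
    return linear(w2, b2, hidden)

def fuse(token_vecs, entity_spans, kg_vecs, mlp):
    """Assumed fusion: add the projected KG vector to every token
    inside the span [start, end) of its linked entity."""
    fused = [list(v) for v in token_vecs]
    for (start, end), kg_vec in zip(entity_spans, kg_vecs):
        projected = project(mlp, kg_vec)
        for i in range(start, end):
            fused[i] = [t + p for t, p in zip(fused[i], projected)]
    return fused

# Toy sizes for readability; real LM embeddings are far wider.
KG_DIM, HIDDEN_DIM, LM_DIM = 4, 16, 8
mlp = make_mlp(KG_DIM, HIDDEN_DIM, LM_DIM)

tokens = [[0.1] * LM_DIM for _ in range(5)]   # stand-in LM token embeddings
kg_entity = [0.5, -0.2, 0.3, 0.9]             # stand-in KG entity embedding
fused = fuse(tokens, [(1, 3)], [kg_entity], mlp)
```

Because the MLP maps between spaces of different dimensionality, a sketch like this needs no shared vocabulary between the knowledge graph and the LM; only a link from an entity to a token span is assumed.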

Authors (5)

Saptarshi Sengupta (24 papers)
Connor Heaton (4 papers)
Prasenjit Mitra (58 papers)
Soumalya Sarkar (6 papers)
Suhan Cui (6 papers)

