GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information (2304.09667v3)

Published 19 Apr 2023 in cs.CL, cs.AI, and q-bio.GN

Abstract: While LLMs have been successfully applied to various tasks, they still face challenges with hallucinations. Augmenting LLMs with domain-specific tools such as database utilities can facilitate easier and more precise access to specialized knowledge. In this paper, we present GeneGPT, a novel method for teaching LLMs to use the Web APIs of the National Center for Biotechnology Information (NCBI) for answering genomics questions. Specifically, we prompt Codex to solve the GeneTuring tests with NCBI Web APIs by in-context learning and an augmented decoding algorithm that can detect and execute API calls. Experimental results show that GeneGPT achieves state-of-the-art performance on eight tasks in the GeneTuring benchmark with an average score of 0.83, largely surpassing retrieval-augmented LLMs such as the new Bing (0.44), biomedical LLMs such as BioMedLM (0.08) and BioGPT (0.04), as well as GPT-3 (0.16) and ChatGPT (0.12). Our further analyses suggest that: (1) API demonstrations have good cross-task generalizability and are more useful than documentations for in-context learning; (2) GeneGPT can generalize to longer chains of API calls and answer multi-hop questions in GeneHop, a novel dataset introduced in this work; (3) Different types of errors are enriched in different tasks, providing valuable insights for future improvements.


Summary

The paper presents GeneGPT as an innovative approach that augments LLMs with NCBI APIs to deliver verifiable biomedical information.
It leverages E-utils and BLAST to accurately retrieve genomic data and mitigate the hallucination issues common in LLM outputs.
On the GeneTuring benchmark, GeneGPT achieves state-of-the-art performance on eight tasks with an average score of 0.83, well ahead of the new Bing (0.44), and it generalizes to multi-hop questions in the newly introduced GeneHop dataset.

Enhancing LLMs with Domain-Specific Tools for Biomedical Information Access

Introduction

The advent of LLMs such as GPT-4 and PaLM has significantly advanced the field of NLP, demonstrating remarkable success across a broad spectrum of tasks. These models' ability to generate coherent, contextually relevant text based on a given prompt has made them valuable tools for a variety of applications, including domain-specific tasks like biomedical question answering. However, LLMs inherently struggle with generating factually accurate information, often falling prey to hallucinations—making plausible yet incorrect assertions or statements. One proposed solution to mitigate this issue is tool augmentation, whereby LLMs are enhanced with access to specialized databases or utilities, providing a direct pipeline to accurate, verifiable data.

GeneGPT: An Innovative Approach

In this context, the paper presents GeneGPT, a pioneering approach designed to augment LLMs with domain-specific Web API tools for improved access to biomedical information. By leveraging the Web APIs of the National Center for Biotechnology Information (NCBI), GeneGPT enables LLMs to execute precise queries across extensive biomedical databases. This integration facilitates the retrieval of accurate domain-specific information, such as genomic data, directly from authoritative sources, effectively addressing the challenge of hallucinations in LLM outputs.

NCBI Web APIs Utilization

GeneGPT relies on two of NCBI's Web APIs: E-utils for accessing biomedical databases and BLAST for sequence alignment. E-utils supports querying many types of biomedical information, while BLAST aligns nucleotide or protein sequences to identify genomic locations and related sequences. The LLM (Codex in the paper) is taught to use these APIs through in-context demonstrations, and an augmented decoding algorithm detects the API calls it writes and executes them. As a result, the model's outputs are grounded in verifiable information retrieved directly from NCBI's databases.
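The interplay between generation and API execution described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `[URL]->` call marker, the `generate` and `fetch` callables, and the `max_calls` limit are all assumptions made for the example. Only the E-utils base URL and endpoint naming follow NCBI's public interface.

```python
import re
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(endpoint, **params):
    """Build an E-utils request URL, e.g. endpoint='esearch'."""
    return f"{EUTILS_BASE}/{endpoint}.fcgi?{urlencode(params)}"


# Assumed convention: the model writes a call as "[URL]->" and then
# waits for the result to be filled in.
CALL_PATTERN = re.compile(r"\[(https?://[^\]\s]+)\]->$")


def augmented_decode(generate, fetch, prompt, max_calls=5):
    """Alternate between text generation and API execution.

    generate(text) -> continuation string (hypothetical LLM wrapper)
    fetch(url)     -> response body string (hypothetical HTTP wrapper)
    """
    text = prompt
    for _ in range(max_calls):
        text += generate(text)
        match = CALL_PATTERN.search(text)
        if match is None:
            break  # no pending call, so the model has finished answering
        # Execute the detected call and append its result to the context.
        text += f"[{fetch(match.group(1))}]\n"
    return text
```

Passing `generate` and `fetch` in as arguments keeps the loop testable without a model or network access. A real system would also need to truncate long API responses and handle request failures.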

Experimental Evaluation and Benchmarking

GeneGPT is evaluated on the GeneTuring benchmark, a suite of tasks for genomic question answering, and compared against several other LLMs. It achieves state-of-the-art performance on eight GeneTuring tasks with an average score of 0.83. This is well above the retrieval-augmented new Bing (0.44), the biomedical LLMs BioMedLM (0.08) and BioGPT (0.04), and the general-purpose GPT-3 (0.16) and ChatGPT (0.12). The gap suggests that direct tool augmentation provides more accurate domain-specific information than retrieval-augmented methods, which rely on web sources that may be outdated or incorrect.

Insights and Observations

Cross-Task Generalizability: API demonstrations generalize well across tasks and prove more useful than API documentation for in-context learning.
Performance on Multi-Hop Questions: GeneGPT generalizes to longer chains of API calls and can answer multi-hop questions in GeneHop, a new dataset introduced in the paper, which indicates its potential for biomedical queries that require several retrieval steps.
Error Analysis: A detailed investigation into the errors made by GeneGPT across different tasks sheds light on the limitations and challenges inherent to interfacing LLMs with external databases and tools, offering valuable direction for future improvements.
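The chained API calls behind a multi-hop question can be illustrated with a hard-coded two-step lookup. This is a sketch under stated assumptions, not GeneGPT itself, where the model decides which calls to chain. The helper name `two_hop_lookup` and the `fetch` callable are hypothetical. The JSON keys (`esearchresult`, `idlist`, `result`) follow the layout E-utils returns with `retmode=json`.

```python
import json
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def two_hop_lookup(fetch, db, term):
    """Hop 1: esearch resolves a term to record IDs.
    Hop 2: esummary returns the summary of the first ID.

    fetch(url) -> response body string (hypothetical HTTP wrapper)
    """
    search_url = f"{BASE}/esearch.fcgi?" + urlencode(
        {"db": db, "term": term, "retmode": "json"}
    )
    ids = json.loads(fetch(search_url))["esearchresult"]["idlist"]
    if not ids:
        return None  # nothing matched, so there is no second hop
    summary_url = f"{BASE}/esummary.fcgi?" + urlencode(
        {"db": db, "id": ids[0], "retmode": "json"}
    )
    return json.loads(fetch(summary_url))["result"][ids[0]]
```

The output of the first call becomes the input of the second, which is the pattern a multi-hop question requires. Longer chains repeat the same step with different endpoints or databases.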

Future Prospects and Theoretical Implications

GeneGPT marks a significant step toward integrating LLMs with domain-specific tools, enabling more reliable and precise information retrieval in AI-driven biomedical research. The paper demonstrates the practical benefit of this integration for the accuracy of LLM outputs and contributes to the understanding of tool-augmented LLMs. Its results on the hallucination problem encourage further work on combining LLMs with specialized domain tools, in biomedicine and beyond.

In conclusion, GeneGPT's innovative approach and remarkable performance herald a new era in the application of LLMs to domain-specific tasks. By bridging the gap between LLMs and external knowledge sources through tool augmentation, GeneGPT sets a precedent for the development of more accurate, reliable, and practical AI-powered solutions in biomedicine and other specialized fields.

