
PubTator 3.0: an AI-powered Literature Resource for Unlocking Biomedical Knowledge (2401.11048v1)

Published 19 Jan 2024 in cs.CL and q-bio.QM

Abstract: PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases, and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

Citations (19)

Summary

  • The paper introduces PubTator 3.0, which employs AI to perform semantic and relational searches across extensive biomedical literature.
  • It provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and six million full-text articles, updated weekly.
  • For entity-pair queries, it retrieves more articles than PubMed or Google Scholar with higher precision in the top 20 results, and integrating its APIs with GPT-4 improves the factuality and verifiability of the model's responses.

Introduction

The biomedical literature is vast and growing, and it is critical to advances in research and clinical practice. Traditional information retrieval from this literature has relied heavily on keyword-based search. That approach, though foundational, is limited by terminological variability (the same concept appears under many names) and by incomplete query fulfillment. PubTator 3.0 aims to overcome these limitations by applying advanced AI techniques to support semantic and relation searches across key biomedical concepts such as proteins, genetic variants, diseases, and chemicals.

System Features

PubTator 3.0 provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and six million full-text articles from the PMC Open Access Subset, updated weekly. These precomputed annotations and synonyms extend search beyond conventional keyword matching to semantic and relation queries. The online interface and API support both exploratory and precise searching, aided by features such as query auto-completion and facet filters.
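As a rough illustration of how a client might address the API, the sketch below only builds request URLs and sends nothing over the network. The base URL, the `/search/` path, the `text` and `page` parameters, the `relations:TYPE|A|B` query form, and the `@TYPE_Name` entity identifiers are assumptions that are not stated in this summary; they should be checked against the current PubTator 3.0 API documentation. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# ASSUMPTION: API base URL; not given in the text above, verify before use.
BASE_URL = "https://www.ncbi.nlm.nih.gov/research/pubtator3-api"


def build_relation_query(entity_a: str, entity_b: str, relation: str = "ANY") -> str:
    """Compose a relation query string (hypothetical helper).

    ASSUMPTION: relation queries take the form 'relations:TYPE|A|B',
    where A and B are entity identifiers such as '@CHEMICAL_Doxorubicin'.
    """
    return f"relations:{relation}|{entity_a}|{entity_b}"


def build_search_url(query: str, page: int = 1) -> str:
    """Build a search URL for a free-text, entity, or relation query.

    ASSUMPTION: the search endpoint is '/search/' and accepts 'text' and
    'page' query parameters.
    """
    return f"{BASE_URL}/search/?{urlencode({'text': query, 'page': page})}"


if __name__ == "__main__":
    # Keyword-style query.
    print(build_search_url("BRCA1"))
    # Entity-pair relation query; the identifiers are illustrative only.
    pair = build_relation_query("@CHEMICAL_Doxorubicin", "@DISEASE_Neoplasms")
    print(build_search_url(pair))
```

Keeping URL construction separate from the HTTP call makes the query format easy to inspect and adjust if the real endpoint differs from these assumptions.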

Performance and Efficiency

In the paper's evaluation on a series of entity-pair queries, PubTator 3.0 retrieves more articles than either PubMed or Google Scholar and achieves higher precision in the top 20 results. Integrating the PubTator APIs with GPT-4 also improves the factuality and verifiability of the LLM's responses, mitigating hallucinations (erroneous or fabricated responses), a known weakness of LLMs.
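Precision in the top 20 results is the fraction of the first 20 retrieved articles that are relevant. The sketch below shows one common way to compute it. The function name, the input format, and the choice to divide by the number of results actually returned (when fewer than k exist) are assumptions for illustration, not the paper's exact evaluation protocol.

```python
from typing import Iterable, Sequence


def precision_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int = 20) -> float:
    """Fraction of the top-k retrieved items that are relevant.

    ASSUMPTION: if fewer than k items were retrieved, divide by the number
    actually returned rather than by k. Some evaluations divide by k instead.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_set)
    return hits / len(top_k)


if __name__ == "__main__":
    # Illustrative identifiers only; these are not real results.
    ranked = ["pmid_a", "pmid_b", "pmid_c", "pmid_d"]
    judged_relevant = {"pmid_a", "pmid_c"}
    print(precision_at_k(ranked, judged_relevant, k=4))  # 2 of 4 relevant -> 0.5
```

Comparing search engines this way requires the same relevance judgments for every system, since precision depends only on the ranked list and the set judged relevant.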

Conclusion

PubTator 3.0 represents a significant step toward efficient access to biomedical knowledge. By annotating entities and their relations across a rapidly expanding literature, it helps researchers fulfill complex information needs. Automating these tasks has the potential to accelerate research and contribute trustworthy insights to the biomedical field. PubTator 3.0 is accessible through its online interface and API, positioning it as a resource for biomedical information retrieval and for future AI applications in biomedicine.