Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning (2304.02711v2)

Published 5 Apr 2023 in cs.AI and cs.LG

Abstract: Creating knowledge bases and ontologies is a time consuming task that relies on a manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrary complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of LLMs to perform zero-shot learning (ZSL) and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against GPT-3+ to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for all matched elements. We present examples of use of SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease causation graphs. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction (RE) methods, but has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.

Summary

The paper introduces SPIRES, a framework that leverages structured prompt interrogation and recursive semantic extraction to automate knowledge base population.
The methodology employs zero-shot learning with large language models, reducing reliance on extensive training data, and grounds extracted entities in existing ontologies and vocabularies.
Evaluated on tasks ranging from food recipes to biomedical relation extraction, SPIRES reaches accuracy comparable to the mid-range of existing relation extraction methods while handling complex, nested schemas.

Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): A Methodological Overview

The paper "Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES)" introduces a method for knowledge extraction built on large language models (LLMs). The method aims to automate the population of Knowledge Bases (KBs) using zero-shot learning, removing the need for the extensive training data that such tasks traditionally require.

Core Contributions

SPIRES takes an input text and a user-defined schema, and relies on the LLM's capability for general-purpose query answering. It recursively interrogates the LLM with structured prompts derived from the schema to extract data conforming to that schema, and it uses existing ontologies and vocabularies to assign identifiers to the matched elements. Importantly, the approach supports complex, nested knowledge schemas, which existing methods struggle to populate without extensive training data.
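To make the idea of a schema-derived prompt concrete, here is a minimal sketch. It assumes a toy schema held in a Python dictionary; the names SCHEMA and build_prompt, the field descriptions, and the prompt wording are all hypothetical and are not taken from the SPIRES or OntoGPT source code.

```python
# Hypothetical nested schema: each class maps field names to a description
# and, optionally, the name of a nested class ("range").
SCHEMA = {
    "Recipe": {
        "label": {"description": "the name of the recipe"},
        "ingredients": {
            "description": "a semicolon-separated list of ingredients",
            "range": "Ingredient",
            "multivalued": True,
        },
    },
    "Ingredient": {
        "food": {"description": "the food item"},
        "amount": {"description": "the quantity and unit"},
    },
}


def build_prompt(class_name: str, text: str) -> str:
    """Render a fill-in-the-blanks prompt for one class of the schema."""
    lines = [
        "From the text below, extract the following entities "
        "in the following format:",
        "",
    ]
    # One "field: <description>" line per attribute of the class.
    for field, spec in SCHEMA[class_name].items():
        lines.append(f"{field}: <{spec['description']}>")
    lines += ["", "Text:", text, "", "==="]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Recipe", "Shortbread needs flour and butter."))
```

Because the prompt is generated from the schema, changing the schema changes the extraction task without any retraining, which is the property the paragraph above describes.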

Methodological Framework

The SPIRES framework consists of the following key steps:

Prompt Generation: Given a schema and input text, a structured prompt is created to instruct the LLM on the expected output format.
Prompt Completion: The prompt is processed by the LLM to generate a response, structured as per the provided template.
Parsing and Recursive Extraction: The response is parsed to identify entities and relationships, employing recursive schema interrogation for nested structures.
Entity Grounding: Extracted entities are grounded using external ontologies, providing reliability by mapping to persistent identifiers from existing vocabularies.
Optional OWL Translation: The extracted data can be translated to Web Ontology Language (OWL) for further reasoning and ontology management tasks.
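The first four steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the OntoGPT implementation: the schema, the prompt wording, the canned fake_llm stand-in for a real model call, and the VOCAB lookup table with its EX: identifiers are all hypothetical.

```python
from typing import Callable

# Hypothetical schema: "range" marks a nested class, "ground" marks a field
# whose value should be mapped to a vocabulary identifier.
SCHEMA = {
    "Recipe": {
        "label": {},
        "ingredients": {"range": "Ingredient", "multivalued": True},
    },
    "Ingredient": {"food": {"ground": True}, "amount": {}},
}

# Toy stand-in for an ontology lookup; the identifiers are placeholders.
VOCAB = {"flour": "EX:0001", "butter": "EX:0002"}


def parse_response(response: str) -> dict[str, str]:
    """Split 'field: value' lines into a dictionary."""
    parsed = {}
    for line in response.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            parsed[key.strip()] = value.strip()
    return parsed


def ground(value: str) -> str:
    """Return an identifier when the label is known, else the raw text."""
    return VOCAB.get(value.lower(), value)


def extract(class_name: str, text: str, llm: Callable[[str], str]) -> dict:
    """Prompt, complete, parse, then recurse into nested classes."""
    fields = SCHEMA[class_name]
    prompt = (
        "Extract these fields, one per line as 'field: value'.\n"
        f"Fields: {', '.join(fields)}\nText:\n{text}"
    )
    raw = parse_response(llm(prompt))
    result = {}
    for field, spec in fields.items():
        value = raw.get(field)
        if value is None:
            continue
        if spec.get("multivalued"):
            parts = [p.strip() for p in value.split(";") if p.strip()]
        else:
            parts = [value]
        if "range" in spec:
            # Nested class: interrogate the model again on each fragment.
            items = [extract(spec["range"], p, llm) for p in parts]
        elif spec.get("ground"):
            items = [ground(p) for p in parts]
        else:
            items = parts
        result[field] = items if spec.get("multivalued") else items[0]
    return result


def fake_llm(prompt: str) -> str:
    """Canned replies standing in for a real model call."""
    if "Fields: label, ingredients" in prompt:
        return "label: Shortbread\ningredients: 2 cups flour; 1 cup butter"
    fragment = prompt.rsplit("Text:\n", 1)[1]
    amount, food = fragment.rsplit(" ", 1)
    return f"food: {food}\namount: {amount}"


if __name__ == "__main__":
    print(extract("Recipe", "Shortbread needs 2 cups flour and 1 cup butter.", fake_llm))
```

Passing the model as a callable keeps the recursion testable without network access; a real deployment would replace fake_llm with an API call and VOCAB with an ontology lookup service.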

Evaluation and Results

SPIRES has been evaluated across several domains, including food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical-disease causation. In the BioCreative Chemical-Disease-Relation task, its F-score was comparable to the mid-range of existing relation extraction methods, which are trained on domain-specific data. That SPIRES reaches this level without any task-specific training data is what makes it flexible and widely applicable.
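For readers unfamiliar with the metric, the F-score for a relation extraction task can be computed from sets of predicted and gold-standard relations, as in the sketch below. The function name f_score and the chemical and disease identifiers are hypothetical placeholders, and the numbers are illustrative only; they are not results from the paper.

```python
def f_score(predicted: set, gold: set) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for two sets of extracted relations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder (chemical, disease) pairs, not real annotations.
gold = {("chem:A", "dis:X"), ("chem:B", "dis:Y"),
        ("chem:C", "dis:Z"), ("chem:D", "dis:W")}
predicted = {("chem:A", "dis:X"), ("chem:B", "dis:Y"), ("chem:E", "dis:X")}

if __name__ == "__main__":
    p, r, f1 = f_score(predicted, gold)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Comparing grounded identifier pairs rather than raw strings means that a relation only counts as correct when both entities were mapped to the right vocabulary terms.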

Grounding was also tested against multiple ontologies, including the Gene Ontology and other curated datasets, using the GPT-3.5-turbo and GPT-4 models. In these tests SPIRES grounded entities substantially more accurately than prompting the LLM directly for identifiers.

Implications and Future Directions

SPIRES effectively mitigates some of the common limitations of LLMs, such as hallucinations and contextual misinterpretations, by enforcing extraction through structured schemas and integrating established ontologies. The method's zero-shot learning capability presents significant practical advantages, making it an attractive option for rapid deployment in new domains without bespoke training datasets.

The framework offers a systematic approach to knowledge base population in which the LLM assists, rather than replaces, expert curators. Future work could explore fine-tuning for specific domains and integration with more openly accessible and transparent LLMs, which would improve acceptance and reliability in critical fields such as biomedical data processing.

SPIRES is available as part of the open-source OntoGPT package, giving the research community a tool for turning unstructured text into structured data. Its adaptability and schema-driven design fit the current need for scalable knowledge management and AI-assisted data curation.
