SPIRES: Structured Prompt Interrogation Framework
- The paper presents SPIRES as a zero-shot LLM-based framework to extract structured, nested knowledge bases from unstructured text with deterministic ontology grounding.
- It details a recursive workflow combining prompt engineering, YAML templating, and external ontology querying to accurately ground entities to known vocabularies.
- Empirical evaluations on relation extraction and NER tasks demonstrate competitive performance without task-specific training, while also highlighting practical deployment considerations and remaining areas for improvement.
Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES) is a Knowledge Extraction framework that leverages the zero-shot capabilities of LLMs to populate complex, nested knowledge base schemas directly from unstructured text. It is designed to automate ontology and knowledge base construction without reliance on task-specific training data, integrating user-defined schemas, recursive prompt engineering, deterministic ontology grounding, and compatibility with existing open ontologies (Caufield et al., 2023).
1. Problem Formulation and Schema Specification
SPIRES addresses the extraction of structured knowledge—an instance or a graph of instances—from raw text T, conforming to a user-defined schema S. The schema is defined as a set of classes, each with attributes. Attributes are richly specified, including a name (human-readable label), a description, a range Range(a) (primitive, class, or enumeration), a multivalued flag, and an identifier marker. Each class C can declare an allowed set of identifier spaces, IDSpaces(C). The extraction objective is to produce an instance i, or a graph of instances, conforming to S, using T as the informational source. The approach allows for arbitrary schema complexity, including deeply nested and multivalued structures (Caufield et al., 2023).
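As a concrete sketch, such a schema can be represented as plain Python structures. This is a simplified, hypothetical stand-in for a LinkML schema; the class and attribute names are illustrative, not taken from the paper:

```python
# Simplified stand-in for a LinkML-style schema: each class lists its
# attributes (with a range and multivalued flag) and the CURIE prefixes
# allowed for grounding. All names here are illustrative.
SCHEMA = {
    "Recipe": {
        "attributes": {
            "label": {"range": "string", "multivalued": False},
            "ingredients": {"range": "Ingredient", "multivalued": True},
        },
        "id_spaces": [],
    },
    "Ingredient": {
        "attributes": {
            "food_item": {"range": "FoodItem", "multivalued": False},
            "amount": {"range": "string", "multivalued": False},
        },
        "id_spaces": [],
    },
    "FoodItem": {
        "attributes": {"label": {"range": "string", "multivalued": False}},
        "id_spaces": ["FOODON"],  # allowed CURIE prefixes for grounding
    },
}

def attr_range(schema, cls, attr):
    """Range(a): the declared value type of an attribute."""
    return schema[cls]["attributes"][attr]["range"]
```

The nested `ingredients` attribute (range `Ingredient`, multivalued) is what drives the recursion described in the next section.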
2. Recursive Prompting and Extraction Algorithm
The SPIRES workflow is inherently recursive, reflecting the possible nesting in the schema. At each step, the following stages are executed:
- Prompt Generation: A structured, pseudo-YAML template is constructed based on S (the schema), C (the entry-point class), and T (the input text). Attribute-specific prompts are included, either user-defined or auto-generated.
- Interaction with LLMs: The prompt is submitted to an LLM (e.g., GPT-3.5, GPT-4), which returns a populated pseudo-YAML completion.
- Parsing and Recursion: The completion is parsed line-by-line, matching keys to schema attributes in a case-insensitive manner. For attributes whose range is itself a class, the procedure recurses with the relevant text fragment and sub-schema.
- Grounding: String values corresponding to named entities are grounded to ontology CURIEs using the class's specified identifier spaces (IDSpaces) and external ontology services.
- Optional Translation: The structured output may be further materialized as OWL axioms via tools such as ROBOT or LinkML-OWL mappings.
Core pseudocode presented in the source:
```
Function SPIRES(S, C, T):
    1. p  ← GeneratePrompt(S, C, T)
    2. r  ← CompletePrompt(p)
    3. iu ← ParseCompletion(r, S, C)
    4. i  ← Ground(iu, S, C)
    5. return i
```
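A minimal executable rendering of this loop, with the LLM call injected as a plain function so the sketch stays testable. The helper implementations beyond the pseudocode's names are assumptions for illustration; grounding and the nested-class recursion are omitted here:

```python
def generate_prompt(schema, cls, text):
    # Stage 1: render a pseudo-YAML template, one "<attribute>: <hint>"
    # line per attribute of the entry-point class.
    lines = ["Split the following piece of text into fields in the following format:"]
    for attr in schema[cls]:
        lines.append(f"{attr}: <the {attr.replace('_', ' ')}>")
    return "\n".join(lines + [f"Text: {text}", "==="])

def parse_completion(completion, schema, cls):
    # Stage 3: match returned keys to schema attributes case-insensitively.
    known = {a.lower(): a for a in schema[cls]}
    inst = {}
    for line in completion.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in known:
            inst[known[key.strip().lower()]] = value.strip()
    return inst

def spires(schema, cls, text, llm):
    """Sketch of SPIRES(S, C, T): prompt, complete, parse.

    `schema` maps class names to attribute lists; `llm` is any
    callable from prompt string to completion string.
    """
    prompt = generate_prompt(schema, cls, text)
    completion = llm(prompt)
    return parse_completion(completion, schema, cls)
```

With a stub LLM that returns the two fields of the ingredient example, `spires({"Ingredient": ["food_item", "amount"]}, "Ingredient", ...)` yields a flat key-value instance; grounding would then run as a separate pass.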
3. Prompt Construction and Example Templates
At each SPIRES invocation, a prompt is emitted comprising instructions, attribute templates, and the target text. The templates are rendered in pseudo-YAML, for example:
```
Split the following piece of text into fields in the following format:

food_item: <the food item>
amount: <the quantity of the ingredient>

Text: garlic powder (2 tablespoons)
===
```
4. Ontology-based Entity and Relation Grounding
After LLM-based extraction, SPIRES deterministically grounds relevant strings to ontology identifiers through a multi-stage process:
- Utilization of ontology services via OAKlib, encompassing Gilda for biomedical normalization, the NCATS Translator NodeNormalizer, BioPortal/AgroPortal Annotator, and the Ontology Lookup Service.
- For each string to be grounded, SPIRES queries annotators restricted to the allowed prefixes in the IDSpaces of the attribute's range class, retrieving and selecting the best CURIE candidate. Where no candidate is found, the original string is retained or flagged.
- This process is recursive for nested objects, ensuring all references are mapped and validated to ontology URIs where specified.
Pseudocode for grounding:
```
Function Ground(iu, S, C):
    For each attribute a in iu:
        If iu[a] is a string and Range(a) is a reference class:
            For each vocabulary prefix p in IDSpaces(Range(a)):
                candidates ← QueryAnnotator(p, iu[a])
                If candidates not empty:
                    select best candidate CURIE
                    iu[a] ← candidate
                    break
            If no candidate found → leave as literal or flag error
        Else if iu[a] is a nested instance:
            iu[a] ← Ground(iu[a], S, Range(a))
    return iu
```
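A runnable sketch of this grounding pass, with the annotator query stubbed by a lookup table. The schema shape, annotator wiring, and CURIE value are assumptions for illustration; the real system delegates annotation to OAKlib-backed services:

```python
def ground(instance, schema, cls, query_annotator):
    """Recursively replace named-entity strings with CURIEs where possible.

    `query_annotator(prefix, text)` stands in for an external ontology
    annotation service and returns a ranked list of candidate CURIEs.
    """
    grounded = {}
    for attr, value in instance.items():
        rng = schema[cls]["attributes"][attr]["range"]
        if isinstance(value, dict):
            # Nested instance: recurse with the sub-schema of its range class.
            grounded[attr] = ground(value, schema, rng, query_annotator)
        elif rng in schema and schema[rng].get("id_spaces"):
            curie = None
            for prefix in schema[rng]["id_spaces"]:
                candidates = query_annotator(prefix, value)
                if candidates:
                    curie = candidates[0]  # take the best-ranked candidate
                    break
            # No candidate: fall back to the original literal string.
            grounded[attr] = curie if curie else value
        else:
            grounded[attr] = value
    return grounded
```

Strings whose range is a primitive pass through untouched; only attributes whose range class declares ID spaces trigger annotator queries.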
5. Empirical Evaluation and Benchmarking
SPIRES was evaluated on both ontology grounding and relation extraction tasks.
- Ontology Term Grounding: For 100 random terms from each of GO, EMAPA, and MONDO:
- GPT-3.5-turbo via SPIRES: 98/100 GO, 100/100 EMAPA, 97/100 MONDO.
- GPT-4-turbo via SPIRES: 97/100 GO, 100/100 EMAPA, 18/100 MONDO (affected by parsing issues).
- Direct LLM prompting (without SPIRES) yields substantially inferior results (e.g., 3/100 GO for GPT-3.5-turbo).
- Relation Extraction (BC5CDR Chemical–Disease): On 500 abstracts (1066 CID triples):
- GPT-3.5-turbo (with chunking) and GPT-4-turbo were both evaluated; SPIRES with GPT-4 reaches an F1 of 0.438.
- Supervised systems report F1 up to 0.570; unsupervised SPIRES is mid-range.
- Named Entity Recognition Grounding:
- GPT-4-turbo: grounding performance was reported separately for Chemical and Disease entity mentions.
A summary comparison:
| Method | Training Data | Handles Nested Schemas? | Grounding to Ontology | BC5CDR F1 |
|---|---|---|---|---|
| SPIRES (GPT-4) | 0 examples | Yes | Yes | 0.438 |
| BioGPT (fine-tuned) | 1000s | Flat RE only | Limited | 0.450 |
| Best BioCreative participant | 1000s | Flat RE only | Varies | 0.570 |
SPIRES delivers mid-range relation extraction performance without task-specific training or annotation, and uniquely supports complex schema structures and ontology-based grounding (Caufield et al., 2023).
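For reference, the precision/recall/F1 figures in this section follow the standard definitions over extracted CID triples. A minimal computation (the counts below are illustrative, not from the paper):

```python
def precision_recall_f1(tp, fp, fn):
    """Micro precision/recall/F1 from true-positive, false-positive,
    and false-negative counts over extracted triples."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, 5 correct triples against 5 spurious and 10 missed ones gives precision 0.5, recall 1/3, and F1 0.4.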
6. Comparison with Prior and Contemporary Methods
SPIRES contrasts with existing Relation Extraction (RE) frameworks along several axes:
- Zero-shot Generalization: Requires no annotated triples or domain-specific fine-tuning. In contrast, supervised RE approaches demand extensive labeled data.
- Schema Flexibility: Capable of extracting data into arbitrarily nested, user-specified schemas (e.g., food recipes, drug mechanisms, disease models), whereas most RE models are restricted to flat, binary/ternary tuple schemas.
- Deterministic Ontology Grounding: Integrates external vocabulary grounding, circumventing unreliable LLM-generated CURIEs, and supporting validation and alignment with existing ontologies.
- Customization: Operates over arbitrary LinkML schemas with minimal adjustment, facilitating immediate application in new domains.
- Relative Accuracy: SPIRES achieves competitive F1 on BC5CDR with no fine-tuning, but does not set state-of-the-art scores.
This comparison underscores SPIRES’s unique capacity for rapid deployment and adaptability in knowledge base construction (Caufield et al., 2023).
7. Limitations and Future Prospects
Identified limitations include:
- LLM Hallucinations: Despite explicit prompting to extract only from input text and post-hoc grounding, occasional hallucinated or imprecise extractions persist, necessitating user validation prior to knowledge base ingestion.
- API Dependence: Use of proprietary LLM APIs introduces privacy, bias, and financial considerations. Integration of open-source LLMs (e.g., LLaMA2-based models) is planned.
- Chunking vs. Context: Sliding-window “chunking” boosts recall but reduces throughput. Exploration of more robust context management and document-level reasoning is suggested.
- Qualifier Extraction: While SPIRES can extract relation qualifiers, these were not included in BC5CDR evaluation; richer output evaluation is pending.
- Ontology Alignment: Potential exists to integrate with advanced ontology-matching methods (e.g., Agent-OM, MapperGPT) for improved cross-ontology linkage.
Envisioned extensions include tighter integration with open LLMs, expanded OWL-based downstream reasoning, and development of interactive interfaces for expert validation and correction (Caufield et al., 2023).