SPIRES: Structured Prompt Interrogation Framework
- The paper presents SPIRES as a zero-shot LLM-based framework to extract structured, nested knowledge bases from unstructured text with deterministic ontology grounding.
- It details a recursive workflow combining prompt engineering, YAML templating, and external ontology querying to accurately ground entities to known vocabularies.
- Empirical evaluations on relation extraction and NER tasks demonstrate competitive performance without task-specific training, while also highlighting practical deployment considerations and remaining areas for improvement.
Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES) is a Knowledge Extraction framework that leverages the zero-shot capabilities of LLMs to populate complex, nested knowledge base schemas directly from unstructured text. It is designed to automate ontology and knowledge base construction without reliance on task-specific training data, integrating user-defined schemas, recursive prompt engineering, deterministic ontology grounding, and compatibility with existing open ontologies (Caufield et al., 2023).
1. Problem Formulation and Schema Specification
SPIRES addresses the extraction of structured knowledge—an instance or a graph of instances—from raw text T, conforming to a user-defined schema S. The schema is defined as a set of classes, each with attributes. Attributes are richly specified, including a name (human-readable label), a description, a range Range(a) (primitive, class, or enumeration), a multivalued flag, and an identifier marker. Each class C can declare an allowed set of identifier spaces, IDSpaces(C). The extraction objective is to produce an instance i, or a graph of instances, conforming to S, using T as the informational source. The approach allows for arbitrary schema complexity, including deeply nested and multivalued structures (Caufield et al., 2023).
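As a concrete sketch, such a schema can be represented as plain Python structures. This is a simplified, hypothetical stand-in for a LinkML schema; the class and attribute names are illustrative, not taken from the paper:

```python
# Simplified stand-in for a LinkML-style schema: each class lists its
# attributes (with a range and multivalued flag) and the CURIE prefixes
# allowed for grounding. All names here are illustrative.
SCHEMA = {
    "Recipe": {
        "attributes": {
            "label": {"range": "string", "multivalued": False},
            "ingredients": {"range": "Ingredient", "multivalued": True},
        },
        "id_spaces": [],
    },
    "Ingredient": {
        "attributes": {
            "food_item": {"range": "FoodItem", "multivalued": False},
            "amount": {"range": "string", "multivalued": False},
        },
        "id_spaces": [],
    },
    "FoodItem": {
        "attributes": {"label": {"range": "string", "multivalued": False}},
        "id_spaces": ["FOODON"],  # allowed CURIE prefixes for grounding
    },
}

def attr_range(schema, cls, attr):
    """Range(a): the declared value type of an attribute."""
    return schema[cls]["attributes"][attr]["range"]
```

The nested `ingredients` attribute (range `Ingredient`, multivalued) is what drives the recursion described in the next section.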
2. Recursive Prompting and Extraction Algorithm
The SPIRES workflow is inherently recursive, reflecting the possible nesting in the schema. At each step, the following stages are executed:
- Prompt Generation: A structured, pseudo-YAML template is constructed based on S (the schema), C (the entry-point class), and T (the input text). Attribute-specific prompts are included, either user-defined or auto-generated.
- Interaction with LLMs: The prompt is submitted to an LLM (e.g., GPT-3.5, GPT-4), which returns a populated pseudo-YAML completion.
- Parsing and Recursion: The completion is parsed line-by-line, matching keys to schema attributes in a case-insensitive manner. For attributes whose range is itself a class, the procedure recurses with the relevant text fragment and sub-schema.
- Grounding: String values corresponding to named entities are grounded to ontology CURIEs using the class's specified identifier spaces (IDSpaces) and external ontology services.
- Optional Translation: The structured output may be further materialized as OWL axioms via tools such as ROBOT or LinkML-OWL mappings.
Core pseudocode presented in the source:
```
Function SPIRES(S, C, T):
    1. p  ← GeneratePrompt(S, C, T)
    2. r  ← CompletePrompt(p)
    3. iu ← ParseCompletion(r, S, C)
    4. i  ← Ground(iu, S, C)
    5. return i
```
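A minimal executable rendering of this loop, with the LLM call injected as a plain function so the sketch stays testable. The helper implementations beyond the pseudocode's names are assumptions for illustration; grounding and the nested-class recursion are omitted here:

```python
def generate_prompt(schema, cls, text):
    # Stage 1: render a pseudo-YAML template, one "<attribute>: <hint>"
    # line per attribute of the entry-point class.
    lines = ["Split the following piece of text into fields in the following format:"]
    for attr in schema[cls]:
        lines.append(f"{attr}: <the {attr.replace('_', ' ')}>")
    return "\n".join(lines + [f"Text: {text}", "==="])

def parse_completion(completion, schema, cls):
    # Stage 3: match returned keys to schema attributes case-insensitively.
    known = {a.lower(): a for a in schema[cls]}
    inst = {}
    for line in completion.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in known:
            inst[known[key.strip().lower()]] = value.strip()
    return inst

def spires(schema, cls, text, llm):
    """Sketch of SPIRES(S, C, T): prompt, complete, parse.

    `schema` maps class names to attribute lists; `llm` is any
    callable from prompt string to completion string.
    """
    prompt = generate_prompt(schema, cls, text)
    completion = llm(prompt)
    return parse_completion(completion, schema, cls)
```

With a stub LLM that returns the two fields of the ingredient example, `spires({"Ingredient": ["food_item", "amount"]}, "Ingredient", ...)` yields a flat key-value instance; grounding would then run as a separate pass.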
3. Prompt Construction and Example Templates
At each SPIRES invocation, a prompt is emitted comprising instructions, attribute templates, and the target text. The templates are rendered in pseudo-YAML, for example:
```
Split the following piece of text into fields in the following format:

food_item: <the food item>
amount: <the quantity of the ingredient>

Text: garlic powder (2 tablespoons)
===
```
4. Ontology-based Entity and Relation Grounding
After LLM-based extraction, SPIRES deterministically grounds relevant strings to ontology identifiers through a multi-stage process:
- Utilization of ontology services via OAKlib, encompassing Gilda for biomedical normalization, the NCATS Translator NodeNormalizer, BioPortal/AgroPortal Annotator, and the Ontology Lookup Service.
- For each string to be grounded, SPIRES queries annotators restricted to the allowed prefixes in the IDSpaces of the attribute's range class, retrieving and selecting the best CURIE candidate. Where no candidate is found, the original string is retained or flagged.
- This process is recursive for nested objects, ensuring all references are mapped and validated to ontology URIs where specified.
Pseudocode for grounding:
```
Function Ground(iu, S, C):
    For each attribute a in iu:
        If iu[a] is a string and Range(a) is a reference class:
            For each vocabulary prefix p in IDSpaces(Range(a)):
                candidates ← QueryAnnotator(p, iu[a])
                If candidates not empty:
                    select best candidate CURIE
                    iu[a] ← candidate
                    break
            If no candidate found → leave as literal or flag error
        Else if iu[a] is a nested instance:
            iu[a] ← Ground(iu[a], S, Range(a))
    return iu
```
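A runnable sketch of this grounding pass, with the annotator query stubbed by a lookup table. The schema shape, annotator wiring, and CURIE value are assumptions for illustration; the real system delegates annotation to OAKlib-backed services:

```python
def ground(instance, schema, cls, query_annotator):
    """Recursively replace named-entity strings with CURIEs where possible.

    `query_annotator(prefix, text)` stands in for an external ontology
    annotation service and returns a ranked list of candidate CURIEs.
    """
    grounded = {}
    for attr, value in instance.items():
        rng = schema[cls]["attributes"][attr]["range"]
        if isinstance(value, dict):
            # Nested instance: recurse with the sub-schema of its range class.
            grounded[attr] = ground(value, schema, rng, query_annotator)
        elif rng in schema and schema[rng].get("id_spaces"):
            curie = None
            for prefix in schema[rng]["id_spaces"]:
                candidates = query_annotator(prefix, value)
                if candidates:
                    curie = candidates[0]  # take the best-ranked candidate
                    break
            # No candidate: fall back to the original literal string.
            grounded[attr] = curie if curie else value
        else:
            grounded[attr] = value
    return grounded
```

Strings whose range is a primitive pass through untouched; only attributes whose range class declares ID spaces trigger annotator queries.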
5. Empirical Evaluation and Benchmarking
SPIRES was evaluated on both ontology grounding and relation extraction tasks.
- Ontology Term Grounding: For 100 random terms from each of GO, EMAPA, and MONDO:
- GPT-3.5-turbo via SPIRES: 98/100 GO, 100/100 EMAPA, 97/100 MONDO.
- GPT-4-turbo via SPIRES: 97/100 GO, 100/100 EMAPA, 18/100 MONDO (affected by parsing issues).
- Direct LLM prompting (without SPIRES) yields substantially inferior results (e.g., 3/100 GO for GPT-3.5-turbo).
- Relation Extraction (BC5CDR Chemical–Disease): On 500 abstracts (1066 CID triples):
- GPT-3.5-turbo (with chunking) and GPT-4-turbo were both evaluated; SPIRES with GPT-4 reaches an F1 of 0.438.
- Supervised systems report F1 up to 0.570; unsupervised SPIRES is mid-range.
- Named Entity Recognition Grounding:
- GPT-4-turbo: grounding performance was reported separately for Chemical and Disease entity mentions.
A summary comparison:
| Method | Training Data | Handles Nested Schemas? | Grounding to Ontology | BC5CDR F1 |
|---|---|---|---|---|
| SPIRES (GPT-4) | 0 examples | Yes | Yes | 0.438 |
| BioGPT (fine-tuned) | 1000s | Flat RE only | Limited | 0.450 |
| Best BioCreative participant | 1000s | Flat RE only | Varies | 0.570 |
SPIRES delivers mid-range relation extraction performance without task-specific training or annotation, and uniquely supports complex schema structures and ontology-based grounding (Caufield et al., 2023).
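For reference, the precision/recall/F1 figures in this section follow the standard definitions over extracted CID triples. A minimal computation (the counts below are illustrative, not from the paper):

```python
def precision_recall_f1(tp, fp, fn):
    """Micro precision/recall/F1 from true-positive, false-positive,
    and false-negative counts over extracted triples."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, 5 correct triples against 5 spurious and 10 missed ones gives precision 0.5, recall 1/3, and F1 0.4.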
6. Comparison with Prior and Contemporary Methods
SPIRES contrasts with existing Relation Extraction (RE) frameworks along several axes:
- Zero-shot Generalization: Requires no annotated triples or domain-specific fine-tuning. In contrast, supervised RE approaches demand extensive labeled data.
- Schema Flexibility: Capable of extracting data into arbitrarily nested, user-specified schemas (e.g., food recipes, drug mechanisms, disease models), whereas most RE models are restricted to flat, binary/ternary tuple schemas.
- Deterministic Ontology Grounding: Integrates external vocabulary grounding, circumventing unreliable LLM-generated CURIEs, and supporting validation and alignment with existing ontologies.
- Customization: Operates over arbitrary LinkML schemas with minimal adjustment, facilitating immediate application in new domains.
- Relative Accuracy: SPIRES achieves competitive F1 on BC5CDR with no fine-tuning, but does not set state-of-the-art scores.
This comparison underscores SPIRES’s unique capacity for rapid deployment and adaptability in knowledge base construction (Caufield et al., 2023).
7. Limitations and Future Prospects
Identified limitations include:
- LLM Hallucinations: Despite explicit prompting to extract only from input text and post-hoc grounding, occasional hallucinated or imprecise extractions persist, necessitating user validation prior to knowledge base ingestion.
- API Dependence: Use of proprietary LLM APIs introduces privacy, bias, and financial considerations. Integration of open-source LLMs (e.g., LLaMA2-based models) is planned.
- Chunking vs. Context: Sliding-window “chunking” boosts recall but reduces throughput. Exploration of more robust context management and document-level reasoning is suggested.
- Qualifier Extraction: While SPIRES can extract relation qualifiers, these were not included in BC5CDR evaluation; richer output evaluation is pending.
- Ontology Alignment: Potential exists to integrate with advanced ontology-matching methods (e.g., Agent-OM, MapperGPT) for improved cross-ontology linkage.
Envisioned extensions include tighter integration with open LLMs, expanded OWL-based downstream reasoning, and development of interactive interfaces for expert validation and correction (Caufield et al., 2023).