Generated Knowledge Prompting (GKP)
- Generated Knowledge Prompting is a paradigm that synthesizes explicit knowledge using LLMs and external resources to address data scarcity and enhance interpretability.
- It employs techniques like free-form generation and symbolic triple extraction to provide transparent reasoning and significantly improve QA performance metrics.
- Integrating GKP in multimodal and dialogue systems demonstrates notable gains in accuracy and human-centered evaluations, offering versatile real-world applications.
Generated Knowledge Prompting (GKP) is a paradigm in which explicit knowledge, either in free-text or structured form, is synthesized or surfaced by large pre-trained models or via external resources, then injected into the LLM’s context through carefully designed prompts. GKP enables models to externalize and leverage both latent and explicit knowledge for downstream tasks, providing interpretability, reducing hallucinations, and compensating for data or supervision scarcity. Approaches to GKP vary from unsupervised free-form generation to symbolic path extraction from knowledge graphs, and span modalities from text to vision. The following sections present a comprehensive technical synthesis of GKP, with exemplars drawn from recent advances across QA, dialogue, information extraction, and multimodal knowledge grounding.
1. Foundations and Motivation
Generated Knowledge Prompting originated to address the limitations of both static knowledge retrieval (e.g., fixed knowledge bases with closed schemas) and direct sequence-to-sequence modeling by LLMs. The essential insight is to leverage LLMs’ latent parametric knowledge—and/or selected fragments from structured resources—by generating, rather than retrieving, knowledge that is tailored to each query instance. In paradigmatic settings such as unsupervised commonsense question answering (QA), no labeled QA pairs are available, and knowledge type is arbitrary. GKP explicitly elicits intermediate reasoning steps, drawing on the model’s world knowledge and making the reasoning path transparent, as exemplified by the Two-Stage Generative Prompting (TSGP) framework (Sun et al., 2022), early work on free-form knowledge generation for QA (Liu et al., 2021), and iterative knowledge graph construction (Carta et al., 2023).
Motivations for GKP include:
- Extracting implicit, unlabeled knowledge stored in model parameters.
- Bridging the gap between question and answer with explicit, interpretable reasoning artifacts.
- Overcoming the brittle generalization of static retrieval, fixed schemas, or resource-limited knowledge graphs.
- Enabling symbolic knowledge capture or graph construction at prompt-time (Çöplü et al., 2024, Zhang et al., 2023).
2. Core Methodologies and Pipeline Designs
Most GKP frameworks are built around one or more of the following procedural stages:
2.1 Knowledge Generation
In free-form GKP (Sun et al., 2022, Liu et al., 2021):
- A pre-trained LLM (PrLM, e.g., GPT-2 or GPT-3) is prompted with a natural language instruction and a small set of hand-crafted demonstrations.
- Prompts employ open-ended specifications (e.g., “Generate some knowledge about...”).
- The model samples a set of knowledge statements conditioned on the input question.
- Sampling uses nucleus (top-p) sampling (typically p = 0.5, max length 64 tokens), with deduplication and empty-string filtering.
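A minimal sketch of this stage, assuming an OpenAI-style `complete(prompt, top_p, max_tokens)` helper; the template and demonstration are illustrative, not the exact prompts of Liu et al. (2021):

```python
# Free-form knowledge generation via few-shot prompting and nucleus sampling.
# The template wording and the `complete` helper are assumptions.
KNOWLEDGE_TEMPLATE = """Generate some knowledge about the concepts in the input.

Input: Where would you expect to see a bird after a storm?
Knowledge: Birds often return to open areas such as fields after storms.

Input: {question}
Knowledge:"""

def generate_knowledge(question, complete, m=20, top_p=0.5, max_tokens=64):
    """Sample m statements with nucleus sampling; drop duplicates and empties."""
    prompt = KNOWLEDGE_TEMPLATE.format(question=question)
    samples = [complete(prompt, top_p=top_p, max_tokens=max_tokens)
               for _ in range(m)]
    return sorted({s.strip() for s in samples if s.strip()})
```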
In symbolic GKP and KG prompting (Çöplü et al., 2024, Zhang et al., 2023, Carta et al., 2023):
- A prompt-to-triple procedure is implemented, where the LLM is instructed to extract (subject, predicate, object) triples from text conditioned on a user-supplied relation vocabulary or a sampled knowledge graph substructure.
- This may occur via zero-shot templates, few-shot demonstration blocks, or through reinforcement-learned path extraction from a KG.
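An illustrative zero-shot prompt-to-triple sketch; the template, relation vocabulary, and output convention are assumptions rather than the exact prompts used in these works:

```python
# Zero-shot triple extraction with an explicit output format and a
# user-supplied relation vocabulary (all values here are illustrative).
RELATIONS = ["works_for", "located_in", "founded_by"]

TRIPLE_TEMPLATE = """Extract (subject, predicate, object) triples from the text.
Use only these predicates: {relations}.
Output one triple per line as: subject | predicate | object.
If no triple applies, output NONE.

Text: {text}
Triples:"""

def extract_triples(text, complete):
    prompt = TRIPLE_TEMPLATE.format(relations=", ".join(RELATIONS), text=text)
    raw = complete(prompt).strip()
    if raw == "NONE":
        return []
    return [tuple(field.strip() for field in line.split("|"))
            for line in raw.splitlines() if line.count("|") == 2]
```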
2.2 Knowledge Selection and Relevance Scoring
To ensure that generated knowledge is not only fluent but also relevant:
- In TSGP (Sun et al., 2022), the Pointwise Mutual Information (PMI) between a generated knowledge statement k and the question q, computed as PMI(k, q) = log p(q | k) − log p(q), is used to select the knowledge statement most conditionally informative about q (a short sketch follows this list).
- In QA with symbolic KGs (Zhang et al., 2023), RL agents score KG paths on the basis of (i) reaching the target concept, (ii) context-relatedness (cosine similarity to question embeddings), and (iii) path conciseness.
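As referenced above, a sketch of PMI-based selection, assuming a `log_prob(text, prefix)` helper that returns the LM log-probability of `text` given `prefix` (not part of TSGP's released code):

```python
# PMI-based knowledge selection: prefer the statement under which the
# question becomes most probable relative to its unconditional probability.
def pmi(knowledge, question, log_prob):
    # PMI(k, q) = log p(q | k) - log p(q)
    return log_prob(question, prefix=knowledge) - log_prob(question, prefix="")

def select_knowledge(candidates, question, log_prob):
    return max(candidates, key=lambda k: pmi(k, question, log_prob))
```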
2.3 Prompt Construction for Downstream Tasks
Once knowledge is selected/generated, it is incorporated into task prompts:
- In generative QA (Sun et al., 2022, Liu et al., 2021), prompts combining the original question and the selected knowledge statement are used to generate pseudo-answers or directly select the answer choice.
- In symbolic tasks (Çöplü et al., 2024), prompt outputs are fixed-format triples or graphs, with explicit format constraints in the system message.
- Multi-armed bandit (MAB) selection of prompt templates is used in KnowGPT (Zhang et al., 2023), which tests different instantiations (triples, sentential, graph-description) and selects the best based on reward.
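A simplified epsilon-greedy bandit over prompt templates conveys the idea; KnowGPT's actual selection mechanism and reward design (Zhang et al., 2023) are more involved:

```python
# Epsilon-greedy template selection: each prompt format is an arm, and the
# observed downstream reward (e.g., answer correctness) drives the update.
import random

class TemplateBandit:
    def __init__(self, templates, epsilon=0.1):
        self.templates = templates
        self.epsilon = epsilon
        self.counts = [0] * len(templates)
        self.values = [0.0] * len(templates)    # running mean rewards

    def choose(self):
        if random.random() < self.epsilon:      # explore
            return random.randrange(len(self.templates))
        return max(range(len(self.templates)),  # exploit best mean reward
                   key=lambda i: self.values[i])

    def update(self, i, reward):                # incremental mean update
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]
```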
2.4 Pipeline Integration
Representative GKP pipelines (TSGP (Sun et al., 2022), Multi-Stage Dialogue (Liu et al., 2022)) typically chain multiple stages:
- Knowledge Generation: candidate knowledge statements are generated and scored, with the top-scoring statement selected.
- Answer/Pseudo-Answer Generation: Possible answers are produced in free form, independent of specific answer choices.
- Semantic Scoring: Answer options are scored by semantic similarity (e.g., normalized softmax over cosine similarities of option and generated answer embeddings).
Pseudocode Example (TSGP (Sun et al., 2022)):
```python
# TSGP pipeline (Sun et al., 2022). Helpers are assumed: T_KG / T_AG are
# prompt templates, sample_LM performs nucleus sampling, PMI_LM scores PMI
# under the LM, embed is a sentence encoder, softmax_T a temperature-T softmax.
prompt1 = T_KG.format(q)                       # stage 1: knowledge generation
K_q = sample_LM(prompt1, M=20, p=0.5)          # sample M knowledge statements
k_star = max(K_q, key=lambda k: PMI_LM(k, q))  # most informative statement

prompt2 = T_AG.format(q, k_star)               # stage 2: answer generation
S_q = sample_LM(prompt2, n=500, p=0.9)         # free-form pseudo-answers

# Score option i: mean over pseudo-answers of the temperature-scaled softmax
# (over all options) of cosine similarities between embeddings.
def option_score(i):
    return sum(softmax_T([cos(embed(s), embed(a)) for a in A])[i]
               for s in S_q) / len(S_q)

answer = A[max(range(len(A)), key=option_score)]
```
3. Prompt Engineering and Representation
Prompt formats in GKP are highly task- and resource-dependent, ranging from hand-crafted natural language templates to soft continuous prompts.
- Free-form, open-class prompts afford unconstrained knowledge types (definitions, analogies, consequences), with 5–10 demonstrations commonly used. No explicit schema is imposed (Sun et al., 2022, Liu et al., 2021).
- Structured symbolic prompts demand explicit instruction, including output format and relation vocabulary, with clear guidance for cases with no applicable triple (Çöplü et al., 2024).
- Soft prompts (embeddings) model world knowledge as trainable tensors prepended to the input (per-entity soft prompts), without modifying the base LM's architecture (Santos et al., 2022); a minimal sketch follows this list.
- Visual GKP employs spatial masks and region-set prompts, filling the prompt with masked image features plus region-level context text to produce format-free knowledge statements (Cui et al., 2023).
- Zero-shot and iterative decomposition: Fully zero-shot task decomposition, explicit output formatting, and atomic instruction templates are advocated for robustness and scalability in knowledge graph construction (Carta et al., 2023).
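As noted in the soft-prompt item above, a minimal per-entity soft-prompt module in PyTorch; the shapes and wiring are illustrative assumptions rather than the exact architecture of Santos et al. (2022):

```python
# Per-entity soft prompts: one trainable block of prompt vectors per entity,
# prepended to the token embeddings consumed by a frozen base LM.
import torch
import torch.nn as nn

class EntityMemory(nn.Module):
    """One trainable soft prompt of `prompt_len` vectors per entity."""
    def __init__(self, num_entities, prompt_len, d_model):
        super().__init__()
        self.prompt_len, self.d_model = prompt_len, d_model
        self.prompts = nn.Embedding(num_entities, prompt_len * d_model)

    def forward(self, entity_ids, token_embeds):
        # (batch,) -> (batch, prompt_len, d_model), concatenated in front of
        # the token embeddings; only self.prompts receives gradient updates.
        p = self.prompts(entity_ids).view(-1, self.prompt_len, self.d_model)
        return torch.cat([p, token_embeds], dim=1)
```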
4. Quantitative Performance and Empirical Insights
GKP leads to significant, consistent empirical gains across QA and knowledge-intensive tasks, measured in accuracy and human-centered metrics.
Representative Results
- On CommonsenseQA (CSQA), TSGP improves GPT2-XL accuracy from 32.3% (baseline) to 49.1% and OpenBookQA from 22.8% to 44.4%. Ablation shows both knowledge and answer generation are crucial (Sun et al., 2022).
- In GKP for QA (Liu et al., 2021), adding generated knowledge raises zero-shot T5-11B accuracy on NumerSense from 67.5% to 78.0%; QASC (UnifiedQA) from 76.7% to 80.3%.
- KnowGPT achieves OpenBookQA test accuracy of 92.4%, matching human-level performance (91.7%) and outperforming baseline ChatGPT (60.0%) (Zhang et al., 2023).
- Fine-tuned soft prompts acting as entity memory raise T5-Small EM from 0.1% to 4.3% (zero-shot) and from 32.9% to 54.1% (finetuned) on SimpleQuestions (Santos et al., 2022).
- Visual GKP (OpenVik) improves CLIP Recall@1 on COCO text→image retrieval from 36.16% to 40.55%, and situation recognition accuracy from 53.14% to 75.16% (Cui et al., 2023).
Qualitative Analyses
- Human annotators rate GKP selections as grammatical (91%), relevant (82%), and useful (64%) on CSQA (Sun et al., 2022).
- Knowledge generated by GKP often flips the model’s prediction from wrong to correct via explicit reasoning paths (Liu et al., 2021).
- Visual GKP outputs exhibit higher diversity and freshness than region captioning or scene graph baselines (Cui et al., 2023).
Table: Accuracy Comparison (selected settings, (Sun et al., 2022, Zhang et al., 2023))
| Method | CSQA (%) | OpenBookQA (%) | Human perf. on OpenBookQA (%) |
|---|---|---|---|
| Baseline GPT-2-XL | 32.3 | 22.8 | — |
| TSGP (GKP, GPT2-XL) | 49.1 | 44.4 | — |
| ChatGPT (zero-shot) | 73.5 | 60.0 | 91.7 |
| KnowGPT | 81.8 | 92.4 | 91.7 |
5. Extensions, Ablations, and Modalities
5.1 Symbolic and Structured Knowledge
Prompt-time symbolic knowledge capture targets on-the-fly KG extension by having LLMs emit user-specified triples at inference, updatable without re-training (Çöplü et al., 2024, Carta et al., 2023). Approaches include zero/few-shot prompting, fine-tuning with parameter-efficient QLoRA, and iterative LLM-based pipeline refinement for entity/relation resolution and schema induction.
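A minimal QLoRA setup sketch with Hugging Face `transformers` and `peft`; the model name and hyperparameters are placeholders, not the configuration used by Çöplü et al. (2024):

```python
# 4-bit quantized base model with LoRA adapters (QLoRA-style fine-tuning).
# All hyperparameter values below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1",
                                             quantization_config=bnb)
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)   # only the LoRA adapters are trainable
```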
5.2 Visual Knowledge Generation
OpenVik demonstrates GKP in the visual domain by segmenting images into relation-oriented regions, prompting a multimodal model to generate region-level free-form knowledge, and integrating data enhancement for rare knowledge types (Cui et al., 2023). This enables downstream fusion with text queries, improving vision–language reasoning.
5.3 Dialogue Systems
Multi-Stage Prompting for knowledgeable dialogue generation applies GKP by first eliciting knowledge sentences contextually relevant to dialogue history and then using those statements in the response generation stage (Liu et al., 2022). The approach achieves superior knowledgeability and engagement compared to retrieval-assisted and finetuned models.
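A two-stage sketch of this scheme, with illustrative templates and an assumed `complete` helper (not the exact prompts of Liu et al., 2022):

```python
# Stage 1 elicits knowledge conditioned on dialogue history; stage 2 grounds
# the response in that knowledge. Templates are illustrative assumptions.
KNOWLEDGE_STAGE = ("Conversation so far:\n{history}\n"
                   "Knowledge relevant to the next reply:")
RESPONSE_STAGE = ("Conversation so far:\n{history}\n"
                  "Relevant knowledge: {knowledge}\n"
                  "Next reply:")

def respond(history, complete):
    knowledge = complete(KNOWLEDGE_STAGE.format(history=history))
    return complete(RESPONSE_STAGE.format(history=history,
                                          knowledge=knowledge.strip()))
```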
5.4 Machine Learning Utility and Synthetic Data
Knowledge-Guided Prompting (KGP) explicitly injects symbolic, statistical, and semantic domain knowledge into data-generation prompts to reduce reliance on in-context examples, yielding empirical scaling laws of the form Q(n, K) ≈ a·n^(−α) + b·K^(−β) for dataset quality (Xu et al., 2025). The approach finds that semantic domain knowledge can offset the need for up to 90% of in-context examples in synthetic tabular data generation.
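A toy illustration of fitting this scaling-law form with `scipy`; the data points and recovered coefficients are synthetic, not results from Xu et al. (2025):

```python
# Fit Q(n, K) ≈ a*n^(-alpha) + b*K^(-beta) to (synthetic) quality measurements.
import numpy as np
from scipy.optimize import curve_fit

def q_law(X, a, alpha, b, beta):
    n, K = X
    return a * n ** (-alpha) + b * K ** (-beta)

n = np.array([10., 50., 100., 500., 1000.])    # number of in-context examples
K = np.array([1., 2., 4., 8., 16.])            # units of injected knowledge
Q_obs = q_law((n, K), 0.8, 0.5, 0.6, 0.7)      # noiseless synthetic points

(a, alpha, b, beta), _ = curve_fit(q_law, (n, K), Q_obs, p0=[1, 0.5, 1, 0.5])
print(f"a={a:.2f}, alpha={alpha:.2f}, b={b:.2f}, beta={beta:.2f}")
```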
6. Limitations and Open Challenges
While GKP consistently outperforms retrieval-only or end-to-end generation baselines, several limitations are noted:
- Generation fidelity: Knowledge hallucination or irrelevance arises when the generator model is poorly matched to the task or lacks domain data (Liu et al., 2021, Sun et al., 2022).
- Prompt design: Manual prompt construction remains pervasive; automatic or learned prompting is an open direction (Liu et al., 2021, Carta et al., 2023).
- Scalability: Prompt/few-shot block size and KG relation counts constrain zero/few-shot approaches (Çöplü et al., 2024).
- Structured knowledge integration: Balancing expressiveness and conciseness in KG path extraction and template selection is challenging; current methods (e.g., RL in KnowGPT (Zhang et al., 2023)) rely on simplified reward structures and limited prompt catalogues.
- Cross-domain robustness: Most visual and graph-based GKP studies report results on single-domain or synthetic corpora; broader cross-domain generalization remains relatively unexplored (Cui et al., 2023, Carta et al., 2023).
7. Future Directions and Research Opportunities
- Dynamic and hybrid pipelines: Combining retrieval-augmented generation, prompt-time knowledge confirmation, and verification modules (e.g., PiVe) may address both hallucination and fact-integration limits (Çöplü et al., 2024).
- Scaling and adaptability: Parameter-efficient continual learning to expand relation vocabularies or entity sets without retraining the core LM (Çöplü et al., 2024).
- Automated prompt and knowledge selection: Automating prompt synthesis, knowledge granularity adaptation, and example selection, especially in multimodal and cross-lingual settings.
- Modality expansion and multimodal grounding: Extending GKP to multi-hop and open-domain multimodal settings (image, text, audio), benefiting from format-free knowledge representations and compositional reasoning (Cui et al., 2023).
- Memory and retrieval models: Soft-prompt-based persistent memory architectures enabling rapid update and deletion of world facts, supporting continual world model adaptation (Santos et al., 2022).
GKP delineates a versatile set of methodologies uniting generative modeling, explicit knowledge injection, and semantic reasoning, with demonstrated empirical benefits across QA, dialogue, knowledge graph construction, and synthetic data domains (Sun et al., 2022, Liu et al., 2021, Zhang et al., 2023, Cui et al., 2023, Liu et al., 2022, Çöplü et al., 2024, Xu et al., 2025, Santos et al., 2022, Carta et al., 2023). Its ongoing development targets more robust, scalable, and automated knowledge conditioning of LLMs in diverse real-world settings.