BYOKG-RAG: Custom KG QA Framework

Updated 13 March 2026

BYOKG-RAG is a framework for question answering over custom knowledge graphs that combines LLM-driven artifact generation with specialized graph retrieval.
It employs an iterative two-stage process integrating artifact generation and downstream graph operations to address schema mismatches and entity linking challenges.
Experimental evaluations show improved performance across diverse benchmarks, underscoring its effective multi-strategy retrieval and generalization capabilities.

BYOKG-RAG (Bring-Your-Own-Knowledge-Graph Retrieval-Augmented Generation) is a general framework for question answering (QA) over custom or domain-specific knowledge graphs (KGs) that combines LLMs with multi-strategy graph retrieval. It addresses three core challenges in knowledge graph QA (heterogeneous schemas, unreliable entity linking, and limited generalization) by interleaving LLM-driven artifact generation with specialized, tool-mediated graph retrieval in an iterative pipeline. Unlike traditional RAG models that rely solely on unstructured text or assume a fixed KG schema, BYOKG-RAG generalizes to arbitrary, user-supplied KGs and supports complex reasoning by decomposing retrieval into multiple complementary phases (Mavromatis et al., 5 Jul 2025).

1. Motivation and Problem Setting

The primary target of BYOKG-RAG is open-domain and domain-specific KGQA, where the input is a natural-language question $q$ and a user-provided knowledge graph $G$ of triples $T = \{(h, r, t)\}$; the system is tasked with returning the correct answer entities $A \subseteq V$, where $V$ is the set of nodes of $G$ (Mavromatis et al., 5 Jul 2025). Existing approaches typically encounter:

- Schema/surface form mismatches (varying node/edge types, aliases)
- Inadequate generalization to custom KGs or previously unseen ontologies
- Agentic LLM traversals prone to entity linking errors and poor multistep compositional reasoning

BYOKG-RAG aims to overcome these limitations by directly leveraging both LLM reasoning and specialized graph operations, iteratively exchanging high-level reasoning artifacts between the LLM and downstream KG tools.

2. System Architecture and Iterative Pipeline

BYOKG-RAG is built around a two-stage loop that continues until convergence:

Stage I: KG-Linker (LLM Generation of Graph Artifacts)

From the prompt $(q, S, C^{t-1})$, where $S$ is the KG schema and $C^{t-1}$ is the context from the previous round, the LLM produces:
- Extracted question entities ( $\tilde{E}_q$ )
- Candidate answer mentions ( $\tilde{E}_a$ )
- Reasoning/relation paths
- Executable OpenCypher queries
- Draft answers
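The artifact types listed above can be pictured as one record returned by the KG-Linker in each round. The following is a minimal sketch, not the paper's interface: the class name `LinkerArtifacts`, its field names, the helper `build_linker_prompt`, and the prompt layout are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LinkerArtifacts:
    """Hypothetical container for one round of KG-Linker output."""
    question_entities: list[str] = field(default_factory=list)    # extracted question entities
    answer_candidates: list[str] = field(default_factory=list)    # candidate answer mentions
    relation_paths: list[list[str]] = field(default_factory=list)  # reasoning/relation paths
    cypher_queries: list[str] = field(default_factory=list)       # executable Cypher queries
    draft_answers: list[str] = field(default_factory=list)        # draft answers


def build_linker_prompt(question: str, schema: str, prior_context: list[str]) -> str:
    """Assemble the (q, S, C^{t-1}) inputs into one prompt string (layout is an assumption)."""
    context = "\n".join(prior_context) if prior_context else "(none)"
    return (
        f"Question: {question}\n"
        f"Schema: {schema}\n"
        f"Context from previous round:\n{context}"
    )


# Toy first-round call: no prior context exists yet.
prompt = build_linker_prompt(
    "Who directed Inception?",
    "(Film)-[:directed_by]->(Person)",
    [],
)
artifacts = LinkerArtifacts(
    question_entities=["Inception"],
    relation_paths=[["directed_by"]],
)
```

Keeping every artifact type in one record makes it straightforward to hand each field to the retrieval tool that knows how to resolve it.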

Stage II: Graph Retrieval Toolkit

Specialized modules resolve LLM outputs to yield relevant KG contexts:
- Agentic mode: One-hop expansions with iterative LLM filtering for relation/edge relevance.
- Scoring mode: Retrieves the top-$k$ KG triples by semantic similarity to the question.
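Scoring-mode retrieval can be illustrated with a small ranking routine. This is a sketch under stated assumptions: the bag-of-words `embed` function is a toy stand-in for whatever learned encoder a real deployment would use, and the names `embed`, `cosine`, and `top_k_triples` are hypothetical.

```python
import math
from collections import Counter

Triple = tuple[str, str, str]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a learned embedding model."""
    return Counter(text.lower().replace("_", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_triples(question: str, triples: list[Triple], k: int) -> list[Triple]:
    """Rank triples by similarity between the question and the verbalized triple."""
    q_vec = embed(question)
    ranked = sorted(triples, key=lambda t: cosine(q_vec, embed(" ".join(t))), reverse=True)
    return ranked[:k]


triples = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Inception", "release_year", "2010"),
    ("Paris", "capital_of", "France"),
]
best = top_k_triples("who was inception directed by", triples, k=1)
```

Verbalizing each triple as plain text before scoring lets the same similarity function serve any schema, which fits the framework's schema-agnostic goal.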

The union of all retrieved results forms a new context $C^{t}$, which is added to the LLM input for the next round. The process terminates when the context stabilizes or after a fixed maximum number of rounds. The final answer is then produced from the question and the accumulated context (Mavromatis et al., 5 Jul 2025).
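The two-stage loop and its stopping rule can be sketched as follows. This is an illustrative outline, not the authors' implementation: `run_loop`, `stub_link`, and `stub_retrieve` are hypothetical names, the stubs replace the LLM and the graph tools, and accumulating context across rounds is an assumption about how the union is carried forward.

```python
from typing import Callable

Context = frozenset[str]


def run_loop(
    question: str,
    schema: str,
    link: Callable[[str, str, Context], list[str]],
    retrieve: Callable[[list[str]], set[str]],
    max_rounds: int = 5,
) -> tuple[Context, int]:
    """Alternate linking and retrieval until the context stops changing."""
    context: Context = frozenset()
    for round_no in range(1, max_rounds + 1):
        artifacts = link(question, schema, context)               # Stage I: generate artifacts
        new_context = context | frozenset(retrieve(artifacts))    # Stage II: union of results
        if new_context == context:                                # context has stabilized
            return context, round_no
        context = new_context
    return context, max_rounds                                    # round budget exhausted


# Toy fact store keyed by entity, standing in for the graph retrieval toolkit.
FACTS = {
    "Inception": {"Inception -directed_by-> Christopher Nolan"},
    "Christopher Nolan": {"Christopher Nolan -born_in-> London"},
}


def stub_link(question: str, schema: str, context: Context) -> list[str]:
    """Stand-in for the LLM: the seed entity plus every tail entity seen so far."""
    return ["Inception"] + [fact.split("-> ")[1] for fact in sorted(context)]


def stub_retrieve(mentions: list[str]) -> set[str]:
    """Stand-in for graph retrieval: look up facts for each mentioned entity."""
    found: set[str] = set()
    for mention in mentions:
        found |= FACTS.get(mention, set())
    return found


final_context, rounds = run_loop(
    "Where was the director of Inception born?", "", stub_link, stub_retrieve
)
```

In this toy run the second round adds the birthplace fact that only becomes reachable after the first round surfaces the director, and the third round detects that nothing new was retrieved.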

3. Key Algorithms and Formalism

Formally, BYOKG-RAG uses the following operations per iteration:

- Entity Linking: For each entity mention produced by the LLM (from $\tilde{E}_q$ or $\tilde{E}_a$), obtain the best-matching KG nodes according to both string and embedding similarity.
- Path/Query Retrieval: For each proposed path or query, retrieve the corresponding subgraphs/facts from $G$.
- Triplet Retrieval: Either via agentic LLM one-hop expansion and filtering, or via direct scoring by composite semantic similarity.
- Iteration: The union of the retrieved results becomes the context $C^{t}$ for the next KG-Linker call.
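The entity linking step, which matches LLM-produced mentions to KG nodes by both string and embedding similarity, might look like the sketch below. The assumptions are loud ones: character-trigram vectors stand in for a real embedding model, the equal weighting `alpha = 0.5` is arbitrary, and the function names are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def string_sim(a: str, b: str) -> float:
    """Surface-form similarity from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def trigram_vec(text: str) -> Counter:
    """Character-trigram counts; a toy stand-in for a learned embedding."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def embedding_sim(a: str, b: str) -> float:
    """Cosine similarity between the two trigram vectors."""
    va, vb = trigram_vec(a), trigram_vec(b)
    dot = sum(va[g] * vb[g] for g in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def link_entity(mention: str, nodes: list[str], k: int = 2, alpha: float = 0.5) -> list[str]:
    """Return the k nodes with the highest blended string/embedding score."""
    def score(node: str) -> float:
        return alpha * string_sim(mention, node) + (1 - alpha) * embedding_sim(mention, node)

    return sorted(nodes, key=score, reverse=True)[:k]


nodes = ["Christopher Nolan", "Christopher Lee", "London"]
linked = link_entity("Chris Nolan", nodes, k=1)
```

Blending two signals is meant to catch cases where one alone fails, such as abbreviations that defeat exact string matching.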

A crucial insight is to treat the LLM as a generator of diverse "hooks"—string entities, relation paths, executable queries—rather than a full-graph agent. Graph tools resolve these hooks, yielding a more robust and generalizable retrieval compared to agent-only or retriever-only KGQA systems (Mavromatis et al., 5 Jul 2025).
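As one concrete case of a graph tool resolving an LLM-generated hook, consider a relation path grounded against the triple set. The sketch below assumes the path is a list of relation names starting from an already-linked entity; `follow_path` and the toy triples are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def follow_path(triples: list[Triple], start: str, relations: list[str]) -> set[str]:
    """Follow a relation path hop by hop and return the entities reached."""
    # Index tails by (head, relation) so each hop is a dictionary lookup.
    index: dict[tuple[str, str], set[str]] = defaultdict(set)
    for head, rel, tail in triples:
        index[(head, rel)].add(tail)

    frontier = {start}
    for rel in relations:
        frontier = {tail for node in frontier for tail in index.get((node, rel), set())}
    return frontier


triples = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("Inception", "release_year", "2010"),
]
reached = follow_path(triples, "Inception", ["directed_by", "born_in"])
dead_end = follow_path(triples, "Inception", ["produced_by"])
```

A path that does not exist in the graph simply yields an empty set, so a hallucinated hook costs one failed lookup rather than a wrong answer.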

4. Experimental Evaluation

BYOKG-RAG has been evaluated on multiple zero- and few-shot KGQA benchmarks:

Dataset	Underlying KG	Retrieval Complexity	BYOKG-RAG (Main Metric)	Second-Best Baseline
WebQSP-IH	Freebase	1–2-hop QA	Hit@1: 86.6%	86.2%
CWQ-IH	Freebase	up to 4-hop	Hit@1: 73.6%	69.3%
CronQuestions	Wikidata	Temporal multi-entity	Hit@1: 65.5%	59.8%
MedQA	DiseaseDrugBank	Medical domain	Hit@2: 65.0%	62.5%
Northwind	Enterprise	Aggregation/cypher	LLMaaJ: 64.9%	55.3%

Averaged over all tasks, BYOKG-RAG exceeds the strongest prior by 4.5 percentage points, while requiring no KG-specific supervision (Mavromatis et al., 5 Jul 2025).
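The 4.5-point figure is consistent with an unweighted mean of the per-dataset margins in the table above. The short check below copies the reported scores from that table; note that it averages across different metrics (Hit@1, Hit@2, LLMaaJ), which is an assumption about how the summary number was obtained.

```python
# (BYOKG-RAG score, second-best score) in percent, copied from the results table.
results = {
    "WebQSP-IH": (86.6, 86.2),
    "CWQ-IH": (73.6, 69.3),
    "CronQuestions": (65.5, 59.8),
    "MedQA": (65.0, 62.5),
    "Northwind": (64.9, 55.3),
}

# Per-dataset margin in percentage points, rounded to one decimal place.
margins = {name: round(ours - other, 1) for name, (ours, other) in results.items()}

# Unweighted mean margin across the five benchmarks.
average_margin = round(sum(margins.values()) / len(margins), 1)
```

The margins range from 0.4 points on WebQSP-IH to 9.6 points on Northwind, so the average is driven mostly by the harder benchmarks.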

Ablation experiments confirm that each retrieval component (agentic, path, scoring, query generation) provides unique contributions. The LLM-based linking step in particular improves performance over pure string/entity similarity by 6–8 points on compositional tasks. Second-pass refinement yields a further 5–7 point improvement for the most complex benchmarks (Mavromatis et al., 5 Jul 2025).

5. Generalization and Case Analysis

BYOKG-RAG is architecturally agnostic to the KG schema, requiring only schema introspection rather than curated alignment or task-specific templates. It adapts to diverse structural patterns (aggregation, temporal, domain-specialized) through its artifact generation and retrieval mix.

Notable qualitative cases include:

- For CWQ-style compositional queries, BYOKG-RAG's first iteration proposes plausible relation chains; agentic retrieval then grounds the entities, and subsequent refinement revises the reasoning path to reach the correct target.
- On temporal reasoning (CronQuestions), BYOKG-RAG leverages both path retrieval and query execution to resolve chronology, outperforming purely agentic traversals that tend to get trapped in local neighborhoods.
- For enterprise-style aggregations (Text2Cypher), query execution is essential, highlighting the necessity of supporting programmatic KG operations.

6. Relationship to Other KG-Augmented RAG Systems

BYOKG-RAG shares high-level aims with other KG-augmented RAG approaches such as KG$^2$RAG (Zhu et al., 8 Feb 2025) and KG-Infused RAG (Wu et al., 11 Jun 2025). A comparison is presented below:

System	Main Focus	Retrieval Modes	LLM Role	Empirical Coverage
BYOKG-RAG (Mavromatis et al., 5 Jul 2025)	QA over custom KGs; robust, multi-strategy retrieval	Entity linking, multi-hop path, agentic, query, scoring	Artifact/hook generation; iterative	Freebase, Wikidata, MedQA, enterprise
KG$^2$RAG (Zhu et al., 8 Feb 2025)	Chunk expansion and organization via KG structure	Chunk ↔ KG linking, multi-hop expansion, graph-based paragraph organization	Entity/relation disambiguation, paragraph assembly	HotpotQA and variants
KG-Infused RAG (Wu et al., 11 Jun 2025)	Fusing unstructured and KG evidence; spreading activation	Dense text, KG spreading, combined ranking	Knowledge activation, query rewrite, answer generation	Multi-hop Wikipedia QA

A plausible implication is that BYOKG-RAG’s explicit decoupling of LLM-driven artifact generation and downstream graph retrieval enables greater generalization to arbitrary, user-supplied graphs and queries, as well as flexible integration with a range of graph operations.

7. Practical Considerations and Limitations

BYOKG-RAG does not require hand-labeled KGQA data or any graph-specific retriever fine-tuning, reducing onboarding friction for arbitrary domains. The framework's iterative design converges quickly (average ≈2 iterations), resulting in moderate inference overhead (about 2.4× a baseline LLM call), with substantially fewer LLM invocations than multi-step agentic baselines (Mavromatis et al., 5 Jul 2025).

However, trade-offs include additional complexity in integrating and orchestrating multiple retrieval modules, and potential error propagation if initial artifact extraction is misaligned with KG content. The system assumes availability of a graph schema and access to basic KG APIs (entity search, neighbor expansion, path queries, Cypher execution).
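The four KG capabilities the system assumes (entity search, neighbor expansion, path queries, Cypher execution) can be written down as an interface. The sketch below is hypothetical throughout: `KGBackend`, its method names, and the `InMemoryKG` toy implementation are illustrative assumptions, and Cypher execution is left unimplemented because it needs a real graph database.

```python
from typing import Protocol

Triple = tuple[str, str, str]


class KGBackend(Protocol):
    """Hypothetical interface covering the four assumed KG capabilities."""

    def search_entities(self, mention: str, k: int) -> list[str]: ...
    def neighbors(self, node: str) -> list[Triple]: ...
    def find_paths(self, start: str, relations: list[str]) -> list[str]: ...
    def run_cypher(self, query: str) -> list[dict]: ...


class InMemoryKG:
    """Toy backend over a list of triples, for illustration only."""

    def __init__(self, triples: list[Triple]) -> None:
        self.triples = list(triples)

    def search_entities(self, mention: str, k: int) -> list[str]:
        # Case-insensitive substring match over all node names.
        nodes = sorted({n for h, _, t in self.triples for n in (h, t)})
        return [n for n in nodes if mention.lower() in n.lower()][:k]

    def neighbors(self, node: str) -> list[Triple]:
        # One-hop expansion: every triple that touches the node.
        return [tr for tr in self.triples if node in (tr[0], tr[2])]

    def find_paths(self, start: str, relations: list[str]) -> list[str]:
        # Follow the relation sequence outward from the start node.
        frontier = [start]
        for rel in relations:
            frontier = [t for h, r, t in self.triples if r == rel and h in frontier]
        return frontier

    def run_cypher(self, query: str) -> list[dict]:
        raise NotImplementedError("requires a real Cypher-capable graph database")


kg: KGBackend = InMemoryKG([
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
])
```

Coding the retrieval modules against such an interface is one way to keep them independent of the graph store, so swapping in a production database would only require a new backend class.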

References

"BYOKG-RAG: Multi-Strategy Graph Retrieval for Knowledge Graph Question Answering" (Mavromatis et al., 5 Jul 2025)
"Knowledge Graph-Guided Retrieval Augmented Generation" (Zhu et al., 8 Feb 2025)
"KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs" (Wu et al., 11 Jun 2025)
