
LEGO-GraphRAG: Modular GraphRAG Framework

  • The paper introduces LEGO-GraphRAG, a modular framework that decomposes graph-based retrieval into six distinct stages for enhanced tuning and evaluation.
  • It presents a systematic taxonomy contrasting non-NN and NN-based methods, detailing trade-offs in precision, latency, and computational cost using concrete metrics.
  • Empirical evaluations on multiple KBQA datasets demonstrate that tailored subgraph extraction and path refinement significantly improve F1 scores while managing resource constraints.

LEGO-GraphRAG is a modular framework for Graph-based Retrieval-Augmented Generation (GraphRAG), which integrates knowledge graphs with LLMs to enhance reasoning accuracy and contextual relevance. Designed to overcome limitations in current GraphRAG research—namely, the lack of fine-grained workflow decomposition and the absence of systematic classification and empirical design space exploration—LEGO-GraphRAG enables precise analysis and construction of advanced graph-augmented LLM systems (Cao et al., 2024).

1. Rationale and Architectural Overview

Traditional Retrieval-Augmented Generation (RAG) pipelines typically retrieve entire documents, introducing significant noise and redundancy. GraphRAG instead replaces these documents with knowledge graphs and retrieves compact, precise "reasoning paths" that directly connect query entities to candidate answers. However, prior work has treated graph retrieval as a monolithic operation, impeding algorithmic comparison and granular performance tuning. LEGO-GraphRAG addresses three pivotal challenges: (1) the absence of a unified, fine-grained framework for decomposing graph-based retrieval, (2) the lack of systematic categorization and evaluation of existing techniques, and (3) the difficulty of quantifying the impact of individual stages such as graph search or re-ranking.

The LEGO-GraphRAG pipeline is modularized into six distinct steps:

  1. Query Analysis: Entity and relation extraction from the raw question $Q$.
  2. Subgraph Extraction (SE): Extracting a query-relevant subgraph $g_Q$ from the full knowledge graph $G$.
  3. Reasoning-Path Retrieval (PF): Searching $g_Q$ for reasoning paths $P$ linking query entities to candidate answers.
  4. Path Refinement (PR): Filtering and scoring paths to obtain a high-precision set $\hat{S}$.
  5. Prompt Assembly: Concatenating a template, representations of $\hat{S}$, and the query into an augmented prompt $P'$.
  6. LLM Generation: Feeding $P'$ to an LLM to generate the final answer $A$.
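
A minimal sketch of how these six stages might compose in code; the names, signatures, and prompt-assembly scheme here are illustrative, not the paper's actual interfaces:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class LegoGraphRAG:
    """Hypothetical end-to-end composition of the six LEGO-GraphRAG stages."""
    analyze_query: Callable     # Q -> query entities
    extract_subgraph: Callable  # (Q, G, entities) -> g_Q   (SE)
    retrieve_paths: Callable    # (Q, g_Q) -> P             (PF)
    refine_paths: Callable      # (Q, P) -> S_hat           (PR)
    llm: Callable               # augmented prompt P' -> answer A

    def run(self, query: str, graph: Any, template: str) -> str:
        entities = self.analyze_query(query)                 # 1. Query Analysis
        g_q = self.extract_subgraph(query, graph, entities)  # 2. Subgraph Extraction
        paths = self.retrieve_paths(query, g_q)              # 3. Reasoning-Path Retrieval
        s_hat = self.refine_paths(query, paths)              # 4. Path Refinement
        prompt = "\n".join([template, *map(str, s_hat), query])  # 5. Prompt Assembly
        return self.llm(prompt)                              # 6. LLM Generation
```

Because each stage is a plain callable, any cell of the taxonomy below can be swapped in without touching the rest of the pipeline.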

2. Systematic Taxonomy of Retrieval Techniques

LEGO-GraphRAG introduces a comprehensive classification scheme for subgraph extraction (SE), path filtering (PF), and path refinement (PR) modules along two dimensions: model type (Non-NN vs. NN-based) and coupling to the graph (non-coupled vs. coupled specialization), with explicit cost categories.

| Retrieval Stage | Non-NN Models | NN-based Models |
|---|---|---|
| Subgraph Extraction (SE) | PPR, RWR, Ego-network | Fine-tuned Transformer (specialized), LLM (fine-tuned/generic) |
| Path Filtering (PF) | BFS, DFS, Dijkstra | Sentence-Transformer, DPR, generic/specialized rerankers, LLM scorer |
| Path Refinement (PR) | BM25, random, n-gram, LSA/LDA | Sentence-Transformer, reranker, GPT/LLM (fine-tuned, coupled) |
  • Non-NN models include structure-based techniques (e.g. Personalized PageRank (PPR), Random Walk with Restart (RWR), Ego-network, BFS/DFS, Dijkstra) and statistical scoring (BM25, n-gram, LSA, LDA).
  • NN-based models comprise small-scale general models (Sentence-Transformers, DPR), vanilla LLMs (Llama, Qwen, GPT as unspecialized scorers or path-generators), small-scale specialized models (fine-tuned on specific KGs), and high-cost fine-tuned LLMs (trained on KG QA data).

Example taxonomy mappings:

  • Subgraph-extraction: "PPR + BM25" denotes applying PPR followed by BM25-based filtering (see the sketch after this list).
  • Path-filtering: "Beam search + Sentence-Transformer" indicates beam search guided by Sentence-Transformer scores.
  • Path-refinement: "Fine-tuned GPT scorer" utilizes a large, KG-specific fine-tuned LLM.
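
For example, the "PPR + BM25" mapping could be realized as follows. This is a hedged sketch assuming networkx for Personalized PageRank and the rank_bm25 package for lexical filtering; the function name and parameters are ours, not the paper's:

```python
import networkx as nx
from rank_bm25 import BM25Okapi  # pip install rank-bm25

def ppr_bm25_subgraph(G: nx.DiGraph, query: str, seed_entities: list[str],
                      max_ent: int = 2000, top_k: int = 500) -> nx.DiGraph:
    """PPR over the full KG, then BM25 filtering of the candidate entities."""
    # Stage 1: Personalized PageRank biased toward the query's seed entities.
    personalization = {e: 1.0 / len(seed_entities) for e in seed_entities}
    scores = nx.pagerank(G, alpha=0.85, personalization=personalization)
    candidates = sorted(scores, key=scores.get, reverse=True)[:max_ent]

    # Stage 2: BM25 re-scores candidates by lexical overlap with the query.
    corpus = [str(e).lower().split() for e in candidates]
    bm25 = BM25Okapi(corpus)
    bm25_scores = bm25.get_scores(query.lower().split())
    ranked = [e for _, e in sorted(zip(bm25_scores, candidates), reverse=True)]

    # Induce the query-relevant subgraph g_Q on the surviving entities.
    return G.subgraph(ranked[:top_k]).copy()
```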

3. Formalization and Mathematical Foundations

The workflow is formalized using distinct notation and modular composition:

  • $Q$: input query (text)
  • $G=(V,E)$: directed, labeled knowledge graph
  • Retrieval: $S = R(Q,G) = \arg\max_{S \subseteq G} \operatorname{Score}(Q,S)$
  • Fusion/Prompt Assembly: $P' = F(S,Q) = \operatorname{concat}(\text{Template}, \operatorname{repr}(S), Q)$
  • LLM Generation: $A = G_{\text{LLM}}(P')$

Decomposing $\operatorname{Score}(Q,S)$ across the SE, PF, and PR stages allows explicit optimization via $\arg\max$ or beam search at each modular stage, separating concerns among subgraph recall, path precision, and final answer quality.
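
As one illustration of this decomposition (the subscripted scores are our notation for the per-stage objectives, not necessarily the paper's exact formulation), each stage can optimize its own score over the output of the previous stage:

```latex
% Stage-wise factorization of the retrieval objective (illustrative notation)
\begin{aligned}
g_Q     &= \operatorname*{arg\,max}_{g \subseteq G} \operatorname{Score}_{\mathrm{SE}}(Q, g)
           && \text{(subgraph extraction)}\\
P       &= \operatorname*{arg\,max}_{p \in \mathrm{Paths}(g_Q)} \operatorname{Score}_{\mathrm{PF}}(Q, p)
           && \text{(path filtering)}\\
\hat{S} &= \operatorname*{arg\,max}_{s \subseteq P} \operatorname{Score}_{\mathrm{PR}}(Q, s)
           && \text{(path refinement)}
\end{aligned}
```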

4. Empirical Evaluation and Performance Analysis

A comprehensive evaluation across 21 pipeline instances is performed on four knowledge base question answering (KBQA) datasets (CWQ, WebQSP, GrailQA, WebQuestions; >26,000 train and >7,000 test queries in aggregate).

Metrics utilized include:

  • Subgraph/path retrieval: Precision, Recall, and $F_1$ over answer entities/paths; Hit Ratio (HR: the fraction of queries retrieving at least one correct path)
  • LLM answer generation: Exact-match accuracy (HR@1), $F_1$ of generated answers
  • Module runtime: Measured in seconds per query per stage (e.g., subgraph extraction, path filtering, path refinement)
  • Token/GPU cost: Fine-tuning time (LoRA for Sentence-Transformers vs. LLMs), LLM inference latency
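
For concreteness, a minimal sketch of the retrieval-side metrics over per-query answer sets; this is our helper, not the paper's evaluation code:

```python
def retrieval_metrics(retrieved: list[set], gold: list[set]) -> dict[str, float]:
    """Macro-averaged precision/recall/F1 over answer entities, plus Hit Ratio."""
    p_sum = r_sum = f_sum = hits = 0.0
    for ret, ans in zip(retrieved, gold):
        tp = len(ret & ans)                       # correctly retrieved answers
        p = tp / len(ret) if ret else 0.0
        r = tp / len(ans) if ans else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        p_sum, r_sum, f_sum = p_sum + p, r_sum + r, f_sum + f
        hits += 1.0 if tp > 0 else 0.0            # HR: at least one correct hit
    n = len(gold)
    return {"precision": p_sum / n, "recall": r_sum / n,
            "f1": f_sum / n, "hit_ratio": hits / n}
```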

Findings:

  • Subgraph-extraction: PPR attains recall >90% but moderate precision; RWR is faster (<0.03 s vs. 0.84 s for PPR) at reduced recall. BM25/statistical filters have negligible time cost but degrade recall. Sentence-Transformer rerankers yield precision gains at roughly +0.1 s cost; fine-tuned LLMs maximize $F_1$ at ~5–6 s/query overhead.
  • Path-filtering: BFS/DFS/shortest-path algorithms provide comparable $F_1$ at <0.1 s. Beam search guided by specialized small-scale models achieves the best precision/recall balance (0.4–0.7 s/query; a generic beam-search sketch follows this list). LLM-based scoring in beam search is costly (>5 s) and struggles with prompt coverage.
  • Path-refinement: Non-NN random/BM25 methods produce low $F_1$. NN-based rerankers suppress noise with minimal overhead (0.1–0.7 s/query); LLM-based re-scoring yields the highest precision at an added 3–5 s latency.
  • End-to-end trade-offs: The "basic" PPR → SPF → random pipeline runs at ~0.8 s/query and achieves $F_1 \approx 40\%$. Adding a Sentence-Transformer to SE and PF raises $F_1$ by about 10 points at roughly +1 s total cost. The most accurate setup (fine-tuned LLM in SE+PF+PR) reaches $F_1 \approx 75\%$ but incurs ~12 s/query plus significant fine-tuning cost.
  • Generation sensitivity: LLM answer quality improves as more reasoning paths are provided (up to ~64). Increasing the number of prompt shots (zero/one/few-shot) yields up to 2–3 points of $F_1$ gain, depending on the model.
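
The beam-search filtering referenced above can be sketched generically; here `score_fn` stands in for whatever model scores partial paths against the query (e.g., Sentence-Transformer similarity), and all names are illustrative:

```python
from collections.abc import Callable, Hashable

def beam_search_paths(
    graph: dict[Hashable, list[Hashable]],        # adjacency list of g_Q
    start_entities: list[Hashable],
    score_fn: Callable[[list[Hashable]], float],  # path -> relevance to query
    beam_width: int = 128,
    max_hops: int = 3,
) -> list[list[Hashable]]:
    """Expand paths hop by hop, keeping only the beam_width best-scoring ones."""
    beam = [[e] for e in start_entities]
    for _ in range(max_hops):
        candidates = list(beam)  # a path may also terminate early
        for path in beam:
            for nbr in graph.get(path[-1], []):
                if nbr not in path:  # avoid cycles
                    candidates.append(path + [nbr])
        # Prune to the top beam_width paths by model score.
        beam = sorted(candidates, key=score_fn, reverse=True)[:beam_width]
    return beam
```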

5. Practical Recommendations and Design Guidelines

Derived from systematic analysis:

  • Subgraph extraction should favor high recall; PPR is recommended as default. For moderate resource budgets, augment PPR with small-scale rerankers for improved precision.
  • Path-filtering: Beam search with specialized NN-based rerankers balances precision and latency. Full enumeration is appropriate only for restricted graph sizes.
  • Path-refinement: Employ lightweight NN models such as Sentence-Transformer for initial pruning, reserving full LLM re-scoring for maximum-accuracy applications.
  • To control prompt length without sacrificing answer quality, cap the number of reasoning paths (top_k=64) and extracted entities (max_ent=2000).
  • Tune the beam width in iterative path search (suggested beam_width=128) for cost-effective exploration; these defaults are collected in the configuration sketch after this list.
  • In most scenarios, zero-shot or one-shot prompting suffices for established LLMs; additional few-shot examples yield diminishing returns beyond 2–3 cases.
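
Collecting these guidelines, a pipeline configuration object might look like the following; the field names are our invention, while the defaults mirror the recommendations above:

```python
from dataclasses import dataclass

@dataclass
class PipelineConfig:
    """Illustrative defaults reflecting the design guidelines above."""
    se_method: str = "ppr"                        # favor high-recall subgraph extraction
    se_reranker: str | None = "sentence-transformer"  # optional precision boost
    pf_method: str = "beam_search"                # NN-guided beam search for filtering
    beam_width: int = 128                         # cost-effective exploration
    pr_method: str = "sentence-transformer"       # lightweight initial pruning
    pr_llm_rescoring: bool = False                # reserve for maximum-accuracy setups
    top_k_paths: int = 64                         # cap reasoning paths in the prompt
    max_entities: int = 2000                      # cap extracted entities
    n_shot: int = 1                               # zero/one-shot usually suffices
```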

6. Contributions and Significance

LEGO-GraphRAG advances the state of the art in GraphRAG by explicitly modularizing retrieval into SE, PF, and PR components, enabling granular comparative analysis and system design. Its taxonomy juxtaposes non-NN and NN-based solutions along computational-cost and graph-coupling axes, facilitating systematic benchmarking and configuration. The framework provides clear formal notation for end-to-end modular composition, and its empirical study delivers actionable recommendations for balancing reasoning fidelity, efficiency, and computational cost in graph-augmented LLM systems (Cao et al., 2024).

A plausible implication is that future research may leverage LEGO-GraphRAG’s modularity to conduct targeted ablation studies, tune resource allocations for use-case-specific latency–accuracy trade-offs, and extend the taxonomy as novel graph-based retrieval and augmentation techniques emerge.

References (1)
