LEGO-GraphRAG: Modular Graph-Based RAG

Updated 20 November 2025
  • LEGO-GraphRAG is a modular framework that decomposes graph-based retrieval-augmented generation into explicit, interchangeable modules.
  • It provides standardized module interfaces and cost models that enable fine-grained control over reasoning quality, latency, and resource consumption.
  • The framework supports diverse knowledge reasoning tasks by enabling systematic exploration and customization of retrieval and generation processes.

LEGO-GraphRAG is a modular framework for Graph-based Retrieval-Augmented Generation (GraphRAG), designed to enable systematic decomposition, classification, and flexible recombination of components in knowledge graph–augmented LLM systems. By providing explicit module interfaces, cost models, and a catalog of instantiable algorithmic patterns, LEGO-GraphRAG formalizes the design space of GraphRAG systems, supporting tailored trade-offs between reasoning quality, runtime efficiency, and token/GPU cost for diverse knowledge reasoning tasks (Cao et al., 6 Nov 2024, Gao et al., 26 Jul 2024).

1. Modular Pipeline Structure and Formalization

LEGO-GraphRAG conceptualizes the GraphRAG workflow as a series of explicit modules with well-defined responsibilities and interfaces. The standard pipeline comprises:

  1. Retrieval (R): Locating candidate subgraphs and paths in a knowledge graph $G = (V, E)$ relevant to a natural language query $q$.
  2. Graph Construction (C): Assembling the retrieved fragments (typically, reasoning paths) into an evidential subgraph.
  3. Query Augmentation ($\Phi$): Merging the original query $q$ with the constructed evidence graph $H$ into a textual prompt $q'$.
  4. LLM Generation: Running the LLM on the augmented prompt to produce the final answer $a$.

The retrieval stage is further divided into three specialized sub-modules:

  • Subgraph Extraction (SE): $SE(G, \epsilon_q) \to g_q$, where $\epsilon_q$ is the set of query-anchored entities and relations, and $g_q$ is an induced subgraph.
  • Path Filtering (PF): $PF(g_q, \epsilon_q, q) \to P = \{P_i\}$, extracting candidate reasoning paths as sequences of triples.
  • Path Refinement (PR): $PR(P, q, \epsilon_q) \to \hat{P}$, selecting the most relevant paths according to relevance functions or neural reranking.

Each module exposes clean interfaces, producing standardized intermediate representations that facilitate independent development, debugging, and replacement.
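
To make these interfaces concrete, the following is a minimal Python sketch of the standardized module signatures; the type names (`Triple`, `Subgraph`) and the protocol classes are illustrative assumptions, not the framework's published API.

```python
from dataclasses import dataclass
from typing import Protocol

Triple = tuple[str, str, str]   # (head entity, relation, tail entity)
Path = list[Triple]             # a reasoning path P_i as an ordered triple sequence

@dataclass
class Subgraph:
    """Standardized intermediate representation passed between modules."""
    entities: set[str]
    triples: list[Triple]

class SubgraphExtraction(Protocol):
    def __call__(self, graph: Subgraph, query_entities: set[str]) -> Subgraph: ...

class PathFiltering(Protocol):
    def __call__(self, subgraph: Subgraph, query_entities: set[str], query: str) -> list[Path]: ...

class PathRefinement(Protocol):
    def __call__(self, paths: list[Path], query: str, query_entities: set[str]) -> list[Path]: ...
```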

Pipeline Composition

The full process can be formalized as

$$G,\, q \;\to\; \text{Retrieval}\,(R = PR \circ PF \circ SE) \;\to\; \hat{P} \;\to\; C(\hat{P}) = H \;\to\; \Phi(q, H) = q' \;\to\; \text{LLM}(q') = a$$

where each arrow represents a swappable module or operator with an explicit function signature (Cao et al., 6 Nov 2024).
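
A sketch of this composition, assuming callables matching the interfaces above; `construct`, `augment_query`, and `llm_generate` are placeholder names for the C, $\Phi$, and LLM stages rather than framework-defined functions.

```python
def run_pipeline(graph, query, query_entities, se, pf, pr,
                 construct, augment_query, llm_generate):
    g_q = se(graph, query_entities)                   # SE: query-anchored subgraph g_q
    candidates = pf(g_q, query_entities, query)       # PF: candidate reasoning paths P
    refined = pr(candidates, query, query_entities)   # PR: most relevant paths P-hat
    evidence = construct(refined)                     # C: evidential subgraph H
    prompt = augment_query(query, evidence)           # Phi: q' = Phi(q, H)
    return llm_generate(prompt)                       # LLM(q') = a
```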

2. Algorithmic Taxonomy by Module

LEGO-GraphRAG provides a systematic taxonomy for existing GraphRAG solutions by mapping them to the above modules and classifying their techniques according to whether they are structural/statistical or neural-network-based.

| Module | Non-NN Structure | Non-NN Statistical | Small-Scale NN | Vanilla LLM | Fine-Tuned NN/LLM |
|--------|------------------|--------------------|----------------|-------------|-------------------|
| SE | PPR, RWR, Random Walk | BM25, TF-IDF | ST, DPR | GPT/LLaMA | ST_FT / LLM_FT |
| PF | BFS, DFS, k-hop | BM25 | ST, DPR + BS | GPT+BS | ST_FT+BS / LLM_FT |
| PR | – | BM25 | ST, DPR | GPT | ST_FT / LLM_FT |
  • Structure-based methods exploit explicit graph algorithms for coverage and efficiency (a minimal PPR-based SE sketch follows this list).
  • Statistical methods introduce early relevance filtering but can reduce recall.
  • Neural retrieval leverages dense/specialized embedding models; fine-tuned variants provide domain-specific gains at higher cost.
  • LLM scoring is utilized in path selection and evidence reranking, with strong accuracy but notable computational overhead (Cao et al., 6 Nov 2024).
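
As a concrete illustration of one structure-based SE operator from the taxonomy, the sketch below runs Personalized PageRank with networkx; the damping factor and top-k cutoff are illustrative assumptions, not values from the paper.

```python
import networkx as nx

def ppr_subgraph_extraction(graph: nx.Graph, query_entities: set[str],
                            top_k: int = 200) -> nx.Graph:
    """SE via Personalized PageRank seeded on the query-anchored entities."""
    personalization = {n: (1.0 if n in query_entities else 0.0) for n in graph.nodes}
    scores = nx.pagerank(graph, alpha=0.85, personalization=personalization)
    # Keep the top-k scoring nodes plus the seed entities; return the induced subgraph g_q.
    keep = set(sorted(scores, key=scores.get, reverse=True)[:top_k]) | query_entities
    return graph.subgraph(keep).copy()
```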

3. Pipeline Instantiation and Cost Trade-Offs

Each module can be instantiated independently, enabling designers to optimize for explicit targets: accuracy, latency, and resource consumption. Two cost metrics are emphasized:

$$\text{cost}_{\text{token}} = \alpha\,|q'| + \beta\,|a|, \qquad \text{cost}_{\text{time}} = \gamma\,(T_{\text{SE}} + T_{\text{PF}} + T_{\text{PR}}) + \delta\,T_{\text{generation}}$$

where $|q'|$ is the prompt length, $|a|$ the output length, $T_*$ denotes per-module runtimes, and $\alpha, \beta, \gamma, \delta$ calibrate actual costs.
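
These two metrics translate directly into code; in the sketch below the calibration constants default to 1.0 purely for illustration.

```python
def cost_token(prompt_len: int, answer_len: int,
               alpha: float = 1.0, beta: float = 1.0) -> float:
    """cost_token = alpha * |q'| + beta * |a|"""
    return alpha * prompt_len + beta * answer_len

def cost_time(t_se: float, t_pf: float, t_pr: float, t_generation: float,
              gamma: float = 1.0, delta: float = 1.0) -> float:
    """cost_time = gamma * (T_SE + T_PF + T_PR) + delta * T_generation"""
    return gamma * (t_se + t_pf + t_pr) + delta * t_generation
```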

Practical instantiation examples:

  • Latency/Scale-Oriented: PPR + BFS + BM25 + 0-shot LLM for minimal cost, moderate accuracy, and rapid response. No fine-tuning required.
  • Accuracy-Oriented: Fine-tuned LLMs for SE/PF/PR, beam search, and multi-shot prompting, trading higher latency and token cost for maximal accuracy.

The optimal pipeline is determined by sampling and measuring (cost_time, Accuracy) pairs subject to application-level constraints (Cao et al., 6 Nov 2024).
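
A sketch of this constraint-driven selection: among measured (cost_time, accuracy) samples, keep the most accurate configuration that fits the latency budget. The candidate entries and the 3-second budget are hypothetical, with figures roughly matching those reported in Section 4 below.

```python
def select_pipeline(candidates: list[dict], max_cost_time: float) -> dict | None:
    """Return the most accurate configuration that meets the latency budget."""
    feasible = [c for c in candidates if c["cost_time"] <= max_cost_time]
    return max(feasible, key=lambda c: c["accuracy"]) if feasible else None

# Hypothetical (cost_time, accuracy) measurements for two sampled pipelines.
candidates = [
    {"name": "PPR + BFS + BM25 + 0-shot LLM", "cost_time": 1.5, "accuracy": 0.60},
    {"name": "FT-LLM SE/PF/PR + 5-shot GPT",  "cost_time": 12.0, "accuracy": 0.75},
]
print(select_pipeline(candidates, max_cost_time=3.0))  # picks the latency-oriented pipeline
```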

4. Empirical Findings, Benchmarks, and Best Practices

Extensive experiments on real-world KBQA datasets (CWQ, WebQSP, GrailQA, WebQuestions) reveal the impact of pipeline decisions:

  • SE Module: PPR maintains high recall, vital for downstream answerability; BM25 filtering is too lossy. ST/DPR rerankers on top of PPR optimize F1/recall with negligible time increase.
  • PF Module: Classical path-finding (BFS/DFS) provides strong baselines; beam search with specialized ST/DPR models further boosts final F1 at acceptable time overhead.
  • PR Module: Top-$k$ reranking with small ST/DPR models yields effective path culling; full LLM scoring is high-cost and brittle.
  • Generation: Short, linearized path prompts and mid-sized LLMs (zero- or one-shot) balance performance and cost.

Typical end-to-end accuracy:

  • PPR→ST→ST_FT for SE/PF/PR + 0-shot LLaMA2: 55–65% accuracy, retrieval time ~1s, generation time ~0.5s.
  • All-FT-LLM pipeline + 5-shot GPT: up to 75% accuracy but >10s per query and high token/GPU cost.

Best practice guidelines:

  • Default to PPR (SE) for recall.
  • Employ fast, specialized ST/DPR rerankers rather than LLMs in retrieval.
  • Restrict LLM generations to concise, evidence-augmented prompts.
  • Profile module-wise resource consumption to match workload SLAs, as in the timing sketch below (Cao et al., 6 Nov 2024).
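
A minimal sketch of such per-module profiling, assuming a simple wall-clock budget check; the stage names and budgets are deployment-specific assumptions.

```python
import time

def profile_stage(name: str, fn, *args, budget_s: float | None = None, **kwargs):
    """Run one pipeline stage, record its wall-clock time, and flag SLA overruns."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    if budget_s is not None and elapsed > budget_s:
        print(f"[WARN] {name}: {elapsed:.2f}s exceeds the {budget_s:.2f}s budget")
    return result, elapsed
```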

5. Generalization to Broader Modular RAG

LEGO-GraphRAG is situated within a larger context of modular RAG system architectures. The abstraction hierarchy defines:

  • Modules (L1): Indexing, Pre-retrieval, Retrieval, Post-retrieval, Generation, Orchestration.
  • Sub-modules (L2): Query expansion, rewriting, sparse/dense retrieval, reranking, fusion, routing.
  • Operators (L3): Atomic transforms (BM25, BERT, LLM, Cypher execution, prompt engineering).

A Modular RAG system is a directed computational graph $\mathcal{G} = \{q, D, \mathcal{M}, \{\mathcal{M}_s\}, \{Op\}\}$, with RAG flows as ordered (possibly branched or looping) graphs $\mathcal{F} = (M_{\phi_1}, M_{\phi_2}, \ldots, M_{\phi_n})$.

Canonical flow patterns—linear, conditional, branching, looping—are instantiated using this modular grammar. Advanced operators such as dynamic routing, iterative refinement, and graph-injection naturally fit as plug-and-play modules (Gao et al., 26 Jul 2024).
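
The sketch below expresses such a flow as an ordered list of state-transforming modules with an optional iterative-refinement loop; the state dictionary keys and the judge-style termination flag are illustrative assumptions, not part of the published grammar.

```python
def run_flow(query: str, modules: list, max_loops: int = 3):
    """Execute an ordered module flow with an optional iterative-refinement loop."""
    state = {"query": query, "context": [], "answer": None, "done": False}
    for _ in range(max_loops):
        for module in modules:      # each module/operator maps state -> state
            state = module(state)
        if state["done"]:           # a routing/judge operator may signal termination
            break
    return state["answer"]
```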

6. Theoretical Foundations and Future Directions

The framework’s modularity is grounded in principles from software engineering (single-responsibility, dependency injection) and functional programming (operator composition). Key consequences are:

  • Compositional expressivity: Any directed acyclic graph (DAG) of modules/operators is admissible, supporting arbitrary workflow patterns.
  • Proof modularity: Correctness and performance of each operator or module can be independently validated.

Identified research directions include:

  • Cost-aware scheduling: Adaptive selection of module configurations via cost functions $C(M_i)$ to meet latency/accuracy budgets.
  • Meta-learned orchestration: Policy learning for optimal flow selection per query.
  • LLM-graph coupling: Direct generation and use of subgraphs on-the-fly via operator integration (Gao et al., 26 Jul 2024).

A plausible implication is further convergence between database-centric reasoning, graph learning, and neural sequence generation—enabled by formalized, modular GraphRAG architectures.

7. Practical Deployment and Application Integration

LEGO-GraphRAG is deployed with modular selection of retrieval and generation components, using:

  • Classical vector DBs (Faiss, Weaviate) for embedding storage.
  • On-prem or API-accessible LLMs as generators.
  • Hybrid sparse-dense retrieval with explicit Python-based orchestration scripts (a retrieval sketch follows this list).
  • Domain-specific configurations for tasks such as legal QA (KG-index, LLM fusion, model verification), e-commerce search (query rewrite, sparse retrieval), and open-domain QA (ensemble weighting, retrieval fine-tuning).
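
A hedged sketch of hybrid sparse-dense retrieval with score fusion, using Faiss for the dense side and rank_bm25 for the sparse side; the corpus, precomputed embeddings, and fusion weight are assumed inputs, and the raw scores are fused without calibration for brevity.

```python
import numpy as np
import faiss
from rank_bm25 import BM25Okapi

def hybrid_retrieve(query: str, query_vec: np.ndarray, corpus: list[str],
                    corpus_vecs: np.ndarray, k: int = 5, weight: float = 0.5) -> list[str]:
    """Fuse dense (Faiss inner-product) and sparse (BM25) scores; return top-k documents."""
    # Dense side: exact inner-product search over the precomputed embedding matrix.
    index = faiss.IndexFlatIP(corpus_vecs.shape[1])
    index.add(corpus_vecs.astype("float32"))
    dense_scores, dense_ids = index.search(query_vec.astype("float32").reshape(1, -1), len(corpus))
    dense = {int(i): float(s) for i, s in zip(dense_ids[0], dense_scores[0])}

    # Sparse side: BM25 over whitespace-tokenized documents.
    bm25 = BM25Okapi([doc.split() for doc in corpus])
    sparse = bm25.get_scores(query.split())

    # Weighted fusion of the two score lists, then top-k documents.
    fused = {i: weight * dense.get(i, 0.0) + (1 - weight) * float(sparse[i])
             for i in range(len(corpus))}
    top = sorted(fused, key=fused.get, reverse=True)[:k]
    return [corpus[i] for i in top]
```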

Compared to monolithic RAG solutions, this modularization reduces engineering overhead, simplifies “what-if” analyses through module/operator swapping, and enables rapid prototyping of new designs for domain- or workload-specific requirements (Gao et al., 26 Jul 2024, Cao et al., 6 Nov 2024).


Key References:

  • "LEGO-GraphRAG: Modularizing Graph-based Retrieval-Augmented Generation for Design Space Exploration" (Cao et al., 6 Nov 2024)
  • "Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks" (Gao et al., 26 Jul 2024)