LogicRAG Framework
- LogicRAG is a dynamic framework that decomposes complex queries into minimal subproblems and constructs a directed acyclic graph for adaptive reasoning.
- It employs on-the-fly query decomposition and topological sorting to efficiently orchestrate multi-step retrieval and answer generation.
- Graph and context pruning strategies in LogicRAG reduce token usage by up to 70% while preserving accuracy, substantially improving resource efficiency in multi-hop tasks.
The LogicRAG framework is a logic-aware retrieval-augmented generation system for LLMs that dispenses with pre-built graphs. Instead, it dynamically decomposes complex queries into subproblems, determines their dependencies, and orchestrates adaptive, efficient knowledge retrieval and multi-step reasoning entirely at inference time. LogicRAG is designed to improve both accuracy and resource efficiency in complex question answering settings, especially those requiring multi-hop reasoning, by modeling the query’s latent logic structure with a dynamically built directed acyclic graph (DAG) rather than a static corpus-wide graph.
1. Motivation and Problem Definition
LLMs are susceptible to hallucination (generating factually incorrect responses) when confronted with queries outside their training distribution or knowledge scope. Retrieval-augmented generation (RAG) mitigates this by grounding LLMs with relevant passages from external corpora: $y = \mathrm{LLM}(q, R(q, C))$, where $q$ is a query, $C$ is a corpus, $R$ is a retrieval function, and $\mathrm{LLM}$ denotes the language model.
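A minimal sketch of this composition, assuming hypothetical `retrieve` and `llm` callables standing in for any concrete retriever and language-model backend:

```python
def rag(q: str, corpus: list[str], retrieve, llm) -> str:
    """Vanilla RAG: y = LLM(q, R(q, C)).

    `retrieve` and `llm` are hypothetical stand-ins for a retrieval
    function and a language-model call; any concrete backends work.
    """
    passages = retrieve(q, corpus)  # R(q, C): grounding evidence
    prompt = f"Context: {passages}\nQuestion: {q}\nAnswer:"
    return llm(prompt)
```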
Graph-based RAG (GraphRAG) methods have leveraged offline-constructed knowledge graphs for retrieval, showing improvements on complex multi-hop questions. However, such approaches incur heavy preprocessing costs—requiring transformation of the entire corpus into a graph, consuming thousands of tokens and many minutes even for moderate corpora. Furthermore, static graphs are query-agnostic; their edges and structure may not fit the logical requirements of individual queries, leading to misaligned retrieval, inefficiency, and update latency.
LogicRAG addresses these limitations by dynamically discovering a problem-specific reasoning structure at inference time, decomposing the query, extracting a dependency DAG, and controlling retrieval generation adaptively without the need for a pre-built graph.
2. Dynamic Query Decomposition and DAG Construction
Central to LogicRAG is on-the-fly query decomposition. An LLM-based decomposition function $D$ segments the query $q$ into minimal, non-overlapping subproblems $\{p_1, \dots, p_n\} = D(q)$ that together cover all required knowledge. Decomposition is operationalized with few-shot prompting, e.g., “Segment the question into minimal reasoning steps.” Completeness and non-overlap are enforced through the prompt constraints.
LogicRAG then induces a directed acyclic graph $G = (V, E)$ over the subproblems, where $V = \{p_1, \dots, p_n\}$ and an edge $(p_j, p_i) \in E$ is present if $p_i$ depends on the answer to $p_j$. Edges are inferred by prompting the LLM on stepwise dependencies, and a DFS check ensures acyclicity. This process encodes the latent reasoning pathway tailored to each query instance.
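The following sketch illustrates this two-step construction. It is a sketch under stated assumptions: `llm` is a hypothetical text-in/text-out model call, and the prompts are illustrative paraphrases, not the paper's verbatim templates.

```python
def build_reasoning_dag(q: str, llm):
    """Decompose q into subproblems, then induce dependency edges."""
    # Few-shot decomposition into minimal, non-overlapping subproblems.
    subproblems = llm(
        "Segment the question into minimal reasoning steps, one per line.\n"
        f"Question: {q}"
    ).splitlines()

    # Infer pairwise dependencies: edge p_j -> p_i iff p_i needs p_j's answer.
    edges = {j: [] for j in range(len(subproblems))}
    for i, p_i in enumerate(subproblems):
        for j, p_j in enumerate(subproblems):
            if i != j and "yes" in llm(
                f"Does answering '{p_i}' require the answer to "
                f"'{p_j}'? Reply yes or no."
            ).lower():
                edges[j].append(i)

    assert is_acyclic(edges), "dependency graph must be a DAG"
    return subproblems, edges


def is_acyclic(edges) -> bool:
    """DFS three-color check that the edge dict contains no cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in edges}

    def dfs(v) -> bool:
        color[v] = GRAY
        for u in edges[v]:
            if color[u] == GRAY or (color[u] == WHITE and not dfs(u)):
                return False  # back edge found: cycle
        color[v] = BLACK
        return True

    return all(color[v] != WHITE or dfs(v) for v in edges)
```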
3. Retrieval and Reasoning Scheduling via Graph Linearization
To guide execution, LogicRAG linearizes the DAG using a topological sort, ensuring that each subproblem is processed only after its dependencies are resolved. This sequence enables logic-respecting scheduling of retrieval and subproblem answering. The topological sort is implemented via DFS in $O(|V| + |E|)$ time, as shown below:
```python
def topo_sort(G):
    """Return vertices of DAG G (adjacency dict: vertex -> successors)
    in topological order via DFS."""
    visited = set()
    stack = []

    def dfs(v):
        visited.add(v)
        for u in G[v]:           # successors of v
            if u not in visited:
                dfs(u)
        stack.append(v)          # v finishes after all its successors

    for v in G:
        if v not in visited:
            dfs(v)
    return list(reversed(stack))
```
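For instance, encoding the dependency chain from the worked example in Section 6 as a hypothetical adjacency dict, the sort recovers the required execution order:

```python
# Edge u -> v means v depends on u's answer, so u must come first.
G = {"p2": ["p1"], "p1": ["p3"], "p3": []}
print(topo_sort(G))  # ['p2', 'p1', 'p3']
```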
4. Pruning Mechanisms for Efficiency
LogicRAG implements two pruning strategies to optimize both accuracy and resource efficiency:
- Graph Pruning: At each topological level $\ell$, the sibling set of subproblems is scored for pairwise semantic similarity. If the similarity between siblings exceeds a threshold $\tau$, the subproblems are merged into a unified query via the LLM, reducing redundant retrieval.
- Context Pruning: As retrieval progresses, a rolling memory $\mathrm{Mem}^{(r)}$ accumulates the most salient retrieved information. It is updated by summarization, $\mathrm{Mem}^{(r+1)} = \mathrm{Summarize}(\mathrm{Mem}^{(r)}, D_r)$, and any retrieved passage whose relevance score falls below a threshold is dropped. This filtering limits token bloat and maintains a high signal-to-noise ratio for downstream reasoning; a sketch of both routines follows this list.
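The sketch below assumes a hypothetical `embed` sentence-embedding function and an `llm` call; the thresholds `tau` and `min_score` are illustrative assumptions, not the paper's reported values.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def graph_prune(siblings, embed, llm, tau=0.8):
    """Merge semantically similar sibling subproblems at one DAG level."""
    vecs = [embed(p) for p in siblings]
    merged, used = [], set()
    for i, p in enumerate(siblings):
        if i in used:
            continue
        group = [p]
        for j in range(i + 1, len(siblings)):
            if j not in used and cosine(vecs[i], vecs[j]) > tau:
                group.append(siblings[j])
                used.add(j)
        merged.append(
            group[0] if len(group) == 1
            else llm("Merge these into one unified query: " + "; ".join(group))
        )
    return merged

def context_prune(memory, passages, scores, llm, min_score=0.5):
    """Drop low-scoring passages, then summarize the rest into memory."""
    kept = [d for d, s in zip(passages, scores) if s >= min_score]
    return llm(f"Summarize into concise notes:\n{memory}\n" + "\n".join(kept))
```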
5. Adaptive Retrieval and Generation Pipeline
For each (possibly merged) subproblem $p_i$, LogicRAG retrieves the top-$k$ passages $D_i = \operatorname{TopK}_{d \in C}\, \mathrm{sim}(p_i, d)$ using cosine similarity between query and document embeddings.
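A sketch of this retriever, again assuming a hypothetical `embed` model:

```python
import numpy as np

def retrieve_topk(p_i: str, corpus: list[str], embed, k: int = 5) -> list[str]:
    """Return the k passages most cosine-similar to subproblem p_i."""
    q = embed(p_i)
    D = np.stack([embed(doc) for doc in corpus])
    sims = (D @ q) / (np.linalg.norm(D, axis=1) * np.linalg.norm(q) + 1e-9)
    return [corpus[i] for i in np.argsort(-sims)[:k]]
```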
The reasoning pipeline operates as a forward pass across the topologically sorted DAG levels. For each level $\ell$:
- Construct merged queries $\tilde{p}_\ell$ from similar sibling subproblems (graph pruning).
- Retrieve the top-$k$ passages $D_\ell$ for each merged query.
- Summarize $D_\ell$ to update the rolling memory $\mathrm{Mem}^{(r)}$.
- Prompt the LLM for each subproblem $p_i$ with the rolling memory:
If novel subproblems are articulated by the LLM, these are dynamically added to the DAG and processed recursively, ensuring completeness.
"Given rolling memory Mem^(r), answer subproblem p_i: [p_i] Context: [Mem^(r)]"
6. Experimental Results, Efficiency, and Illustrative Example
LogicRAG was evaluated on HotpotQA (2-hop), 2WikiMQA (2–4 hops), and MuSiQue (composed single-hop questions) against baselines including vanilla RAG (various $k$), zero-shot LLMs, and state-of-the-art GraphRAG-style models (KGP, RAPTOR, GraphRAG, LightRAG, HippoRAG, HippoRAG2). Key findings:
- On 2WikiMQA, string-match accuracy increased from 50.0% to 64.7% (+14.7 pp over the best baseline).
- Average token consumption on 2WikiMQA was reduced to 1.8K versus 2.8–4.7K for GraphRAG variants.
- Latency per question decreased to 9.8 seconds, compared to 13–35 seconds for graph-based methods.
- Combined pruning mechanisms (graph and context) reduced token usage per query by 60–70% without significant impact on accuracy; for retrieval depth $k$, further increases offered minimal accuracy gains at linearly growing resource cost.
A three-step illustrative example is as follows. For the question, “What month did Tripartite discussions begin between Britain, France, and the country…?”:
- Decomposition: $p_1$ (identify the country), $p_2$ (decode “nobilities commonwealth”), $p_3$ (find the month).
- DAG construction: $p_2 \to p_1 \to p_3$, reflecting the dependency chain.
- Topological sort: $[p_2, p_1, p_3]$.
- Iterative reasoning:
  - $r=1$, $p_2$: “What historic entity does ‘nobilities commonwealth’ refer to?” (Answer: "Polish–Lithuanian Commonwealth")
  - $r=2$, $p_1$: “Given Mem, what country did Warsaw Pact leadership originate from?” (Answer: "the Soviet Union")
  - $r=3$, $p_3$: “When (month) did Tripartite discussions (Britain, France, Soviet Union) begin?” (Answer: "June")
- Context pruning distilled 600 tokens of retrieved content per round to 150 core tokens; graph pruning was not needed in this instance.
7. Significance and Context within Retrieval-Augmented Reasoning
LogicRAG demonstrates that adaptive, inference-stage logic modeling outperforms prior static GraphRAG approaches in both answer quality and efficiency. By eschewing any global, pre-computed graph, it minimizes preprocessing overhead, reduces per-query resource consumption, and enables dynamic alignment between retrieval structure and query logic. The methodology also underscores the generality and extensibility of logic-structured retrieval, facilitating multi-step, dependency-aware reasoning for arbitrary queries encountered at inference time, while achieving 30–60% savings in token and latency costs on complex multi-hop question answering tasks (Chen et al., 8 Aug 2025).