Graph RAG-Tool Fusion Algorithm

Updated 8 June 2026
  • The paper presents a novel algorithm that integrates semantic vector search with graph-based dependency propagation to enhance toolchain retrieval in RAG systems.
  • It employs a structured tool knowledge graph with explicit dependency labels to model direct and indirect inter-tool relationships for precise reasoning.
  • Evaluations on the ToolLinkOS and ToolSandbox benchmarks show absolute mAP@10 gains of 71.7 and 22.1 percentage points, respectively, over naïve RAG.

A Graph RAG-Tool Fusion algorithm integrates semantic vector search with structured dependency propagation over a tool knowledge graph, enabling retrieval-augmented generation (RAG) agents to select toolchains that reflect both the requirements of a query and the dependencies between tools. It improves on naïve vector search by encoding tool relationships explicitly as a graph and traversing both the semantic and the dependency space during retrieval, as described in recent research on FastInsight (An et al., 26 Jan 2026) and "Graph RAG-Tool Fusion" (Lumer et al., 11 Feb 2025).

1. Formalization of the Tool Knowledge Graph

The tool knowledge graph is represented as $G = (V, E)$, where $V$ is the set of tools (nodes) and $E \subseteq V \times V \times L$ is the set of directed, labeled edges. The label set $L$ captures dependency types, including "tool_directly_depends", "tool_indirectly_depends", "param_directly_depends", and "param_indirectly_depends". Each node $v \in V$ encodes metadata: a textual description $\mathrm{desc}(v)$, a dense vector embedding $\mathrm{emb}(v) \in \mathbb{R}^d$, and a flag $core \in \{0,1\}$ marking core tools.

Edges model concrete structural or parameter dependencies, such that for $e = (u \rightarrow v, \ell) \in E$, tool $u$ depends on $v$ by a dependency of type $\ell$. This formalism enables precise graph-based reasoning over tool selection and invocation.
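
One way to hold this structure in memory is sketched below. The class names `Tool` and `ToolGraph`, the method names, and the two example tools are illustrative assumptions; only the four dependency labels and the node fields (description, embedding, core flag) come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional

# Dependency labels named in the text.
LABELS = {
    "tool_directly_depends",
    "tool_indirectly_depends",
    "param_directly_depends",
    "param_indirectly_depends",
}


@dataclass
class Tool:
    name: str
    desc: str             # textual description desc(v)
    emb: list             # dense embedding emb(v), a list of d floats
    core: bool = False    # flag marking core tools


@dataclass
class ToolGraph:
    tools: dict = field(default_factory=dict)
    # Adjacency: source tool name -> list of (target tool name, label).
    edges: dict = field(default_factory=dict)

    def add_tool(self, tool: Tool) -> None:
        self.tools[tool.name] = tool
        self.edges.setdefault(tool.name, [])

    def add_dependency(self, u: str, v: str, label: str) -> None:
        """Record the labeled edge (u -> v, label): tool u depends on tool v."""
        if label not in LABELS:
            raise ValueError(f"unknown dependency label: {label}")
        if u not in self.tools or v not in self.tools:
            raise KeyError("both endpoints must be registered tools")
        self.edges[u].append((v, label))

    def dependencies(self, u: str, label: Optional[str] = None) -> list:
        """Tools that u depends on, optionally filtered by dependency label."""
        return [v for v, lab in self.edges.get(u, []) if label is None or lab == label]


# Hypothetical two-tool graph: sending an email needs a contact lookup.
g = ToolGraph()
g.add_tool(Tool("send_email", "Send an email message", [0.1, 0.9]))
g.add_tool(Tool("get_contact", "Look up a contact address", [0.3, 0.7], core=True))
g.add_dependency("send_email", "get_contact", "param_directly_depends")
```

Storing edges as an adjacency list keyed by the dependent tool keeps the "u depends on v" direction explicit, which is the direction a retrieval-time traversal follows.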

2. Algorithmic Structure and Retrieval-Scoring Formulations

Graph RAG-Tool Fusion decomposes retrieval into interleaved vector-based search and graph-based dependency propagation:

  • Vector Search: Given a user query $q$, compute the query embedding $\mathrm{emb}(q)$ and define the cosine-similarity score:

$$s_{\mathrm{vec}}(q, v) = \frac{\mathrm{emb}(q) \cdot \mathrm{emb}(v)}{\lVert \mathrm{emb}(q) \rVert \, \lVert \mathrm{emb}(v) \rVert}$$

Retrieve the top-$k$ tools $S = \{s_1, \dots, s_k\}$.

  • Dependency Propagation: For each of the $k$ seeds $s \in S$, traverse its dependencies in $G$ up to depth $D$. For each node $v$ encountered,

$$s_{\mathrm{graph}}(v) = \max_{s \in S} \; s_{\mathrm{vec}}(q, s) \cdot \gamma^{d(s, v)}$$

where $d(s, v)$ is the length of the shortest directed path from $s$ to $v$ (capped at $D$), and $\gamma \in (0, 1]$ is an optional depth-decay factor. All nodes are then ranked primarily by $s_{\mathrm{graph}}(v)$, breaking ties by proximity in the graph and initial seed rank.

This construct yields a retrieval set that reflects both the direct semantic fit and the transitive dependencies among candidate tools, ensuring holistic coverage for complex tool-calling LLM workflows.
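
As a small worked illustration, assume the propagated score takes the multiplicative form $s_{\mathrm{graph}}(v) = s_{\mathrm{vec}}(q, s) \cdot \gamma^{d(s, v)}$ for a single seed $s$. The similarity value 0.80, the decay $\gamma = 0.5$, and the node names $v_1$, $v_2$ are hypothetical values chosen for the arithmetic, not figures from the paper.

```latex
\begin{align*}
s_{\mathrm{graph}}(s)   &= 0.80 \cdot 0.5^{0} = 0.80 && \text{seed itself, } d = 0 \\
s_{\mathrm{graph}}(v_1) &= 0.80 \cdot 0.5^{1} = 0.40 && \text{direct dependency, } d = 1 \\
s_{\mathrm{graph}}(v_2) &= 0.80 \cdot 0.5^{2} = 0.20 && \text{indirect dependency, } d = 2
\end{align*}
```

With $\gamma = 1$ (no decay), all three nodes would share the score 0.80, and the ordering would fall back on the tie-breakers: graph distance first, then seed rank.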

3. End-to-End Algorithm and Pseudocode

The canonical retrieval cycle comprises the following main steps (Lumer et al., 11 Feb 2025):

| Step | Operation | Description |
|------|-----------|-------------|
| 1 | Vector retrieval | Compute $s_{\mathrm{vec}}(q, v)$ over $V$ and select the top-$k$ seeds $S$ |
| 2 | Graph traversal | For each $s \in S$, DFS over $G$ up to depth $D$; collect all reachable $v$ not already in the result set |
| 3 | Score propagation | Assign $s_{\mathrm{graph}}(v)$ as above for all $v$ discovered |
| 4 | Ordering & truncation | Sort all $v$ by $s_{\mathrm{graph}}(v)$, then by increasing distance and original rank; output top $K$ |

Pseudocode formalizes this as: (1) embed the query and retrieve vector matches; (2) initialize the graph score list; (3) for each seed tool, propagate through the dependency graph; (4) order by propagated score with distance/rank tie-breaking; (5) truncate to $K$ results.
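
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the parameter names (`k`, `max_depth`, `gamma`, `top_n`), the multiplicative depth decay, and the toy tools are all hypothetical. It uses breadth-first rather than depth-first traversal so that the first visit to a node from a given seed already yields the shortest-path distance.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_emb, embs, deps, k=2, max_depth=2, gamma=1.0, top_n=10):
    """embs: tool name -> embedding; deps: tool name -> list of dependency names."""
    # Step 1: vector retrieval of the top-k seeds by cosine similarity.
    ranked = sorted(embs, key=lambda t: cosine(query_emb, embs[t]), reverse=True)
    seeds = ranked[:k]

    # Steps 2-3: depth-limited traversal from each seed with score propagation.
    # best[tool] = (score, distance, seed_rank).
    best = {}
    for rank, seed in enumerate(seeds):
        seed_score = cosine(query_emb, embs[seed])
        queue = deque([(seed, 0)])
        seen = {seed}
        while queue:
            node, dist = queue.popleft()
            cand = (seed_score * gamma ** dist, dist, rank)
            old = best.get(node)
            # Keep the higher score; on ties prefer shorter distance, then earlier seed.
            if old is None or (-cand[0], cand[1], cand[2]) < (-old[0], old[1], old[2]):
                best[node] = cand
            if dist < max_depth:
                for nxt in deps.get(node, []):
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, dist + 1))

    # Steps 4-5: order by score (descending), then distance, then seed rank; truncate.
    order = sorted(best, key=lambda t: (-best[t][0], best[t][1], best[t][2]))
    return order[:top_n]


# Hypothetical toy data: send_email -> get_contact -> auth_token.
embs = {
    "send_email": [1.0, 0.0],
    "get_contact": [0.0, 1.0],
    "auth_token": [0.0, 1.0],
    "plot_chart": [-1.0, 0.0],
}
deps = {"send_email": ["get_contact"], "get_contact": ["auth_token"]}
result = retrieve([1.0, 0.0], embs, deps, k=1, max_depth=2, gamma=0.5)
```

In this toy run, `get_contact` and `auth_token` have zero cosine similarity to the query, so a vector-only search would rank them no higher than unrelated tools. They enter the result only because they are reachable from the seed `send_email`.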

4. Empirical Evaluation and Benchmarks

Performance is substantiated on ToolLinkOS, a benchmark of 573 fictional tools (average 6.3 dependencies per tool) and 1,569 user queries. Main findings:

  • On ToolLinkOS: naïve RAG achieves mAP@10 = 0.210, while Graph RAG-Tool Fusion with initial-vector reranking yields mAP@10 = 0.927, an absolute gain of 71.7 percentage points.
  • On ToolSandbox (33 tools, 1,032 queries): naïve RAG achieves mAP@10 = 0.440; Graph RAG-Tool Fusion with reranking achieves 0.661 (+22.1 percentage points absolute).
  • Ablation shows that reranking the top-$k$ seeds adds 7–14 percentage points of absolute mAP@10 over non-reranked variants, by prioritizing correct seed selection and mitigating truncation errors.

These results demonstrate superior recall and precision, especially for queries involving tools with complex, nested dependencies, relative to baseline vector-only RAG.
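
For readers unfamiliar with the metric, a minimal sketch of mAP@10 follows. The normalization by min(|relevant|, k) is one common convention and is an assumption here; the paper may use a different variant. The function names and the two toy queries are hypothetical.

```python
def average_precision_at_k(ranked, relevant, k=10):
    """AP@k for one query: sum of precision at each relevant hit, normalized."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / i  # precision at rank i
    # Assumed convention: normalize by min(|relevant|, k).
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at_k(runs, k=10):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


runs = [
    (["a", "b", "c"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2 = 5/6
    (["x", "y"], {"y"}),            # AP = (1/2) / 1 = 1/2
]
score = mean_average_precision_at_k(runs)  # (5/6 + 1/2) / 2 = 2/3
```

Because each hit contributes its precision at that rank, the metric rewards placing required tools early in the list, which is why dependency-aware ordering can move it substantially.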

5. Complexity and Scalability Analysis

Let $N$ be the total number of tools, $k$ the seed count, $D$ the dependency depth cutoff, and $b$ the average node out-degree. Main complexity factors:

  • Vector search costs $O(N \cdot d)$ for an exact scan (or roughly $O(\log N)$ per query with HNSW-like indices).
  • Graph traversal for dependencies is $O(k \cdot b^{D})$.
  • Sorting and truncation for up to $k \cdot b^{D}$ elements is $O(k \, b^{D} \log(k \, b^{D}))$.

Practical deployments exploit sublinear vector search and bounded dependency expansions (average 6.3 dependencies/tool), enabling efficient scaling to large toolbases. A plausible implication is that, with bounded average degree, further growth in toolbase size produces only modest increases in end-to-end retrieval latency.
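
The traversal bound can be made concrete with a short calculation. In the sketch below, the function name `expansion_bound` is hypothetical, and the settings of 5 seeds and depth 2 are illustrative assumptions; only the average of 6.3 dependencies per tool comes from the ToolLinkOS description.

```python
def expansion_bound(k, b, depth):
    """Upper bound on nodes visited: k seeds, each expanding b children per level."""
    # Geometric sum 1 + b + b^2 + ... + b^depth per seed, which is O(b^depth) for b > 1.
    per_seed = sum(b ** i for i in range(depth + 1))
    return k * per_seed


# b = 6.3 is the reported ToolLinkOS average; k = 5 and depth = 2 are hypothetical.
bound = expansion_bound(k=5, b=6.3, depth=2)  # about 235 candidate nodes
```

The bound depends on the seed count, out-degree, and depth but not on the total number of tools, which is the basis for the expectation that traversal cost stays flat as the toolbase grows. In practice the number of distinct nodes visited is also capped by the size of the graph.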

6. Implementation Considerations

  • Embeddings and vector DB: Deployed using Azure OpenAI text-embedding-ada-002, with HNSW approximate vector search configured with fixed HNSW parameters and a default hybrid-search weight.
  • Graph Storage: Neo4j DB with typed adjacency lists and structured edge metadata, ensuring efficient multi-type traversal.
  • Parameter Settings: A default number of seeds $k$; expansion limited to full direct/indirect dependencies unless otherwise specified; final output list size $K$ matched to the cutoff used for mAP@10 reporting.
  • Query Reranking: Optional GPT-4o LLM reranker applied to the top candidate seeds; prompts conform to Pydantic-type tool schemas.
  • Robustness: Tool and schema co-design (manual+LLM) assures consistency and name collision avoidance; tools encoded as JSON nodes with explicit dependency tuples.
  • Scalability: Empirical results report strong scaling to thousands of tools, with bounded graph-expansion overhead due to low average degree.
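
The idea of tools encoded as JSON nodes with explicit dependency tuples, combined with name-collision checks, might look like the sketch below. The key names (`name`, `description`, `dependencies`), the function `encode_tools`, and the example tools are assumptions for illustration; the paper's actual schema is not reproduced here.

```python
import json


def encode_tools(tools):
    """Serialize tool records to JSON after basic consistency checks.

    tools: list of dicts with hypothetical keys "name", "description", and
    "dependencies" (a list of [target_name, label] pairs).
    """
    names = [t["name"] for t in tools]
    duplicates = {n for n in names if names.count(n) > 1}
    if duplicates:
        raise ValueError(f"name collision: {sorted(duplicates)}")
    known = set(names)
    for t in tools:
        for target, _label in t.get("dependencies", []):
            if target not in known:
                raise ValueError(f"{t['name']} depends on unknown tool {target}")
    return json.dumps(tools, indent=2)


tools = [
    {"name": "get_contact", "description": "Look up a contact address", "dependencies": []},
    {
        "name": "send_email",
        "description": "Send an email message",
        "dependencies": [["get_contact", "param_directly_depends"]],
    },
]
payload = encode_tools(tools)
```

Rejecting duplicate names and dangling dependency targets before loading keeps the graph traversable: every edge then points at a node that exists exactly once.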

7. Extensions and Connections to Corpus Graph Retrieval

The fusion methodology in Graph RAG-Tool Fusion maps directly to recent advances in corpus-graph RAG, particularly the introduction of two fusion operators in FastInsight (An et al., 26 Jan 2026):

  • GRanker (Graph Model-based Search): Injects neighborhood context into node rankings using Laplacian smoothing over latent cross-encoder representations, addressing the "topology-blindness" of standard model-based search.
  • STeX (Semantic-Topological eXpansion): Expands the retrieval frontier by jointly scoring candidates on both semantic vector and structural graph criteria, remedying semantics-blindness in pure graph traversal.

A formal extension to full Graph RAG-Tool Fusion incorporates an external tool invocation operator, integrating its outputs as "pseudo-nodes" into the graph and re-running GRanker/STeX over this enriched subgraph. This enables adaptive, context-driven tool selection with temporary augmentation by LLM-generated or externally queried content, fusing semantic, structural, and tool-external evidence in the retrieval loop (An et al., 26 Jan 2026).
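
To give a feel for what injecting neighborhood context into rankings means, the sketch below applies a generic Laplacian-style smoothing step to scalar relevance scores. It is not the FastInsight GRanker, which the text describes as operating on latent cross-encoder representations. The function name, the blending weight `alpha`, and the toy graph are assumptions.

```python
def smooth_scores(scores, neighbors, alpha=0.5, iterations=1):
    """Blend each node's score with the mean score of its graph neighbors."""
    current = dict(scores)
    for _ in range(iterations):
        updated = {}
        for node, own in current.items():
            nbrs = [n for n in neighbors.get(node, []) if n in current]
            if nbrs:
                mean_nbr = sum(current[n] for n in nbrs) / len(nbrs)
                updated[node] = (1 - alpha) * own + alpha * mean_nbr
            else:
                updated[node] = own  # isolated nodes keep their score
        current = updated
    return current


# Hypothetical graph: "a" and "b" are linked, "c" is isolated.
scores = {"a": 1.0, "b": 0.0, "c": 0.0}
neighbors = {"a": ["b"], "b": ["a"], "c": []}
smoothed = smooth_scores(scores, neighbors, alpha=0.5)
```

Here `b` gains score purely from being adjacent to the highly ranked `a`, while the isolated `c` is unchanged. A scorer that ignores topology cannot make that distinction.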

Summary Table: Main Empirical Results

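| Dataset | Naïve RAG mAP@10 | Graph RAG-Tool Fusion mAP@10 (with rerank) | Absolute gain, percentage points (with rerank) |
|---------|------------------|--------------------------------------------|------------------------------------------------|
| ToolLinkOS | 0.210 | 0.856 (0.927) | +64.6 (+71.7) |
| ToolSandbox | 0.440 | 0.521 (0.661) | +8.1 (+22.1) |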

The Graph RAG-Tool Fusion algorithm achieves tight integration of vector search and graph traversal, enabling structure-aware toolchain retrieval far superior to baseline RAG methods and facilitating effective, scalable, and context-sensitive tool orchestration (Lumer et al., 11 Feb 2025).
