
Graph RAG-Tool Fusion (2502.07223v1)

Published 11 Feb 2025 in cs.CL

Abstract: Recent developments in retrieval-augmented generation (RAG) for selecting relevant tools from a tool knowledge base enable LLM agents to scale their complex tool calling capabilities to hundreds or thousands of external tools, APIs, or agents-as-tools. However, traditional RAG-based tool retrieval fails to capture structured dependencies between tools, limiting the retrieval accuracy of a retrieved tool's dependencies. For example, among a vector database of tools, a "get stock price" API requires a "stock ticker" parameter from a "get stock ticker" API, and both depend on OS-level internet connectivity tools. In this paper, we address this limitation by introducing Graph RAG-Tool Fusion, a novel plug-and-play approach that combines the strengths of vector-based retrieval with efficient graph traversal to capture all relevant tools (nodes) along with any nested dependencies (edges) within the predefined tool knowledge graph. We also present ToolLinkOS, a new tool selection benchmark of 573 fictional tools, spanning over 15 industries, each with an average of 6.3 tool dependencies. We demonstrate that Graph RAG-Tool Fusion achieves absolute improvements of 71.7% and 22.1% over naïve RAG on ToolLinkOS and ToolSandbox benchmarks, respectively (mAP@10). ToolLinkOS dataset is available at https://github.com/EliasLumer/Graph-RAG-Tool-Fusion-ToolLinkOS

Summary

  • The paper introduces Graph RAG-Tool Fusion, a novel plug-and-play method that combines vector retrieval with knowledge graph traversal to improve LLM agent tool selection by modeling structured dependencies between tools.
  • To evaluate tool selection with interdependencies, the authors created ToolLinkOS, a new benchmark dataset featuring 573 tools across 15 industries with detailed dependency structures.
  • Experimental results show that Graph RAG-Tool Fusion outperforms naive RAG baselines, achieving absolute improvements of 71.7% on ToolLinkOS and 22.1% on ToolSandbox (mAP@10) by integrating semantic and dependency information.

The paper introduces Graph RAG-Tool Fusion, a novel plug-and-play approach designed to enhance tool selection for LLM agents by integrating vector-based retrieval with knowledge graph traversal. The approach addresses the inability of traditional Retrieval-Augmented Generation (RAG) methods to capture structured dependencies between tools, which is crucial for complex tool interactions. The paper also introduces ToolLinkOS, a new benchmark dataset designed to evaluate tool selection in scenarios with interdependent tools.

Here's a breakdown of the key components and contributions:

  • Problem Statement: Traditional RAG approaches excel at retrieving tools based on unstructured, semantic relationships but often fail to capture structured dependencies. For example, a "restaurant_reservation" tool may require "get_current_location" and "get_current_datetime" tools. Existing benchmarks do not adequately address these dependencies.
  • Graph RAG-Tool Fusion: This approach combines vector-based retrieval with graph traversal to capture relevant tools (nodes) and their dependencies (edges) within a predefined tool knowledge graph. The process involves an initial vector search to retrieve top-k relevant tools, followed by graph traversal to retrieve each tool's dependencies, ultimately equipping the agent with a refined set of tools.
  • ToolLinkOS Dataset: To address the lack of suitable benchmarks, the authors introduce ToolLinkOS, comprising 573 fictional tools across 15 industries, with each tool having an average of 6.3 dependencies. This dataset is designed to evaluate tool selection involving interdependent tools.
  • Experimental Results: Graph RAG-Tool Fusion improves on naive RAG by 71.7% on ToolLinkOS and 22.1% on ToolSandbox (absolute improvements in mAP@10).

Advanced Retrieval-Augmented Generation

The paper surveys advanced RAG methods: pre-processing techniques such as sliding window chunking and context-enriched chunking, intra-retrieval strategies such as query rewriting and decomposition, and post-retrieval approaches such as reranking. Graph RAG extends traditional RAG by pairing a knowledge graph with vector retrieval, so that documents retrieved from the graph can be concatenated with vector search results.
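To make one of the pre-processing techniques concrete, here is a minimal sketch of sliding window chunking. The function name, the whitespace tokenization, and the `window` and `stride` parameters are assumptions for illustration; the paper only names the technique and does not prescribe an implementation.

```python
def sliding_window_chunks(tokens, window, stride):
    """Split a token list into overlapping windows.

    Consecutive windows share (window - stride) tokens when stride < window.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + window])
        # Stop once a window has reached the end of the input.
        if start + window >= len(tokens):
            break
    return chunks


# Illustrative input: six whitespace-separated tokens.
words = "retrieve tools then traverse their dependencies".split()
chunks = sliding_window_chunks(words, window=4, stride=2)
```

With a window of 4 and a stride of 2, the two resulting chunks overlap by two tokens, so text near a chunk boundary appears in both.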

Knowledge Graphs and LLMs

The integration of Knowledge Graphs (KGs) and LLMs enhances RAG by leveraging structured relationships between document chunks and entities to improve reasoning, retrieval, Question Answering (QA), and summarization tasks. Common approaches use an LLM or agent to decompose queries, explore a knowledge graph, extract relevant subgraphs, and solve multi-hop queries. Graph RAG-Tool Fusion instead uses a knowledge graph for tool selection rather than document QA or schema creation, and it avoids reliance on graph learning or text-to-Cypher prompting.

Tool Selection or Retrieval

The paper contrasts Graph RAG-Tool Fusion with existing methods for tool selection, including lexical term matching, retriever-based methods using neural networks, and LLM-based methods that use an LLM or agent for planning and retrieval. Unlike these, Graph RAG-Tool Fusion uniquely combines knowledge graphs with vector retrieval to improve tool selection by efficiently indexing and retrieving structured dependencies between tools.

Graph Indexing of Tools

Each tool is transformed into a schema model, characterized as either a regular tool or a core tool (node), with an optional list of direct or indirect dependencies (edge) to other tools.

  • Core Tool (Node Type): Represents a reusable function that is a typical dependency of other functions. Examples include functions that return the current date ("get_current_date") or Operating System (OS)-related tools like "set_wifi_status."
  • Regular Tool (Node Type): Represents a tool, Application Programming Interface (API), or agent-as-tool that acts as a non-utility tool an agent can use. An example is a "get stock price" tool that relies on knowing the stock ticker.
  • Tool Relationships (Edges): Relationships between tool nodes represent four primary dependency types:
    • Tool directly depends on.
    • Parameter directly depends on.
    • Tool indirectly depends on.
    • Parameter indirectly depends on.
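As a rough illustration of this schema, the sketch below models a tool node and its typed dependency edges with Python dataclasses. The class names, field names, edge-type labels, and snake_case tool names are assumptions chosen for this example; the paper's actual schema may differ.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    CORE = "core"        # reusable utility that other tools commonly depend on
    REGULAR = "regular"  # non-utility tool, API, or agent-as-tool


class EdgeType(Enum):
    # Hypothetical labels for the four dependency types listed above.
    TOOL_DIRECT = "tool_directly_depends_on"
    PARAM_DIRECT = "parameter_directly_depends_on"
    TOOL_INDIRECT = "tool_indirectly_depends_on"
    PARAM_INDIRECT = "parameter_indirectly_depends_on"


@dataclass
class Dependency:
    target: str          # name of the tool being depended on
    edge_type: EdgeType


@dataclass
class Tool:
    name: str
    description: str
    node_type: NodeType
    dependencies: list[Dependency] = field(default_factory=list)


# Illustrative nodes based on the stock-price example in the abstract.
get_ticker = Tool(
    "get_stock_ticker",
    "Look up the ticker symbol for a company.",
    NodeType.REGULAR,
)
get_price = Tool(
    "get_stock_price",
    "Return the latest price for a stock ticker.",
    NodeType.REGULAR,
    dependencies=[Dependency("get_stock_ticker", EdgeType.PARAM_DIRECT)],
)
```

Storing edges on the dependent node keeps each tool self-describing, which makes it straightforward to load the same records into either a vector index (via `description`) or a graph store (via `dependencies`).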

Tool Retrieval Algorithm

The Graph RAG-Tool Fusion algorithm involves transforming the input query, retrieving an initial top-k list of tools via vector search, optionally reranking the results, and performing a graph traversal using depth-first search to identify all dependencies. The final list of tools is ordered based on relevance and dependency relationships.
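The retrieval steps can be sketched as follows. This is a toy version under stated assumptions, not the authors' implementation: the embeddings are hand-written three-dimensional vectors, the dependency graph is a plain dictionary, all tool names are hypothetical, query transformation and reranking are omitted, and the output order (each retrieved tool followed by its dependencies in depth-first order) is one plausible reading of "ordered based on relevance and dependency relationships".

```python
import math

# Hypothetical toy data: tool name -> embedding, and tool name -> direct dependencies.
EMBEDDINGS = {
    "get_stock_price": [0.9, 0.1, 0.0],
    "get_stock_ticker": [0.6, 0.4, 0.0],
    "restaurant_reservation": [0.0, 0.2, 0.9],
    "get_current_location": [0.1, 0.5, 0.5],
    "get_wifi_status": [0.0, 1.0, 0.0],
}
DEPENDENCIES = {
    "get_stock_price": ["get_stock_ticker", "get_wifi_status"],
    "get_stock_ticker": ["get_wifi_status"],
    "restaurant_reservation": ["get_current_location"],
    "get_current_location": ["get_wifi_status"],
    "get_wifi_status": [],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, k):
    """Return the k tool names most similar to the query embedding."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda name: cosine(query_vec, EMBEDDINGS[name]),
        reverse=True,
    )
    return ranked[:k]


def collect_dependencies(tool, seen, out):
    """Depth-first traversal: append the tool, then recurse into unseen dependencies."""
    if tool in seen:
        return
    seen.add(tool)
    out.append(tool)
    for dep in DEPENDENCIES.get(tool, []):
        collect_dependencies(dep, seen, out)


def graph_rag_tool_fusion(query_vec, k, final_k):
    """Vector search for k seed tools, expand each via DFS, truncate to final_k."""
    seen, result = set(), []
    for tool in vector_search(query_vec, k):
        collect_dependencies(tool, seen, result)
    return result[:final_k]


tools = graph_rag_tool_fusion([1.0, 0.0, 0.0], k=1, final_k=5)
# tools == ["get_stock_price", "get_stock_ticker", "get_wifi_status"]
```

Even with k=1, the traversal pulls in both dependencies of the seed tool, which a pure similarity ranking would only surface if their descriptions happened to resemble the query.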

Graph RAG-Tool Fusion Retrieval Accuracy Equation

The expected retrieval accuracy is modeled as the sum of baseline vector-retrieval accuracy and the additional accuracy contributed by graph traversal of tool dependencies, scaled by the fraction of retrieved tools fitting into the final top-$K$ limit:

$\mathbb{E}\bigl[\mathit{GRTF\ Retrieval}(k,d,K)\bigr] = \mathbb{E}\bigl[\mathrm{Retrieval}_{\mathrm{vector}}(k)\bigr] + \mathbb{E}\bigl[\mathrm{KG}_{\mathrm{dep}}(k,d)\bigr] \times \min\!\Bigl(1, \tfrac{K}{N}\Bigr)$

Where:

  • $k$ is the number of tools initially retrieved through vector search.
  • $d$ is the cut-off for each tool's total dependencies.
  • $K$ is the final top-$K$ cut-off.
  • $N$ is the total number of tools discovered, including all dependencies.
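A small numeric example may help in reading the equation. The sketch below assumes the truncation factor multiplies only the graph-dependency term, which is how the formula reads as written. The input values are made up for illustration and are not results from the paper.

```python
def expected_grtf_accuracy(vector_acc, kg_dep_acc, final_k, n_discovered):
    """Evaluate E[Retrieval_vector(k)] + E[KG_dep(k, d)] * min(1, K / N)."""
    return vector_acc + kg_dep_acc * min(1.0, final_k / n_discovered)


# Assumed inputs: 0.25 from vector search, 0.50 from dependency traversal,
# final cut-off K = 10, and N = 20 tools discovered in total.
truncated = expected_grtf_accuracy(0.25, 0.50, final_k=10, n_discovered=20)

# Same accuracies, but every discovered tool fits within the cut-off (N <= K).
untruncated = expected_grtf_accuracy(0.25, 0.50, final_k=10, n_discovered=8)
```

In the first case only half of the discovered tools survive truncation, so the dependency term contributes 0.25 rather than 0.50. In the second case the factor saturates at 1 and the full dependency contribution is kept.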

Dataset Construction

The ToolLinkOS dataset includes 523 regular tools and 50 core tools spanning over 15 industries. The ToolSandbox dataset is also converted into the Graph RAG-Tool Fusion schema for evaluation. The paper compares ToolLinkOS with other benchmarks like ToolBench, ToolE, Seal-Tools, and ComplexFuncBench, noting that ToolLinkOS uniquely offers a predefined knowledge graph schema with a large number of tools and dependencies.

Experimental Settings

  • Models: The experiments use Azure OpenAI's text-embedding-ada-002 for embeddings and Azure OpenAI's gpt-4o-2024-08-06 as the LLM.
  • Metrics: Retrieval performance is evaluated using mean average precision (mAP) at 10, 20, and 30, as well as normalized discounted cumulative gain (nDCG) and recall.
  • Baselines: Baselines include lexical search (BM25), naive RAG, and hybrid RAG.
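For readers unfamiliar with these metrics, the sketch below computes recall@k and mAP@k for ranked tool lists. It uses one common definition of AP@k, normalizing by min(number of relevant tools, k), and assumes each ranked list has no duplicates. The paper may use a different variant, and the tool names are placeholders.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant tools that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved[:k])) / len(relevant)


def average_precision_at_k(retrieved, relevant, k):
    """Sum of precision@i at each relevant rank i <= k, normalized by min(|relevant|, k)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, tool in enumerate(retrieved[:k], start=1):
        if tool in relevant:
            hits += 1
            score += hits / i  # precision at rank i
    return score / min(len(relevant), k)


def mean_average_precision_at_k(runs, k):
    """Average AP@k over (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


# Placeholder query: two relevant tools, ranked first and third.
retrieved = ["tool_a", "tool_x", "tool_b"]
relevant = {"tool_a", "tool_b"}
ap = average_precision_at_k(retrieved, relevant, k=3)   # (1/1 + 2/3) / 2
rec = recall_at_k(retrieved, relevant, k=3)
```

Because AP@k rewards relevant tools at earlier ranks, a retriever that finds every dependency but places it late can still score lower than one that ranks fewer dependencies near the top.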

Experimental Results and Discussion

The results indicate that Graph RAG-Tool Fusion significantly outperforms baseline retrievers. Reranking the initial k tools retrieved by vector search further increases retrieval accuracy. The authors attribute the superior performance of Graph RAG-Tool Fusion to its ability to leverage both semantic relevance and structured dependencies, which naive RAG struggles to capture. Error analysis reveals that the primary areas for improvement involve reducing errors within vector search and optimizing the truncation of the final top-K tools.

Limitations

The paper acknowledges that the performance of Graph RAG-Tool Fusion depends on the vector search retriever's accuracy. The manual creation of the tool knowledge graph is another limitation, suggesting future work on automated knowledge graph creation using LLMs. The retrieval system does not prioritize certain relationships, which could be a focus for future improvements.