Tool Graph Retriever

Updated 3 February 2026

Tool Graph Retriever is an advanced retrieval system that uses knowledge graphs to capture functional dependencies among APIs and services.
The retrieval algorithm integrates semantic, lexical, and graph-based scoring to enhance multi-step and multi-intent task planning.
Evaluation on a synthetic enterprise benchmark shows consistent improvements over lexical, semantic, and hybrid baselines, reaching 91.85% CompleteRecall@10, with the largest gains on multi-step workflow queries.

A Tool Graph Retriever is an advanced retrieval system that exploits the structural dependencies among tools—such as APIs, functions, or services—for context-aware selection of tools in multi-step task planning scenarios. Instead of relying solely on semantic similarity between queries and tool descriptions, tool graph retrieval incorporates knowledge graph (KG) structure to model functional relationships, parameter hand-offs, and contextual dependencies. This enables AI agents and LLM-based planners to identify both direct and indirect tool requirements, thereby enhancing performance, coverage, and reliability in complex, multi-intent workflows (Bansal et al., 7 Aug 2025).

1. Formal Knowledge Graph Construction

Tool graph retrieval is grounded in constructing a knowledge graph (KG) where:

Node Types
- Tool nodes ($t \in V$): each represents one API or tool.
- Parameter nodes ($p \in V$): inputs and outputs required or produced by tools.
- Business entity nodes: e.g., line of business or department, used for context enrichment.
Edge Types
- Relations $r \in R$: has_parameter, produces_parameter, depends_on, related_to, and others.
- Edges $E \subseteq V \times R \times V$ encode both functional dependencies and semantic relationships. For example, if $t_1$ produces a parameter $p$ and $t_2$ consumes $p$, add the triples $(t_1, \texttt{produces\_parameter}, p)$ and $(t_2, \texttt{has\_parameter}, p)$. For direct tool-to-tool functional links, this two-hop path may be collapsed into a single $\texttt{depends\_on}$ edge between $t_1$ and $t_2$ (Bansal et al., 7 Aug 2025).
Ego-Graph Extraction
- For each seed node $u$, the 1-hop ego graph $G_1(u)$ consists of $u$, its neighbors under all relations, and all edges among them. This models local context and direct/indirect dependencies without assigning real-valued weights.
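The construction and ego-graph steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the tool and parameter names (get_customer, customer_id, and so on) are hypothetical, triples are plain (head, relation, tail) tuples, and neighbors are looked up in both directions so that the ego graph covers "neighbors under all relations" regardless of edge direction.

```python
from collections import defaultdict

# Hypothetical toy KG as (head, relation, tail) triples; names are illustrative.
TRIPLES = [
    ("get_customer", "produces_parameter", "customer_id"),
    ("create_invoice", "has_parameter", "customer_id"),
    ("create_invoice", "produces_parameter", "invoice_id"),
    ("send_invoice", "has_parameter", "invoice_id"),
    ("send_invoice", "related_to", "notify_customer"),
]


def build_adjacency(triples):
    """Index every triple under both endpoints so neighbors are found in either direction."""
    adj = defaultdict(set)
    for head, rel, tail in triples:
        adj[head].add((rel, tail))
        adj[tail].add((rel, head))
    return adj


def ego_graph_1hop(triples, adj, u):
    """Return (nodes, edges) of the unweighted 1-hop ego graph G_1(u).

    Nodes: u plus its neighbors under any relation.
    Edges: every triple whose two endpoints both lie in that node set.
    """
    nodes = {u} | {nbr for _, nbr in adj[u]}
    edges = [t for t in triples if t[0] in nodes and t[2] in nodes]
    return nodes, edges


adj = build_adjacency(TRIPLES)
nodes, edges = ego_graph_1hop(TRIPLES, adj, "customer_id")
```

Seeding on the parameter node customer_id returns both its producer (get_customer) and its consumer (create_invoice), so a single hop through a parameter node links two tools that share no direct edge.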

2. Retrieval Algorithm Design

The retrieval pipeline operates as a hybrid ensemble over semantics, lexicon, and graph structure:

Entry-Point Identification
- Identify candidate entry nodes by top-$k$ semantic similarity (cosine similarity of embeddings) and top-$k$ lexical (BM25) matches to the query.
- Take the union of the semantic and lexical entry points as the entry set $E_0$.
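A sketch of entry-point identification, under several assumptions: BM25 is implemented from scratch with the commonly used values k1 = 1.5 and b = 0.75, the tokenizer is naive whitespace splitting, and the hand-written three-dimensional vectors in VECS stand in for a real embedding model. All node names, descriptions, and vectors are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately naive stand-in)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of `query` against every node description in `docs`."""
    toks = {d: tokenize(t) for d, t in docs.items()}
    n = len(toks)
    avgdl = sum(len(t) for t in toks.values()) / n
    df = Counter()
    for t in toks.values():
        df.update(set(t))  # document frequency of each term
    scores = {}
    for d, t in toks.items():
        tf = Counter(t)
        s = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        scores[d] = s
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(scores, k):
    """Ids of the k highest-scoring nodes, ties broken alphabetically."""
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


def entry_points(query, query_vec, texts, vecs, k):
    """E_0: union of the top-k semantic and top-k lexical (BM25) matches."""
    semantic = top_k({n: cosine(query_vec, v) for n, v in vecs.items()}, k)
    lexical = top_k(bm25_scores(query, texts), k)
    return set(semantic) | set(lexical)


# Hypothetical node descriptions and toy "embeddings".
TEXTS = {
    "get_customer": "look up a customer record by email",
    "create_invoice": "create an invoice for a customer",
    "send_invoice": "email an invoice to the customer",
    "customer_id": "unique customer identifier",
}
VECS = {
    "get_customer": [1.0, 0.0, 0.0],
    "create_invoice": [0.0, 1.0, 0.0],
    "send_invoice": [0.0, 0.7, 0.7],
    "customer_id": [0.9, 0.0, 0.1],
}
QUERY, QUERY_VEC = "email the invoice to this client", [0.8, 0.1, 0.6]
E0 = entry_points(QUERY, QUERY_VEC, TEXTS, VECS, k=1)
```

With k = 1 the lexical side picks send_invoice (it shares "email", "the", "invoice", and "to" with the query), while under these toy vectors the semantic side picks the parameter node customer_id, whose description has no word in common with the query. The union keeps both, which is the reason for combining the two channels.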
Ego-Graph Ensemble
- For each entry $u \in E_0$, extract its 1-hop ego graph $G_1(u)$ and collect the tool nodes it contains into a candidate set $T_u$.
- The global candidate pool is the union $C = \bigcup_{u \in E_0} T_u$.
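A minimal sketch of the ego-graph ensemble, assuming a toy graph with hypothetical node names and a known set of tool nodes. Only node sets are needed, because $T_u$ keeps just the tool nodes of $G_1(u)$; counting how many $T_u$ contain each tool yields the $s_{graph}$ signal used for reranking.

```python
from collections import Counter, defaultdict

# Hypothetical toy KG and tool inventory.
TRIPLES = [
    ("get_customer", "produces_parameter", "customer_id"),
    ("create_invoice", "has_parameter", "customer_id"),
    ("create_invoice", "produces_parameter", "invoice_id"),
    ("send_invoice", "has_parameter", "invoice_id"),
    ("send_invoice", "related_to", "notify_customer"),
]
TOOLS = {"get_customer", "create_invoice", "send_invoice", "notify_customer"}


def ego_ensemble(entries, triples, tools):
    """Return the candidate pool C and, per tool, the number of ego graphs containing it."""
    adj = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    counts = Counter()
    for u in entries:
        ego_nodes = {u} | adj[u]  # node set of G_1(u); edges are not needed here
        t_u = ego_nodes & tools   # T_u: tool nodes in the ego graph
        counts.update(t_u)        # each ego graph adds at most 1 per tool
    return set(counts), counts    # C is the union of all T_u


pool, s_graph = ego_ensemble({"send_invoice", "customer_id", "invoice_id"}, TRIPLES, TOOLS)
```

Seeding on send_invoice, customer_id, and invoice_id yields a pool of four tools; send_invoice and create_invoice each appear in two ego graphs, so they receive $s_{graph} = 2$, while get_customer and notify_customer receive 1.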
Hybrid Scoring and Reranking
- For every candidate tool $t \in C$, compute:
  - $s_{sem}$: cosine similarity between the query and tool embeddings.
  - $s_{lex}$: BM25 score between the query and the tool description.
  - $s_{graph}$: the number of ego graphs in which $t$ appears.
- Final score: $S_{final}(t \mid Q) = \lambda_{graph}\, s_{graph}(t; Q) + \lambda_{sem}\, s_{sem}(Q, t) + \lambda_{lex}\, s_{lex}(Q, t)$, where the weights $\lambda_{graph}$, $\lambda_{sem}$, and $\lambda_{lex}$ are tuned hyperparameters (Bansal et al., 7 Aug 2025).
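The scoring formula translates directly into a ranking function. This sketch applies the three component scores as given; the summary does not say whether they are normalized before weighting (BM25 scores and ego-graph counts are not bounded the way cosine similarity is), so any normalization is left open. The tool names, score values, and $\lambda$ settings below are hypothetical.

```python
def hybrid_rank(candidates, s_graph, s_sem, s_lex,
                lam_graph, lam_sem, lam_lex, k=10):
    """Rank candidates by S_final and return (top-k list, all final scores)."""
    def s_final(t):
        # Missing component scores default to zero.
        return (lam_graph * s_graph.get(t, 0)
                + lam_sem * s_sem.get(t, 0.0)
                + lam_lex * s_lex.get(t, 0.0))
    scores = {t: s_final(t) for t in candidates}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken by name
    return ranked[:k], scores


# Hypothetical component scores for three candidate tools.
S_GRAPH = {"create_invoice": 2, "send_invoice": 1, "get_customer": 1}
S_SEM = {"create_invoice": 0.5, "send_invoice": 0.9, "get_customer": 0.1}
S_LEX = {"create_invoice": 1.0, "send_invoice": 0.0, "get_customer": 3.0}

top, final = hybrid_rank(S_GRAPH.keys(), S_GRAPH, S_SEM, S_LEX,
                         lam_graph=0.5, lam_sem=1.0, lam_lex=0.2)
```

With these values create_invoice ranks first ($0.5 \cdot 2 + 1.0 \cdot 0.5 + 0.2 \cdot 1.0 = 1.7$) on the strength of its graph count, even though send_invoice has the higher cosine similarity.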

3. Evaluation Methodology and Performance

Retrieval performance is measured with the micro-averaged CompleteRecall metric over a query set $\mathcal{Q}$, adapted from table retrieval:

$\text{CompleteRecall}@k = \frac{1}{|\mathcal{Q}|} \sum_{q \in \mathcal{Q}} \mathbb{1}\left[\text{Recall}@k(q) = 1\right],$

where $\text{Recall}@k(q) = 1$ if and only if all ground-truth tools for query $q$ appear among the top-$k$ retrieved results.
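The metric is strict: a query counts only if every ground-truth tool is retrieved, so partial hits earn nothing. A minimal sketch, with hypothetical query ids and tool names:

```python
def complete_recall_at_k(rankings, gold, k):
    """Micro-averaged CompleteRecall@k over the queries in `gold`."""
    hits = 0
    for q, needed in gold.items():
        top_k = set(rankings.get(q, [])[:k])
        hits += set(needed) <= top_k  # indicator 1[Recall@k(q) = 1]
    return hits / len(gold)


# Hypothetical ranked retriever outputs and ground-truth tool sets.
RANKINGS = {
    "q1": ["refund_order", "update_address", "lookup_order"],
    "q2": ["update_address"],
}
GOLD = {"q1": {"refund_order", "lookup_order"}, "q2": {"update_address"}}
```

Here q1 needs both refund_order and lookup_order; at k = 2 only one of them is retrieved, so CompleteRecall@2 is 0.5 (only q2 succeeds), rising to 1.0 at k = 3. Averaging per-query Recall@2 instead would give 0.75, which shows how much stricter the complete variant is.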

Experimental Configuration
- Toolset: 177 APIs with metadata-parameter graphs.
- Queries: 503 synthetic enterprise queries, spanning multiple user classes (single/multi-intent, explicit/implicit/conditional multi-step).
Comparative Results (CompleteRecall@10)
- Lexical baseline: 76.54%
- Semantic baseline: 85.69%
- Hybrid: 89.26%
- Tool Graph Retriever (EEG): 91.85%
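The same figures can be restated as absolute gains in percentage points and as the relative reduction in each baseline's miss rate, (graph - baseline) / (100 - baseline). This is arithmetic on the reported percentages only, not an additional result.

```python
# CompleteRecall@10 values (%) as reported for the baselines.
BASELINES = {
    "Lexical baseline": 76.54,
    "Semantic baseline": 85.69,
    "Hybrid": 89.26,
}
GRAPH = 91.85  # Tool Graph Retriever (EEG)


def gains(graph, baselines):
    """Per baseline: (gain in percentage points, relative reduction in miss rate)."""
    out = {}
    for name, base in baselines.items():
        delta = graph - base
        out[name] = (round(delta, 2), round(delta / (100.0 - base), 3))
    return out


summary = gains(GRAPH, BASELINES)
```

Against the hybrid baseline the absolute gain is 2.59 points, yet that removes about 24% of the hybrid's remaining misses (a miss rate of 10.74% falling to 8.15%).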

The largest gains appear on queries requiring sequential and conditional multi-step compositions (+14 percentage points over the lexical baseline in some categories) (Bansal et al., 7 Aug 2025).

4. Structural Signals and Functional Dependency

Structural signals encode the implicit workflow—parameter hand-offs and execution order—between tools:

Addressing Limitations of Similarity-Based Retrieval
- Purely similarity-based approaches miss tools whose descriptions do not overlap with the query but that are prerequisites or successors through hidden dependencies.
Ego-Graph Expansion
- By aggregating across ego-graphs, the retriever uncovers hidden chains ("tool chains") necessary for complete execution plans, improving coverage and success rate in multi-step tasks.
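A toy illustration of the hidden-dependency problem, with hypothetical tools and simple word overlap standing in for similarity scoring: refund_order needs an order identifier that only lookup_order can supply, yet lookup_order's description shares no word with the query. Similarity alone returns just refund_order; one hop along a depends_on edge adds the prerequisite.

```python
# Hypothetical tools: refund_order needs an order id that only lookup_order provides.
DESCRIPTIONS = {
    "refund_order": "issue a refund for an order",
    "lookup_order": "find the purchase record for an email address",
    "update_address": "change the shipping address",
}
TRIPLES = [("refund_order", "depends_on", "lookup_order")]


def overlap(query, text):
    """Count of shared lower-cased words: a crude stand-in for similarity scoring."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def similarity_only(query, k=1):
    """Top-k tools by word overlap with the query (ties broken by name)."""
    return sorted(DESCRIPTIONS, key=lambda t: (-overlap(query, DESCRIPTIONS[t]), t))[:k]


def with_ego_expansion(query, k=1):
    """Similarity seeds plus every tool one hop away from a seed in the KG."""
    seeds = set(similarity_only(query, k))
    expanded = set(seeds)
    for head, _, tail in TRIPLES:
        if head in seeds:
            expanded.add(tail)
        if tail in seeds:
            expanded.add(head)
    return expanded


plan_tools = with_ego_expansion("refund my order")
```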

A plausible implication is that as tool sets become larger and more interdependent, modeling functional structure via a KG becomes increasingly critical for agent effectiveness.

5. Limitations, Scalability, and Future Directions

Tool graph retrieval depends heavily on KG coverage and triple quality:

Challenges
- Noisy or missing graph triples degrade retrieval accuracy.
- For simple multi-intent queries, hybrid methods may outperform graph-based retrievers, suggesting that the structural signal adds unnecessary overhead in such cases.
- Scalability requires further investigation as toolset size and update frequency increase.
Prospective Extensions
- Integration of learned (GNN-based or LLM-based) graph embeddings.
- Automated validation and completion of triples.
- Expansion from one-hop ego-graph ensembles to multi-hop planning for longer reasoning chains (Bansal et al., 7 Aug 2025).
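One way to generalize the 1-hop ego graph toward multi-hop expansion is a breadth-first search bounded by a radius. This is a sketch of that idea on a hypothetical toy graph, not a method described in the paper. Because tools connect through parameter nodes, each tool-to-tool step costs two hops, so reaching a tool two steps upstream requires radius 4, and the node set (and hence the candidate pool) grows with every added hop.

```python
from collections import defaultdict, deque

# Hypothetical chain: get_customer -> customer_id -> create_invoice -> invoice_id -> send_invoice
TRIPLES = [
    ("get_customer", "produces_parameter", "customer_id"),
    ("create_invoice", "has_parameter", "customer_id"),
    ("create_invoice", "produces_parameter", "invoice_id"),
    ("send_invoice", "has_parameter", "invoice_id"),
]


def ego_nodes(triples, seed, radius):
    """Nodes within `radius` undirected hops of `seed` (radius=1 gives the 1-hop ego graph)."""
    adj = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand past the radius
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


for r in (1, 2, 4):
    print(r, sorted(ego_nodes(TRIPLES, "send_invoice", r)))
```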

6. Practical Implementation and Research Impact

Tool graph retrievers constitute a foundation for next-generation LLM agents in enterprise automation, service orchestration, and complex dialog planning. Key features:

Plug-and-play Framework
- Modular design enables integration with various embedding models and indexing backends.
- Requires only KG construction and routine tuning of the $\lambda$ weights.
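Tuning the $\lambda$ weights can be as simple as a grid search that maximizes CompleteRecall@k on labeled queries. The sketch below assumes that approach (the summary does not say how the weights were tuned); the per-query component scores, the gold sets for hypothetical tools A, B, and C, the grid, and k = 2 are all hypothetical.

```python
from itertools import product

# Hypothetical per-query component scores: {query: {tool: (s_graph, s_sem, s_lex)}}.
SCORES = {
    "q1": {"A": (2, 0.9, 0.0), "B": (2, 0.1, 0.0), "C": (0, 0.5, 0.0)},
    "q2": {"A": (1, 0.2, 0.0), "B": (1, 0.1, 0.0), "C": (1, 0.8, 0.0)},
}
GOLD = {"q1": {"A", "B"}, "q2": {"C"}}


def rank(tool_scores, lam):
    """Sort tools by S_final = lam_graph*s_graph + lam_sem*s_sem + lam_lex*s_lex (ties by name)."""
    def s_final(t):
        return sum(w * s for w, s in zip(lam, tool_scores[t]))
    return sorted(tool_scores, key=lambda t: (-s_final(t), t))


def complete_recall(lam, k):
    """CompleteRecall@k of the weighting `lam` over all labeled queries."""
    hits = sum(GOLD[q] <= set(rank(SCORES[q], lam)[:k]) for q in GOLD)
    return hits / len(GOLD)


def grid_search(grid=(0.0, 0.5, 1.0), k=2):
    best, best_recall = None, -1.0
    for lam in product(grid, repeat=3):  # (lam_graph, lam_sem, lam_lex)
        r = complete_recall(lam, k)
        if r > best_recall:              # strict '>' keeps the first best setting
            best, best_recall = lam, r
    return best, best_recall


best_lam, best_recall = grid_search()
```

On this toy data, semantic similarity alone misses one of q1's two tools and graph counts alone cannot separate q2's candidates, so the first setting that gets both queries right mixes the two signals: $(\lambda_{graph}, \lambda_{sem}, \lambda_{lex}) = (0.5, 0.5, 0.0)$.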
Empirical Impact
- Consistent and robust improvement over strong semantic and hybrid baselines in high-coverage, multi-step scenarios.
- Applicability extends to biomedical knowledge retrieval and task planning in other domains, aligning with contemporary research in KG-based RAG, agent orchestration, and subgraph-based generation.

Tool graph retrieval advances the field by combining semantic, lexical, and structural signals for more complete tool selection and workflow generation in data-rich, functionally interdependent environments (Bansal et al., 7 Aug 2025).

References

Bansal et al. (7 Aug 2025). Planning Agents on an Ego-Trip: Leveraging Hybrid Ego-Graph Ensembles for Improved Tool Retrieval in Enterprise Task Planning.
