Tool Graph Retriever

Updated 3 February 2026
  • Tool Graph Retriever is an advanced retrieval system that uses knowledge graphs to capture functional dependencies among APIs and services.
  • The retrieval algorithm integrates semantic, lexical, and graph-based scoring to enhance multi-step and multi-intent task planning.
  • Evaluation shows significant improvements over baseline methods, achieving up to 91.85% CompleteRecall in complex workflow scenarios.

A Tool Graph Retriever is an advanced retrieval system that exploits the structural dependencies among tools—such as APIs, functions, or services—for context-aware selection of tools in multi-step task planning scenarios. Instead of relying solely on semantic similarity between queries and tool descriptions, tool graph retrieval incorporates knowledge graph (KG) structure to model functional relationships, parameter hand-offs, and contextual dependencies. This enables AI agents and LLM-based planners to identify both direct and indirect tool requirements, thereby enhancing performance, coverage, and reliability in complex, multi-intent workflows (Bansal et al., 7 Aug 2025).

1. Formal Knowledge Graph Construction

Tool graph retrieval is grounded in constructing a knowledge graph (KG) where:

  • Node Types
    • Tool nodes ($t \in V$): each represents one API or tool.
    • Parameter nodes ($p \in V$): inputs required or outputs produced by tools.
    • Business entities: e.g., line of business, department, for context enrichment.
  • Edge Types
    • Relations $r \in R$: include has_parameter, produces_parameter, depends_on, related_to, etc.
    • Edges $E \subseteq V \times R \times V$ encode both functional dependencies and semantic relationships. For example, if $t_1$ produces a parameter $p$ and $t_2$ consumes $p$, add $(t_1, \text{produces\_parameter}, p)$ and $(p, \text{has\_parameter}, t_2)$. Such a parameter hand-off may also be collapsed into a direct tool-to-tool edge $(t_1, \text{depends\_on}, t_2)$ (Bansal et al., 7 Aug 2025).
  • Ego-Graph Extraction
    • For each seed node $u$, the 1-hop ego graph $G_1(u)$ consists of $u$, its neighbors under all relations, and all edges among them. This models local context and direct/indirect dependencies without assigning real-valued weights.
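
The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the tool names, parameter names, and the helpers `build_adjacency` and `ego_graph` are all hypothetical, and neighbors are collected regardless of edge direction, which is one reading of "neighbors under all relations".

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triples; every name here is invented.
TRIPLES = [
    ("get_customer", "produces_parameter", "customer_id"),
    ("customer_id", "has_parameter", "get_invoices"),
    ("get_invoices", "produces_parameter", "invoice_id"),
    ("invoice_id", "has_parameter", "refund_invoice"),
    ("get_customer", "related_to", "billing_department"),
]
TOOLS = {"get_customer", "get_invoices", "refund_invoice"}


def build_adjacency(triples):
    """Map each node to its neighbors, ignoring direction and relation type."""
    adj = defaultdict(set)
    for head, _rel, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def ego_graph(seed, triples, adj):
    """Return (nodes, edges) of the 1-hop ego graph around `seed`."""
    nodes = {seed} | adj[seed]
    # Keep every triple whose two endpoints both lie inside the ego graph.
    edges = [(h, r, t) for h, r, t in triples if h in nodes and t in nodes]
    return nodes, edges


adj = build_adjacency(TRIPLES)
nodes, edges = ego_graph("customer_id", TRIPLES, adj)
tool_candidates = nodes & TOOLS  # tool nodes found in this ego graph
print(sorted(nodes), sorted(tool_candidates), len(edges))
```

Seeding on the parameter node `customer_id` surfaces both the tool that produces it and the tool that consumes it, which is the kind of parameter hand-off the graph is meant to expose.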

2. Retrieval Algorithm Design

The retrieval pipeline operates as a hybrid ensemble over semantics, lexicon, and graph structure:

  • Entry-Point Identification
    • Identify candidate entry nodes by top-$k$ semantic similarity (cosine similarity of embeddings) and lexical (BM25) matches to the query.
    • The entry-point set $E_0$ is the union of the semantic and lexical entry points.
  • Ego-Graph Ensemble
    • For each entry $u \in E_0$, extract its 1-hop ego graph $G_1(u)$ and form a candidate set of tool nodes $T_u$.
    • The global candidate pool $C$ is the union of all $T_u$.
  • Hybrid Scoring and Reranking
    • For every candidate tool $t \in C$, compute:
    • $s_{sem}$: cosine similarity between the query and tool embeddings.
    • $s_{lex}$: BM25 score between the query and the tool description.
    • $s_{graph}$: number of ego-graphs in which $t$ appears.
    • Final score: $S_{final}(t \mid Q) = \lambda_{graph}\, s_{graph}(t; Q) + \lambda_{sem}\, s_{sem}(Q, t) + \lambda_{lex}\, s_{lex}(Q, t)$, where the $\lambda$ weights are tuned hyperparameters (Bansal et al., 7 Aug 2025).
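
The three stages above can be sketched as follows. This is an illustrative toy under stated assumptions, not the published system: the semantic and lexical scores are hard-coded placeholders standing in for embedding cosine similarity and BM25, the tool graph uses collapsed tool-to-tool links, and the tool names, weights, and the `retrieve` helper are hypothetical.

```python
from collections import Counter

# Hypothetical tool graph: tool -> directly linked tools (collapsed depends_on edges).
NEIGHBORS = {
    "get_customer": {"get_invoices"},
    "get_invoices": {"get_customer", "refund_invoice"},
    "refund_invoice": {"get_invoices", "send_email"},
    "send_email": {"refund_invoice"},
}

# Placeholder scores for one query; real values would come from embeddings and BM25.
S_SEM = {"get_customer": 0.2, "get_invoices": 0.6, "refund_invoice": 0.9, "send_email": 0.1}
S_LEX = {"get_customer": 0.0, "get_invoices": 0.5, "refund_invoice": 1.0, "send_email": 0.0}


def top_k(scores, k):
    """Names of the k highest-scoring tools."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def retrieve(k_entry=2, k_final=3, lam_graph=0.5, lam_sem=1.0, lam_lex=0.5):
    # Entry points E_0: union of the semantic and lexical top-k.
    entry_points = set(top_k(S_SEM, k_entry)) | set(top_k(S_LEX, k_entry))
    # Ego-graph ensemble: count the 1-hop ego graphs in which each tool appears.
    graph_count = Counter()
    for u in entry_points:
        for t in {u} | NEIGHBORS.get(u, set()):
            graph_count[t] += 1
    # Hybrid score over the candidate pool C (every tool seen in some ego graph).
    final = {
        t: lam_graph * graph_count[t] + lam_sem * S_SEM[t] + lam_lex * S_LEX[t]
        for t in graph_count
    }
    return sorted(final.items(), key=lambda kv: kv[1], reverse=True)[:k_final]


ranked = retrieve()
print(ranked)
```

In this toy setup `get_customer` has low semantic and lexical scores, yet it enters the candidate pool through the ego graph of `get_invoices` and reaches the final top three. The raw graph count is combined with unnormalized scores here; whether and how the scores are normalized before weighting is not specified in the text above.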

3. Evaluation Methodology and Performance

Retrieval performance is measured with the micro-average CompleteRecall metric, adapted from table retrieval:

$$\text{CompleteRecall}@k = \frac{1}{|Q|} \sum_{q \in Q} \mathbb{1}[\text{Recall}@k(q) = 1],$$

where $\text{Recall}@k(q) = 1$ if and only if all ground-truth tools for $q$ appear in the top-$k$ results.
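
A direct reading of this definition in Python might look like the sketch below. The function name `complete_recall_at_k`, the dictionary layout, and the example queries are assumptions made for illustration.

```python
def complete_recall_at_k(rankings, ground_truth, k):
    """Fraction of queries whose top-k results contain every ground-truth tool.

    rankings:     query id -> ranked list of tool names (best first)
    ground_truth: query id -> collection of required tool names
    """
    hits = 0
    for query, gold in ground_truth.items():
        top = set(rankings.get(query, [])[:k])
        if set(gold) <= top:  # Recall@k(q) = 1 only if no required tool is missing
            hits += 1
    return hits / len(ground_truth)


# Hypothetical data: q1 needs two tools, q2 needs one.
rankings = {"q1": ["a", "c", "b"], "q2": ["d", "e"]}
ground_truth = {"q1": {"a", "b"}, "q2": {"d"}}

score_at_2 = complete_recall_at_k(rankings, ground_truth, k=2)
score_at_3 = complete_recall_at_k(rankings, ground_truth, k=3)
print(score_at_2, score_at_3)
```

With $k = 2$ the query `q1` counts as a miss because tool `b` sits at rank 3, even though half of its required tools were retrieved. The metric gives no partial credit, which is why it suits multi-step plans where a single missing tool breaks the workflow.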

  • Experimental Configuration
    • Toolset: 177 APIs with metadata-parameter graphs.
    • Queries: 503 synthetic enterprise queries, spanning multiple user classes (single/multi-intent, explicit/implicit/conditional multi-step).
  • Comparative Results (@k=10, CompleteRecall)
    • Lexical baseline: 76.54%
    • Semantic baseline: 85.69%
    • Hybrid: 89.26%
    • Tool Graph Retriever (EEG): 91.85%

The largest gains are observed on queries requiring sequential and conditional multi-step compositions (+14 percentage points over the lexical baseline in some categories) (Bansal et al., 7 Aug 2025).

4. Structural Signals and Functional Dependency

Structural signals encode the implicit workflow—parameter hand-offs and execution order—between tools:

  • Addressing Limitations of Similarity-Based Retrieval
    • Pure similarity approaches fail to capture tools whose descriptions do not overlap with the query yet are prerequisites or successors due to hidden dependencies.
  • Ego-Graph Expansion
    • By aggregating across ego-graphs, the retriever uncovers hidden chains ("tool chains") necessary for complete execution plans, improving coverage and success rate in multi-step tasks.
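
The contrast between the two bullets above can be made concrete with a small, hypothetical example. The query, similarity values, tool names, and dependency links are all invented, and the expansion step is simplified: it follows prerequisite links only and does no reranking.

```python
# Hypothetical query: "refund the last invoice for customer Acme".
SIMILARITY = {
    "refund_invoice": 0.91, "cancel_order": 0.55, "send_email": 0.40,
    "get_invoices": 0.30, "get_customer": 0.12,
}
# Hypothetical KG links: tool -> prerequisite tools it is directly connected to.
LINKS = {"refund_invoice": {"get_invoices"}, "get_invoices": {"get_customer"}}
# Tools the full plan needs (the hidden chain).
REQUIRED = {"get_customer", "get_invoices", "refund_invoice"}


def similarity_only(k):
    """Top-k tools by similarity score alone."""
    return set(sorted(SIMILARITY, key=SIMILARITY.get, reverse=True)[:k])


def with_one_hop_expansion(k):
    """Top-k tools plus every tool directly linked to one of them."""
    seeds = similarity_only(k)
    expanded = set(seeds)
    for s in seeds:
        expanded |= LINKS.get(s, set())
    return expanded


covered_plain = REQUIRED <= similarity_only(4)
covered_graph = REQUIRED <= with_one_hop_expansion(4)
missing_at_3 = REQUIRED - with_one_hop_expansion(3)
print(covered_plain, covered_graph, missing_at_3)
```

With four seeds, similarity alone misses `get_customer`, whose description barely overlaps with the query, while one-hop expansion from `get_invoices` recovers it. With only three seeds, `get_customer` is two hops from the nearest seed and stays out of reach, which illustrates the limit of one-hop expansion.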

A plausible implication is that as tool sets become larger and more interdependent, modeling functional structure via a KG becomes increasingly critical for agent effectiveness.

5. Limitations, Scalability, and Future Directions

Tool graph retrieval depends heavily on KG coverage and triple quality:

  • Challenges
    • Noisy or missing graph triples degrade retrieval accuracy.
    • For simple multi-intent queries, hybrid methods may outperform graph-based retrievers, suggesting unnecessary structural overhead.
    • Scalability requires further investigation as toolset size and update frequency increase.
  • Prospective Extensions
    • Integration of learned (GNN-based or LLM-based) graph embeddings.
    • Automated validation and completion of triples.
    • Expansion from one-hop ego-graph ensembles to multi-hop planning for longer reasoning chains (Bansal et al., 7 Aug 2025).
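
As a rough illustration of the last extension, a $k$-hop neighborhood can be collected with breadth-first search. This sketch is not taken from the cited work; the helper `k_hop_neighborhood`, the undirected adjacency dictionary, and the node names are assumptions.

```python
from collections import deque


def k_hop_neighborhood(adj, seed, k):
    """Nodes within k hops of `seed` in an undirected adjacency dict (BFS)."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond the hop limit
        for nb in adj.get(node, ()):
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    return set(depth)


# Hypothetical chain of tools: a - b - c - d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
one_hop = k_hop_neighborhood(chain, "a", 1)
two_hop = k_hop_neighborhood(chain, "a", 2)
print(sorted(one_hop), sorted(two_hop))
```

Each extra hop reaches further along a dependency chain but also enlarges the candidate pool, so a multi-hop variant would likely need stronger reranking or pruning than the one-hop ensemble.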

6. Practical Implementation and Research Impact

Tool graph retrievers constitute a foundation for next-generation LLM agents in enterprise automation, service orchestration, and complex dialog planning. Key features:

  • Plug-and-play Framework
    • Modular design enables integration with various embedding models and indexing backends.
    • Requires only KG construction and routine hyperparameter tuning for the $\lambda$ weights.
  • Empirical Impact
    • Consistent and robust improvement over strong semantic and hybrid baselines in high-coverage, multi-step scenarios.
    • Applicability extends to biomedical knowledge retrieval and task planning in other domains, aligning with contemporary research in KG-based RAG, agent orchestration, and subgraph-based generation.

Tool graph retrieval advances the field by synthesizing semantic, lexical, and structural signals for optimal tool selection and workflow generation in data-rich, functionally interdependent environments (Bansal et al., 7 Aug 2025).
