Tool Graph Retriever
- Tool Graph Retriever is an advanced retrieval system that uses knowledge graphs to capture functional dependencies among APIs and services.
- The retrieval algorithm integrates semantic, lexical, and graph-based scoring to enhance multi-step and multi-intent task planning.
- Evaluation on 503 synthetic enterprise queries shows gains over lexical, semantic, and hybrid baselines, reaching 91.85% CompleteRecall@10 (vs. 89.26% for the hybrid baseline), with the largest gains on multi-step workflow queries.
A Tool Graph Retriever is an advanced retrieval system that exploits the structural dependencies among tools—such as APIs, functions, or services—for context-aware selection of tools in multi-step task planning scenarios. Instead of relying solely on semantic similarity between queries and tool descriptions, tool graph retrieval incorporates knowledge graph (KG) structure to model functional relationships, parameter hand-offs, and contextual dependencies. This enables AI agents and LLM-based planners to identify both direct and indirect tool requirements, thereby enhancing performance, coverage, and reliability in complex, multi-intent workflows (Bansal et al., 7 Aug 2025).
1. Formal Knowledge Graph Construction
Tool graph retrieval is grounded in constructing a knowledge graph (KG) where:
- Node Types
- Tool nodes ($t \in \mathcal{T}$): each represents one API/tool.
- Parameter nodes ($p \in \mathcal{P}$): inputs/outputs required or produced by tools.
- Business entities: e.g., line of business or department, for context enrichment.
- Edge Types
- Relations $\mathcal{R}$: include has_parameter, produces_parameter, depends_on, related_to, etc.
- Edges $(u, r, v)$ with $r \in \mathcal{R}$ encode both functional dependencies and semantic relationships, e.g., if tool $t_i$ produces a parameter $p$ and tool $t_j$ consumes $p$, add $(t_i, \text{produces\_parameter}, p)$ and $(t_j, \text{has\_parameter}, p)$. For direct tool-to-tool functional links, these edges may be collapsed into a single tool-to-tool edge (Bansal et al., 7 Aug 2025).
- Ego-Graph Extraction
- For each seed node $v$, the 1-hop ego graph $\mathcal{E}(v)$ consists of $v$, its neighbors under all relations, and all edges among them. This models local context and direct/indirect dependencies without assigning real-valued edge weights.
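A minimal sketch of the construction and of 1-hop ego-graph extraction, assuming each tool is described by hypothetical input and output parameter lists. The tool names, the helpers `build_triples` and `ego_graph`, and the choice to keep the parameter triples alongside a collapsed `depends_on` edge (consumer depends_on producer) are illustrative assumptions, not the paper's code; business-entity nodes are omitted for brevity.

```python
from collections import defaultdict

# Hypothetical toy tools; names and parameters are illustrative only.
TOOLS = {
    "get_customer": {"inputs": ["email"], "outputs": ["customer_id"]},
    "get_orders": {"inputs": ["customer_id"], "outputs": ["order_id"]},
    "refund_order": {"inputs": ["order_id"], "outputs": ["refund_id"]},
}


def build_triples(tools):
    """Emit (head, relation, tail) triples from tool input/output specs."""
    triples = set()
    producers = defaultdict(set)  # parameter -> tools that produce it
    for tool, spec in tools.items():
        for p in spec["inputs"]:
            triples.add((tool, "has_parameter", p))
        for p in spec["outputs"]:
            triples.add((tool, "produces_parameter", p))
            producers[p].add(tool)
    # Assumption: also add a collapsed consumer -> producer depends_on edge.
    for tool, spec in tools.items():
        for p in spec["inputs"]:
            for producer in producers[p]:
                if producer != tool:
                    triples.add((tool, "depends_on", producer))
    return triples


def ego_graph(triples, seed):
    """1-hop ego graph: seed, its neighbours under any relation, and all edges among them."""
    nodes = {seed}
    for head, _, tail in triples:
        if head == seed:
            nodes.add(tail)
        elif tail == seed:
            nodes.add(head)
    edges = {(h, r, t) for h, r, t in triples if h in nodes and t in nodes}
    return nodes, edges


triples = build_triples(TOOLS)
nodes, edges = ego_graph(triples, "get_orders")
print(sorted(nodes))
```

Here the ego graph of `get_orders` contains its two parameters and, through the collapsed edges, both neighbouring tools; the edge set is unweighted, matching the description above.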
2. Retrieval Algorithm Design
The retrieval pipeline operates as a hybrid ensemble over semantics, lexicon, and graph structure:
- Entry-Point Identification
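- Identify candidate entry nodes as the top-$k$ matches to the query $q$ by semantic similarity (cosine of embeddings), $S_{\text{sem}}$, and by lexical (BM25) score, $S_{\text{lex}}$.
- Take the union of semantic and lexical entry points: $S = S_{\text{sem}} \cup S_{\text{lex}}$.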
- Ego-Graph Ensemble
- For each entry node $s$, extract its 1-hop ego graph $\mathcal{E}(s)$ and form the set $C_s$ of tool nodes it contains.
- The global candidate pool is the union of all candidate sets, $C = \bigcup_s C_s$.
- Hybrid Scoring and Reranking
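- For every candidate tool $t$:
- $s_{\text{sem}}(t)$: cosine similarity between the query and tool embeddings.
- $s_{\text{lex}}(t)$: BM25 score between the query and the tool description.
- $s_{\text{graph}}(t)$: number of ego graphs in which $t$ appears.
- Final score: $s(t) = \lambda_{\text{sem}}\, s_{\text{sem}}(t) + \lambda_{\text{lex}}\, s_{\text{lex}}(t) + \lambda_{\text{graph}}\, s_{\text{graph}}(t)$, where the $\lambda$'s are tuned hyperparameters (Bansal et al., 7 Aug 2025).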
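The three stages above (entry points, ego-graph ensemble, hybrid scoring) can be sketched end to end as follows. This is a toy under loud assumptions: the tool catalogue and neighbourhoods are invented, a bag-of-words vector stands in for a dense embedding model, BM25 is a from-scratch Okapi implementation with common defaults ($k_1 = 1.5$, $b = 0.75$), tool-level neighbourhoods assume collapsed `depends_on` edges, and the raw weighted sum is not normalized because the source does not say whether the signals are rescaled.

```python
import math
from collections import Counter

# Hypothetical toy catalogue (tool -> description); not from the paper.
DOCS = {
    "get_customer": "look up customer record by email",
    "get_orders": "list orders for a customer id",
    "refund_order": "issue refund for an order",
    "send_email": "send an email message",
}
# Hypothetical tool-level 1-hop neighbourhoods, assuming collapsed depends_on edges.
NEIGHBOURS = {
    "get_customer": {"get_orders"},
    "get_orders": {"get_customer", "refund_order"},
    "refund_order": {"get_orders"},
    "send_email": set(),
}


def tokens(text):
    return text.lower().split()


def embed(text):
    # Stand-in embedding: sparse bag-of-words counts instead of a dense encoder.
    return Counter(tokens(text))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def bm25(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query."""
    toks = {d: tokens(t) for d, t in docs.items()}
    n = len(toks)
    avgdl = sum(len(t) for t in toks.values()) / n
    scores = {}
    for d, t in toks.items():
        tf = Counter(t)
        s = 0.0
        for w in set(tokens(query)):
            if tf[w] == 0:
                continue
            df = sum(w in other for other in toks.values())
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
            s += idf * tf[w] * (k1 + 1) / (tf[w] + k1 * (1 - b + b * len(t) / avgdl))
        scores[d] = s
    return scores


def top_k(scores, k):
    """Top-k keys by score (ties broken by name), dropping zero scores."""
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [d for d in ranked[:k] if scores[d] > 0]


def retrieve(query, k_entry=1, weights=(1.0, 1.0, 1.0), k=3):
    q = embed(query)
    s_sem = {d: cosine(q, embed(text)) for d, text in DOCS.items()}
    s_lex = bm25(query, DOCS)
    # 1. Entry points: union of semantic and lexical top-k_entry.
    entries = set(top_k(s_sem, k_entry)) | set(top_k(s_lex, k_entry))
    # 2. Ego-graph ensemble: each entry contributes itself and its tool neighbours;
    #    s_graph counts how many ego graphs contain each tool.
    s_graph = Counter()
    for e in entries:
        for t in {e} | NEIGHBOURS[e]:
            s_graph[t] += 1
    # 3. Hybrid score over the candidate pool (the keys of s_graph).
    l_sem, l_lex, l_graph = weights
    final = {t: l_sem * s_sem[t] + l_lex * s_lex[t] + l_graph * s_graph[t] for t in s_graph}
    return sorted(final, key=lambda t: (-final[t], t))[:k]


print(retrieve("refund my last order"))
```

For the query "refund my last order", only `refund_order` matches by wording, yet `get_orders` enters the candidate pool through the graph and is ranked second. Conversely, `send_email` shares the word "email" with "find customer by email" but is excluded because it lies in no entry node's ego graph.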
3. Evaluation Methodology and Performance
Retrieval performance is measured with the micro-average CompleteRecall metric, adapted from table retrieval:
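$$\text{CompleteRecall@}k = \frac{1}{|Q|} \sum_{q \in Q} \text{Recall@}k(q),$$
where $Q$ is the query set and $\text{Recall@}k(q) = 1$ iff all ground-truth tools for $q$ appear in the top-$k$ results (and 0 otherwise).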
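A minimal sketch of the metric, assuming rankings and ground-truth tool sets are given as plain dictionaries keyed by query ID (the function and variable names are hypothetical):

```python
def complete_recall_at_k(ranked_by_query, gold_by_query, k):
    """Micro-averaged CompleteRecall@k: fraction of queries whose gold tools all appear in the top-k."""
    hits = 0
    for query, gold in gold_by_query.items():
        top_k = set(ranked_by_query.get(query, [])[:k])
        # Recall@k(q) is all-or-nothing: 1 only if every gold tool is retrieved.
        hits += int(set(gold) <= top_k)
    return hits / len(gold_by_query)


ranked = {"q1": ["a", "b", "c"], "q2": ["a", "d", "e"]}
gold = {"q1": {"a", "c"}, "q2": {"b"}}
print(complete_recall_at_k(ranked, gold, k=3))  # 0.5: q1 is fully covered, q2 is not
```

Because a query scores 1 only when its whole tool set is retrieved, partially covered multi-step queries count as misses, which is why the metric rewards recovering complete tool chains.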
- Experimental Configuration
- Toolset: 177 APIs with metadata-parameter graphs.
- Queries: 503 synthetic enterprise queries, spanning multiple user classes (single/multi-intent, explicit/implicit/conditional multi-step).
- Comparative Results (CompleteRecall@10)
- Lexical baseline: 76.54%
- Semantic baseline: 85.69%
- Hybrid: 89.26%
- Tool Graph Retriever (EEG): 91.85%
The largest gains are observed on queries requiring sequential and conditional multi-step compositions (+14 percentage points over the lexical baseline in some categories) (Bansal et al., 7 Aug 2025).
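The micro-averaged figures can be sanity-checked by converting them back to query counts. Assuming all 503 queries are scored at $k = 10$, each percentage should correspond to a whole number of fully covered queries; the sketch below derives those counts and the percentage-point gaps (the counts are computed here, not reported in the source).

```python
N_QUERIES = 503
reported = {  # CompleteRecall@10 (%) as listed above
    "Lexical": 76.54,
    "Semantic": 85.69,
    "Hybrid": 89.26,
    "Tool Graph Retriever (EEG)": 91.85,
}

# Number of fully covered queries implied by each percentage.
counts = {name: round(pct / 100 * N_QUERIES) for name, pct in reported.items()}
for name, c in counts.items():
    # Recompute the percentage from the integer count to confirm it rounds back.
    print(f"{name}: {c}/{N_QUERIES} = {100 * c / N_QUERIES:.2f}%")

best = reported["Tool Graph Retriever (EEG)"]
print(f"+{best - reported['Hybrid']:.2f} pp vs hybrid, +{best - reported['Lexical']:.2f} pp vs lexical")
```

Under that assumption the counts come out to 385, 431, 449, and 462 queries, and each rounds back to the reported percentage. The graph retriever therefore fully covers 13 more queries than the hybrid baseline (+2.59 pp) and 77 more than the lexical one (+15.31 pp).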
4. Structural Signals and Functional Dependency
Structural signals encode the implicit workflow—parameter hand-offs and execution order—between tools:
- Addressing Limitations of Similarity-Based Retrieval
- Pure similarity approaches fail to capture tools whose descriptions do not overlap with the query yet are prerequisites or successors due to hidden dependencies.
- Ego-Graph Expansion
- By aggregating across ego-graphs, the retriever uncovers hidden chains ("tool chains") necessary for complete execution plans, improving coverage and success rate in multi-step tasks.
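To make the "hidden chain" idea concrete, here is a sketch over a hypothetical three-tool chain. Similarity is assumed to match only the first and last tools; the middle tool `get_orders`, which shares no wording with the query, enters through both seeds' neighbourhoods, and the recovered set is then ordered by its parameter hand-offs. Tool names and helpers are illustrative; the sketch uses tool-level neighbourhoods rather than the full parameter-node ego graph, and the ordering step (Python's standard `graphlib`, 3.9+) is an added illustration of execution order, not part of the described retriever.

```python
from graphlib import TopologicalSorter

# Hypothetical tools: name -> (consumed parameters, produced parameters).
TOOLS = {
    "get_customer": ({"email"}, {"customer_id"}),
    "get_orders": ({"customer_id"}, {"order_id"}),
    "refund_order": ({"order_id"}, {"refund_id"}),
}


def prerequisites(tools):
    """Map each tool to the tools whose outputs it consumes (parameter hand-offs)."""
    return {
        t: {u for u, (_, outs) in tools.items() if u != t and ins & outs}
        for t, (ins, _) in tools.items()
    }


def expand(tools, seeds):
    """Union of the seeds' 1-hop tool neighbourhoods (predecessors and successors)."""
    preds = prerequisites(tools)
    pool = set(seeds)
    for s in seeds:
        pool |= preds[s]
        pool |= {u for u in tools if s in preds[u]}
    return pool


def execution_order(tools, selected):
    """Order the selected tools so that producers run before consumers."""
    preds = prerequisites(tools)
    return list(TopologicalSorter({t: preds[t] & selected for t in selected}).static_order())


# Suppose similarity matched only the two ends of the chain.
pool = expand(TOOLS, {"get_customer", "refund_order"})
print(execution_order(TOOLS, pool))
```

With both seeds, `get_orders` appears in two neighbourhoods and the full chain `get_customer`, `get_orders`, `refund_order` is recovered; from `refund_order` alone, only its immediate prerequisite is added.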
A plausible implication is that as tool sets become larger and more interdependent, modeling functional structure via a KG becomes increasingly critical for agent effectiveness.
5. Limitations, Scalability, and Future Directions
Tool graph retrieval depends heavily on KG coverage and triple quality:
- Challenges
- Noisy or missing graph triples degrade retrieval accuracy.
- For simple multi-intent queries, hybrid methods may outperform graph-based retrievers, suggesting that structural expansion adds unnecessary overhead there.
- Scalability requires further investigation as toolset size and update frequency increase.
- Prospective Extensions
- Integration of learned (GNN-based or LLM-based) graph embeddings.
- Automated validation and completion of triples.
- Expansion from one-hop ego-graph ensembles to multi-hop planning for longer reasoning chains (Bansal et al., 7 Aug 2025).
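The multi-hop extension is left open in the source. One simple reading, sketched here purely as an assumption, replaces the 1-hop ego graph with a $k$-hop neighbourhood found by breadth-first search. With parameter nodes in the graph and no collapsed tool-to-tool edges, adjacent tools in a hand-off sit two hops apart, so the 1-hop ego graph of a tool reaches only its parameters. The triples and helper names below are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical KG fragment with tool and parameter nodes (no collapsed tool-to-tool edges).
TRIPLES = [
    ("get_customer", "has_parameter", "email"),
    ("get_customer", "produces_parameter", "customer_id"),
    ("get_orders", "has_parameter", "customer_id"),
    ("get_orders", "produces_parameter", "order_id"),
    ("refund_order", "has_parameter", "order_id"),
    ("refund_order", "produces_parameter", "refund_id"),
]
TOOLS = {"get_customer", "get_orders", "refund_order"}


def undirected(triples):
    """Adjacency sets ignoring edge direction and relation type."""
    adj = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def k_hop_nodes(adj, seed, k):
    """Nodes within k hops of seed (BFS); k=1 gives the 1-hop ego graph's node set."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


adj = undirected(TRIPLES)
for k in (1, 2, 4):
    print(k, sorted(k_hop_nodes(adj, "refund_order", k) & TOOLS))
```

Starting from `refund_order`, one hop finds no other tool, two hops add `get_orders`, and four hops reach `get_customer`. Each extra hop also enlarges the candidate pool, so the radius trades coverage against pool size and scoring cost.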
6. Practical Implementation and Research Impact
Tool graph retrievers constitute a foundation for next-generation LLM agents in enterprise automation, service orchestration, and complex dialog planning. Key features:
- Plug-and-play Framework
- Modular design enables integration with various embedding models and indexing backends.
- Requires only KG construction and routine tuning of the scoring weights.
- Empirical Impact
- Improvements over strong semantic and hybrid baselines, most pronounced in high-coverage, multi-step scenarios.
- Applicability extends to biomedical knowledge retrieval and task planning in other domains, aligning with contemporary research in KG-based RAG, agent orchestration, and subgraph-based generation.
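The "routine hyperparameter tuning" of the scoring weights is not specified further; the following assumes a simple grid search that maximizes CompleteRecall@k on a small labelled query set. The per-candidate features, tool IDs, and grid are invented; a real setup would compute the three signals with the retrieval pipeline. In this toy, semantic-only weights leave a distractor in the top 2 of `q1`, while graph-heavy weights promote the distractor of `q2`.

```python
from itertools import product

# Hypothetical per-query features for each candidate tool: (s_sem, s_lex, s_graph).
FEATURES = {
    "q1": {"t0": (0.5, 0.2, 0), "t1": (0.9, 0.1, 1), "t2": (0.1, 0.3, 2)},
    "q2": {"t3": (0.1, 0.2, 2), "t4": (0.8, 0.3, 0), "t5": (0.7, 0.1, 1)},
}
GOLD = {"q1": {"t1", "t2"}, "q2": {"t4", "t5"}}


def rank(feats, weights):
    """Sort candidates by the weighted sum of their three signals (ties broken by name)."""
    score = {t: sum(w * f for w, f in zip(weights, fs)) for t, fs in feats.items()}
    return sorted(score, key=lambda t: (-score[t], t))


def complete_recall(weights, k):
    """Share of queries whose gold tools all land in the top-k."""
    hits = sum(GOLD[q] <= set(rank(FEATURES[q], weights)[:k]) for q in GOLD)
    return hits / len(GOLD)


# Fix the semantic weight at 1 and search the other two on a small grid.
grid = [0.0, 0.25, 0.5, 1.0]
candidates = [(1.0, w_lex, w_graph) for w_lex, w_graph in product(grid, repeat=2)]
# max() returns the first weight triple (in grid order) reaching the best score.
best = max(candidates, key=lambda w: complete_recall(w, k=2))
print(best, complete_recall(best, k=2))
```

Fixing one weight loses nothing: multiplying all three weights by the same positive constant leaves every ranking unchanged. Queries used for tuning should be kept separate from those used to report CompleteRecall.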
Tool graph retrieval advances the field by combining semantic, lexical, and structural signals for improved tool selection and workflow generation in data-rich, functionally interdependent environments (Bansal et al., 7 Aug 2025).