Tool Dependency Graph for Multi-Tool Orchestration

Updated 28 January 2026

Tool Dependency Graph (TDG) is a directed graph framework that models tools, APIs, and their prerequisite relations for streamlined orchestration.
TDG construction combines LLM-based schema matching, supervised classification, and heuristic methods to accurately extract and verify inter-tool dependencies.
TDGs integrate with planning and retrieval workflows, improving toolchain execution accuracy and security by enforcing controlled, dependency-aware operations.

A Tool Dependency Graph (TDG) is a directed graph structure designed to formally encode dependencies among a set of tools, APIs, or function components, supporting the orchestration, selection, planning, and security of multi-tool agentic systems. Each node in a TDG typically models an individual tool or related schema element, while directed edges encode prerequisite relationships where the output of one tool can be valid input to another. The TDG abstraction underpins recent advances in LLM-augmented tool planning, retrieval, program generation, prompt-hardened execution, and parallelism management across diverse domains including language agent pipelines, security, and HPC runtimes.

1. Formal Definition and Graph Structure

The TDG is generally modeled as a directed graph $G = (V, E)$ over a set of tools or tool-related components:

Nodes ($V$): Each node represents a tool's schema (the tool's name, description, argument signature, and output payload), an individual tool instance, or, depending on context, a finer-grained entity such as an API parameter or procedural step. For example, in (Liu et al., 28 Oct 2025), nodes are tool schemas augmented for fusion with document nodes.
Edges ($E$): Each directed edge $(u \rightarrow v)$ asserts a dependency, typically that the output of $u$ can serve as a required input for $v$, a relation labeled “can_use_this_tool_output” in (Liu et al., 28 Oct 2025). In more specialized forms, such as Taskgraph for OpenMP, edges are labeled by the precise data-flow dependency kind (read-after-write, write-after-read, or write-after-write: RAW/WAR/WAW) (Yu et al., 2022), or carry statistical weights derived from invocation logs (Jiang et al., 24 Jun 2025).

Many systems also assign relation labels or typed edge classes (e.g., “depends_on,” input-output, procedural reference), and some treat the graph as heterogeneous, with multi-typed nodes and edges (e.g., API endpoints and parameters in (Jiang et al., 24 Jun 2025)).
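The definition above can be sketched as a small data structure. This is a minimal illustration, not the representation used by any of the cited systems: the class names, field names, and the example tools ("geocode", "weather") are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolSchema:
    """Node payload: the schema fields named in the definition above."""
    name: str
    description: str
    args: tuple[str, ...]  # names of required inputs (hypothetical layout)
    output: str            # name of the produced value


@dataclass
class ToolDependencyGraph:
    nodes: dict[str, ToolSchema] = field(default_factory=dict)
    # edges[u][v] holds the relation label of the directed edge u -> v.
    edges: dict[str, dict[str, str]] = field(
        default_factory=lambda: defaultdict(dict)
    )

    def add_tool(self, schema: ToolSchema) -> None:
        self.nodes[schema.name] = schema

    def add_dependency(self, u: str, v: str, label: str = "depends_on") -> None:
        """Record that the output of u can serve as an input of v."""
        if u not in self.nodes or v not in self.nodes:
            raise KeyError("both endpoints must be registered tools")
        self.edges[u][v] = label

    def prerequisites(self, v: str) -> list[str]:
        """All tools with an edge pointing into v, sorted by name."""
        return sorted(u for u, outs in self.edges.items() if v in outs)


tdg = ToolDependencyGraph()
tdg.add_tool(ToolSchema("geocode", "Address to coordinates", ("address",), "coords"))
tdg.add_tool(ToolSchema("weather", "Forecast at coordinates", ("coords",), "forecast"))
tdg.add_dependency("geocode", "weather", label="can_use_this_tool_output")
```

Storing a label per edge leaves room for typed or weighted variants: the string could be swapped for a RAW/WAR/WAW tag or an invocation count without changing the node side.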

2. Construction Algorithms and Dependency Extraction

TDG construction pipelines are typically hybrid, combining static schema analysis, LLM-powered extraction, and data-driven or manual curation:

LLM-based Schema Matching: For each tool $t$, a tuple $(\mathrm{name}_t, \mathrm{desc}_t, \mathrm{args}_t, \mathrm{output}_t)$ is extracted. For each ordered pair $(t_i, t_j)$, an LLM judges whether $t_i$'s output can be input to $t_j$ (Liu et al., 28 Oct 2025). Reported precision and recall for this approach are high (e.g., GPT-4o: 90.7% precision, 80.5% recall in dependency prediction) (Liu et al., 28 Oct 2025).
Supervised Classification: Alternatively, a discriminator (e.g., a BERT-based model) is trained on a labeled dataset such as TDI300K (Gao et al., 7 Aug 2025) to decide, for a pair of tool documents, whether a directed dependency holds between them.
Empirical and Heuristic Methods: Historical invocation trajectories are mined for edge inference (e.g., adding edges for consecutive observed tool calls (Chen et al., 18 Aug 2025)), and missing dependencies are predicted using graph neural networks regularized by link-prediction losses (Chen et al., 18 Aug 2025).

The result is often a sparse, directed graph that may be incomplete, particularly in settings with limited trajectory coverage or evolving toolsets.
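Two of the extraction routes above, pairwise schema judging and trajectory mining, can be sketched as follows. The judge here is a deliberately naive name-matching stand-in for the LLM call, and the tool dictionaries, key names, and trajectories are hypothetical; the sketch only shows the shape of the pipeline.

```python
from itertools import permutations
from typing import Callable, Iterable

Tool = dict  # assumed keys: "name", "desc", "args", "output"
Edge = tuple[str, str]


def name_overlap_judge(src: Tool, dst: Tool) -> bool:
    """Stand-in for an LLM judgment: does src's output name match an argument of dst?"""
    return src["output"] in dst["args"]


def edges_from_schemas(tools: list[Tool], judge: Callable[[Tool, Tool], bool]) -> set[Edge]:
    """Ask the judge about every ordered pair (t_i, t_j) with i != j."""
    return {(a["name"], b["name"]) for a, b in permutations(tools, 2) if judge(a, b)}


def edges_from_trajectories(trajectories: Iterable[list[str]]) -> set[Edge]:
    """Add an edge for each pair of consecutive, distinct tool calls in a log."""
    edges: set[Edge] = set()
    for calls in trajectories:
        edges.update((a, b) for a, b in zip(calls, calls[1:]) if a != b)
    return edges


tools = [
    {"name": "geocode", "desc": "Address to coordinates", "args": ["address"], "output": "coords"},
    {"name": "weather", "desc": "Forecast at coordinates", "args": ["coords"], "output": "forecast"},
]
schema_edges = edges_from_schemas(tools, name_overlap_judge)
log_edges = edges_from_trajectories([["a", "b", "c"], ["a", "c"]])
```

Because the judge is passed in as a function, the same loop can be driven by an LLM prompt or a trained classifier. Note that the pairwise loop is quadratic in the number of tools, and consecutive calls in a log do not always reflect a true data dependency, which is one reason the resulting graph may contain spurious or missing edges.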

3. Integration with Planning, Retrieval, and Reasoning Workflows

TDGs serve as central data structures for advanced planning and retrieval architectures:

Planning: TDGs are traversed (often in topological order for acyclic graphs) to generate valid tool call sequences for complex user objectives, ensuring all prerequisites are satisfied (An et al., 21 Aug 2025). In systems such as IPIGuard, the explicit pre-planned TDG enforces execution-time security constraints.
Retrieval: Graph convolutional methods propagate information through the TDG, producing dependency-aware tool embeddings for improved semantic retrieval. For instance, in Tool Graph Retriever, a lightweight GCN applied over the TDG yields substantial improvements in Recall, NDCG, and PassRate metrics (Gao et al., 7 Aug 2025).
Dense-Sparse Integration: Modern frameworks combine dense semantic retrieval over tool/document embeddings with sparse graph expansion (e.g., via Personalized PageRank), enabling the selection not only of semantically relevant tools but also their connected dependencies (Liu et al., 28 Oct 2025). This hybrid strategy yields gains in in-context plan generation efficacy.
Graph RAG Fusion: Plug-and-play architectures recursively traverse the TDG (e.g., DFS/BFS up to depth $k$) after vector search, ensuring all $k$-hop dependencies are included for toolchain execution (Lumer et al., 11 Feb 2025), systematically mitigating the missing-dependency risk of naïve RAG.
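The traversal and ordering steps described in this section can be sketched together: expand a retrieved seed set with its prerequisites up to a fixed hop count, then order the result topologically. This is an illustrative sketch using Python's standard `graphlib` module (Python 3.9+); the prerequisite map and tool names are hypothetical, and the seeds are assumed to come from some upstream vector search.

```python
from collections import deque
from graphlib import TopologicalSorter


def expand_with_prerequisites(prereqs: dict[str, set[str]], seeds: list[str], k: int) -> set[str]:
    """Breadth-first walk over prerequisite links, at most k hops from the seeds."""
    selected = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        tool, depth = frontier.popleft()
        if depth == k:
            continue  # hop budget exhausted on this branch
        for dep in prereqs.get(tool, ()):
            if dep not in selected:
                selected.add(dep)
                frontier.append((dep, depth + 1))
    return selected


def execution_order(prereqs: dict[str, set[str]], selected: set[str]) -> list[str]:
    """Topological order of the selected tools, prerequisites first.

    Raises graphlib.CycleError if the selected subgraph is not acyclic.
    """
    subgraph = {t: prereqs.get(t, set()) & selected for t in selected}
    return list(TopologicalSorter(subgraph).static_order())


# prereqs[v] is the set of tools whose output v needs (hypothetical toolset).
prereqs = {
    "resolve_airport": set(),
    "search_flights": {"resolve_airport"},
    "book_flight": {"search_flights"},
    "send_email": {"book_flight"},
}
one_hop = expand_with_prerequisites(prereqs, ["book_flight"], k=1)
two_hop = expand_with_prerequisites(prereqs, ["book_flight"], k=2)
order = execution_order(prereqs, two_hop)
```

With `k=1` the expansion stops before `resolve_airport`, so the resulting chain would still be missing a dependency; choosing the hop budget is a trade-off between context size and completeness.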

4. Security, Robustness, and Execution Control

TDGs form the foundational element in agent security hardening, particularly against indirect prompt injection (IPI):

Execution-Oriented Security: By decoupling planning (graph construction) from execution (topologically ordered traversal), only pre-authorized tool invocations are possible (An et al., 21 Aug 2025). This structural constraint can reduce attack success rates to below 1% while preserving close-to-maximal utility under attack, as demonstrated by the IPIGuard defense.
Argument Estimation and Expansion: During execution, the framework restricts the set of executable actions to nodes present in the plan TDG and controls the arguments to those authorized at plan time, thwarting injected tool calls unless they are part of a benign expansion mechanism (An et al., 21 Aug 2025).

This suggests that formal TDGs provide a robust, model-agnostic framework for controlling agentic behaviors in open environments where data and tool responses may be adversarial.
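The plan-then-execute constraint can be illustrated with a small guard that only permits calls matching a pre-planned graph. This is a simplified sketch, not the IPIGuard implementation: the `PlanGuard` class, the exact-match argument check, and the example tools are assumptions, and the benign expansion mechanism mentioned above is not modeled.

```python
from graphlib import TopologicalSorter


class UnauthorizedToolCall(Exception):
    """Raised when a requested call falls outside the pre-planned graph."""


class PlanGuard:
    def __init__(self, plan: dict[str, set[str]], authorized_args: dict[str, dict]):
        # plan maps each planned tool to the set of tools it depends on.
        self.plan = plan
        self.authorized_args = authorized_args
        self.order = list(TopologicalSorter(plan).static_order())
        self.done: set[str] = set()

    def invoke(self, tools: dict, name: str, args: dict):
        if name not in self.plan:
            raise UnauthorizedToolCall(f"{name!r} is not a node of the planned graph")
        if args != self.authorized_args.get(name):
            raise UnauthorizedToolCall(f"arguments for {name!r} differ from the plan")
        missing = self.plan[name] - self.done
        if missing:
            raise UnauthorizedToolCall(f"{name!r} called before {sorted(missing)}")
        result = tools[name](**args)
        self.done.add(name)
        return result


tools = {
    "read_inbox": lambda folder: f"3 messages in {folder}",
    "summarize": lambda style: f"{style} summary",
    "send_money": lambda to, amount: "sent",  # available, but never planned
}
plan = {"read_inbox": set(), "summarize": {"read_inbox"}}
authorized = {"read_inbox": {"folder": "inbox"}, "summarize": {"style": "short"}}

guard = PlanGuard(plan, authorized)
outputs = [guard.invoke(tools, name, authorized[name]) for name in guard.order]
```

A call such as `guard.invoke(tools, "send_money", {"to": "attacker", "amount": 100})` raises `UnauthorizedToolCall` even though the tool exists, because the check is made against the plan rather than against the model's latest output.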

5. Advanced Graph Embeddings and Adaptive Toolchain Orchestration

TDGs increasingly leverage advanced embedding and navigation schemes:

Graph Neural Network Embeddings: Systems such as GTool use GNNs to encode structural and attribute information across the incomplete TDG, condensing this into a “<graph token>” supplied to an LLM for plan generation (Chen et al., 18 Aug 2025). Link prediction regularization enables the recovery of unobserved or missing dependencies, maintaining strong planning under extreme edge sparsity.
Heterogeneous Typing and Weighting: NaviAgent additionally includes parameter nodes, type-specific linear encoding, and edges weighted by invocation frequency; its embedding functions (heterogeneous graph transformers) merge schema and behavioral histories (Jiang et al., 24 Jun 2025). The graph navigator performs backward search and hybrid heuristic optimization over the resulting tool dependency heterogeneous graph (TDHG), outputting high-confidence toolchains for flexible, multi-path decision procedures.
Efficiency and Scalability: Embedding-based representations (a single compact “graph token”) substantially reduce context-window requirements and inference times relative to prompt-heavy baselines, while maintaining or exceeding plan accuracy (Chen et al., 18 Aug 2025).
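The idea of backward search over a graph with parameter nodes and frequency-weighted edges can be illustrated with a greedy sketch. This is not NaviAgent's navigator: the greedy highest-weight choice replaces its hybrid heuristic optimization, and the function name, data layout, tool names, and weights are all hypothetical.

```python
from typing import Optional


def backward_chain(
    target: str,
    inputs: dict[str, list[str]],
    producers: dict[str, list[tuple[str, int]]],
    available: set[str],
    _visiting: Optional[set[str]] = None,
) -> list[str]:
    """Return the tools to call, prerequisites first, ending with `target`.

    inputs[tool]     -> parameters the tool requires
    producers[param] -> (tool, invocation_count) pairs that can produce it
    available        -> parameters already supplied by the user
    """
    visiting = _visiting if _visiting is not None else set()
    if target in visiting:
        raise ValueError(f"cyclic dependency through {target!r}")
    visiting.add(target)
    chain: list[str] = []
    for param in inputs[target]:
        if param in available:
            continue
        candidates = producers.get(param)
        if not candidates:
            raise LookupError(f"no tool produces {param!r}")
        # Greedy choice: prefer the producer seen most often in past invocations.
        best_tool, _ = max(candidates, key=lambda c: c[1])
        for tool in backward_chain(best_tool, inputs, producers, available, visiting):
            if tool not in chain:
                chain.append(tool)
    visiting.discard(target)
    chain.append(target)
    return chain


inputs = {
    "book_hotel": ["city_id", "dates"],
    "lookup_city": ["city_name"],
    "search_city": ["city_name"],
}
producers = {"city_id": [("lookup_city", 42), ("search_city", 7)]}
chain = backward_chain("book_hotel", inputs, producers, {"city_name", "dates"})
```

Starting from the goal tool and walking back through unmet parameters keeps the search focused on tools that are actually needed; a fuller navigator would keep several candidate producers per parameter instead of committing to one.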

6. Quantitative Impact and Empirical Validations

Empirical results across diverse tasks confirm the value of TDGs:

Retrieval Improvements: Tool Graph Retriever raises Recall@5 on API-Bank from 0.659 to 0.736 (+0.077 absolute, about 11.7% relative) and on ToolBench-IR from 0.714 to 0.761 (Gao et al., 7 Aug 2025). Graph RAG-Tool Fusion achieves up to a 71.7 percentage point mAP@10 gain over naïve RAG on the ToolLinkOS benchmark (Lumer et al., 11 Feb 2025).
Plan Generation: Incorporating TDG traversal and edge-aware retrieval produces higher coverage and alignment in plan generation tasks, with top LLMs (e.g., GPT-4o) achieving 77% binary match and 1.62 judge score (max 2) (Liu et al., 28 Oct 2025).
Security Efficacy: IPIGuard on AgentDojo achieves attack success rates $<1\%$ with minimal utility loss (UA=58.77% vs. upper bound 68%) (An et al., 21 Aug 2025).
Parallelism Management: In HPC, precomputed TDGs for OpenMP tasks allow elimination of lock contentions and atomic dependency checks, enabling speedups up to $V$ 4 in fine-grained, high-core-count environments (Yu et al., 2022).

7. Limitations and Prospective Directions

Edge Incompleteness: Many practical deployments rely on sparse, incomplete, or inferred edges. Joint training with missing-edge prediction or continual graph update is essential for robustness (Chen et al., 18 Aug 2025).
Graph Evolution and Scaling: Memory and parallelism constraints arise with extremely large TDGs (Yu et al., 2022). Online evolution, graph compression, and distributed computation are active areas of research.
Integration with Domain Knowledge: Fusing procedural text graphs with structural TDGs enables richer planning but remains an ad hoc process, lacking rigorous formalization for graph alignment and fusion (Liu et al., 28 Oct 2025).

These dimensions collectively frame the TDG as a unifying abstraction underpinning next-generation tool-augmented agents, blending static structure, learned behavior, semantic representation, security, and computational efficiency.