
AgentSHAP: LLM Tool Attribution

Updated 18 December 2025
  • AgentSHAP is a model-agnostic explainability framework that quantifies the individual contributions of external tools used by LLM agents via Shapley values.
  • It employs a two-phase Monte Carlo estimation method—combining leave-one-out and random subset sampling—to efficiently approximate each tool’s marginal effect.
  • Empirical evaluations on API-Bank demonstrate stable and faithful attributions, with significant semantic degradation when high-impact tools are removed.

AgentSHAP is a model-agnostic explainability framework for quantifying the importance of external tools used by LLM agents. It attributes the agent’s output to specific tools, providing principled, game-theoretic scores for each tool’s contribution. AgentSHAP extends the Shapley-based interpretability family (TokenSHAP, PixelSHAP) to the tool level, enabling robust debugging, trust calibration, and cost optimization in complex agentic workflows (Horovicz, 14 Dec 2025).

1. Formalization of Tool Shapley Values

AgentSHAP defines tool importance through the Shapley value, a unique, fair attribution mechanism from cooperative game theory. For a given prompt $p$, agent $\mathcal{A}$, and tool set $T = \{t_1, \dots, t_n\}$, the framework computes, for each tool $t_i$, a score $\phi_i$ reflecting its marginal contribution to the response.

Let $\mathcal{A}(p,S)$ denote the agent's output using the subset $S \subseteq T$, and let $v(S) = \mathrm{sim}(\mathcal{A}(p,S), \mathcal{A}(p,T))$ be the value function, typically the cosine similarity between the restricted-toolset response and the full-toolset response. The Shapley value for $t_i$ is

$$\phi_i = \sum_{S \subseteq T \setminus \{t_i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \bigl[ v(S \cup \{t_i\}) - v(S) \bigr].$$

This aggregates $t_i$'s marginal effect across all subsets and satisfies the efficiency, symmetry, null-player, and additivity Shapley axioms. AgentSHAP does not assume access to the agent's parameters or gradients; all attributions are black-box and model-agnostic.
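For small $n$, the definition above can be evaluated exactly by enumerating all coalitions. The following minimal sketch does so, with a toy value function standing in for the agent-based $v$:

```python
from itertools import combinations
from math import factorial

def exact_shapley(n, v):
    """Exact Shapley values phi_i for n tools under value function v,
    where v maps a frozenset of tool indices to a score."""
    phi = [0.0] * n
    for i in range(n):
        others = [t for t in range(n) if t != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # Canonical Shapley weight |S|!(n-|S|-1)!/n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (v(frozenset(S) | {i}) - v(frozenset(S)))
    return phi

# Toy value function: the response is "correct" only when tool 0 is available.
v = lambda S: 1.0 if 0 in S else 0.0
print(exact_shapley(3, v))  # tool 0 receives essentially all credit, ≈ [1, 0, 0]
```

The weights for one player sum to 1, so a tool that is always pivotal receives the full value, consistent with the efficiency axiom.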

2. Monte Carlo Estimation Algorithm

Direct computation of Shapley values is intractable for nontrivial $n$, since it requires $2^n$ subset evaluations. AgentSHAP employs a two-phase Monte Carlo scheme that dramatically reduces sample complexity:

  • Phase A (Leave-One-Out): For each tool, compute the marginal effect when it is individually omitted. This guarantees all first-order effects are represented explicitly.
  • Phase B (Random Subsets): For $m = \lfloor \rho(2^n - n - 1) \rfloor$ random subsets (typically $\rho = 0.5$), sample a coalition $S_j \subseteq T$, evaluate $\mathcal{A}(p, S_j)$, and update each candidate tool's contribution according to the sampled ordering.

Formally, AgentSHAP requires $n$ model calls for the leave-one-out phase plus $O(\rho\,2^n)$ for the Monte Carlo phase. With $n = 8$ and $\rho = 0.5$, practical computation takes roughly 130 calls versus 256 for exact enumeration.
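As a sanity check on the budget arithmetic (a trivial sketch, not code from the paper):

```python
def call_budget(n, rho):
    """Leave-one-out calls (n) plus Phase B samples, m = floor(rho*(2^n - n - 1))."""
    return n + int(rho * (2 ** n - n - 1))

print(call_budget(8, 0.5))  # 131 calls, versus 2**8 = 256 for exact enumeration
```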

The framework combines estimates from both phases, weighted by the canonical Shapley coefficients, yielding accurate tool attributions under tight compute constraints.
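A hedged sketch of such a two-phase estimator follows; the paper's exact weighting scheme may differ, and `v` stands in for an embedding-similarity value function wrapping real agent calls. Coalition values are cached so repeated subsets incur no extra model calls:

```python
import random

def two_phase_shapley(n, v, rho=0.5, seed=0):
    """Approximate tool Shapley values: leave-one-out marginals (Phase A)
    plus random-coalition marginals (Phase B). Illustrative estimator only."""
    rng = random.Random(seed)
    full = frozenset(range(n))
    cache = {}

    def val(S):
        S = frozenset(S)
        if S not in cache:  # each distinct coalition costs one agent call
            cache[S] = v(S)
        return cache[S]

    # Phase A: first-order effect of dropping each tool from the full set.
    samples = {i: [val(full) - val(full - {i})] for i in range(n)}

    # Phase B: m random coalitions, each yielding a marginal for every tool.
    m = int(rho * (2 ** n - n - 1))
    for _ in range(m):
        S = frozenset(t for t in range(n) if rng.random() < 0.5)
        for i in range(n):
            samples[i].append(val(S | {i}) - val(S - {i}))

    return [sum(s) / len(s) for s in samples.values()]
```

Caching matters in practice because each uncached coalition evaluation is a full agent run plus an embedding call.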

3. Model-Agnostic, Tool-Agnostic Operation

AgentSHAP operates solely with input–output access to the LLM agent, requiring neither gradient information nor model internals. It accommodates proprietary, API-based LLMs (e.g., GPT, Claude) and open-source models equally. Arbitrary external tool types (APIs, calculators, retrieval, code execution) can be incorporated, and the approach remains agnostic to the order or nature of tool invocation.

Value function computations are typically based on the cosine similarity of embedding representations (e.g., “text-embedding-3-large”), though alternative metrics (BLEU, ROUGE) are possible, albeit with reduced semantic sensitivity.
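A minimal sketch of such a value function; `embed_fn` and `agent_fn` are hypothetical stand-ins for an embedding model (e.g., text-embedding-3-large) and the LLM agent:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def make_value_fn(embed_fn, agent_fn, prompt, full_tools):
    """v(S) = cos(embed(A(p, S)), embed(A(p, T))), as defined in Section 1.
    The full-toolset reference embedding is computed once and reused."""
    ref = embed_fn(agent_fn(prompt, full_tools))
    return lambda S: cosine(embed_fn(agent_fn(prompt, S)), ref)
```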

4. Experimental Evaluation on API-Bank

Empirical validation was conducted on the API-Bank benchmark with $n = 8$ APIs (Calculator, QueryStock, Wiki, AddAlarm, AddReminder, PlayMusic, BookHotel, and Translate) and prompts drawn from mathematical, financial, and general-knowledge domains.

  • LLM Agent: GPT-4o-mini.
  • Similarity Metric: text-embedding-3-large cosine similarity.
  • Sampling configuration: $\rho = 0.5$; three Monte Carlo trials for reproducibility.
  • Metrics:
    • Top-1 accuracy—alignment of the highest $\phi_i$ with the ground-truth tool.
    • Cosine similarity among $\phi$ vectors across runs (stability).
    • Quality drop—semantic loss when the highest-$\phi$ tool is removed (faithfulness).
    • SHAP gap—difference in mean $\phi$ between relevant and irrelevant tools.
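The faithfulness and SHAP-gap probes can be sketched as follows (hypothetical helper functions, using a coalition value function `v` as defined in Section 1):

```python
def quality_drop(v, phi):
    """Semantic loss when the highest-phi tool is removed from the full set."""
    n = len(phi)
    full = frozenset(range(n))
    top = max(range(n), key=lambda i: phi[i])
    return v(full) - v(full - {top})

def shap_gap(phi, relevant):
    """Mean phi over relevant tools minus mean phi over irrelevant tools."""
    rel = [phi[i] for i in relevant]
    irr = [phi[i] for i in range(len(phi)) if i not in relevant]
    return sum(rel) / len(rel) - sum(irr) / len(irr)
```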

Summary of results includes:

| Metric | Outcome |
| --- | --- |
| Consistency | Mean cosine similarity 0.945; Top-1 = 100% (9/9) |
| Faithfulness | Quality drop 0.67 (high-$\phi$) vs 0.05 (low-$\phi$) |
| Irrelevant injection | Mean $\phi = 0.53$ (relevant) vs 0.07 (irrelevant); Top-1 = 86% |
| Cross-domain accuracy | Correct highest-impact tool selected 86% of the time |

The findings indicate that AgentSHAP yields stable and faithful tool-importance attributions: removing tools with high Shapley scores causes significant semantic degradation, while irrelevant distractor tools receive low $\phi$ even when injected as noise.

5. Hyperparameters and Architectural Choices

Key hyperparameters in AgentSHAP include:

  • Sampling ratio $\rho$: Controls the Monte Carlo budget. Lower $\rho$ reduces evaluation cost at the expense of increased variance.
  • Value function $\mathrm{sim}(\cdot,\cdot)$: Semantic similarity metric, with text-embedding-3-large as the recommended baseline. Lexical metrics (BLEU, ROUGE) are permissible but less robust.
  • Monte Carlo repetitions: Multiple random seeds improve stability assessments.
  • Leave-one-out inclusion: Always executed, guaranteeing direct marginal assessments for every tool.
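These knobs can be bundled into a single configuration object; the sketch below is a hypothetical arrangement, with defaults taken from the evaluation setup above:

```python
from dataclasses import dataclass

@dataclass
class AgentSHAPConfig:
    """Hypothetical configuration bundle for the hyperparameters listed above."""
    rho: float = 0.5                                 # Monte Carlo sampling ratio
    embedding_model: str = "text-embedding-3-large"  # value-function backbone
    num_trials: int = 3                              # MC repetitions for stability
    leave_one_out: bool = True                       # Phase A always runs
```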

The framework's flexibility allows adaptation to different agent architectures, tool inventories, and prompt types.

6. Scope, Limitations, and Relationship to Other Frameworks

Limitations

  • AgentSHAP attributes only first-order (individual) tool importance and does not dissect higher-order (synergistic) tool interactions.
  • The analysis is restricted to single-prompt, single-turn settings; dynamic importance across turns in multi-step dialogues is not addressed.
  • Attribution is with respect to tool set inclusion, not the temporal sequence or call graph structure.
  • Worst-case complexity remains exponential; real-time deployment at large $n$ or under tight compute budgets is challenging.

Extensions and Future Directions

  • Potential enhancements include Shapley interaction indices for higher-order effects, streaming/online Shapley for conversational agents, and causal-chain tracing for sequential or graph-based tool usage.
  • AgentSHAP scores can drive automated pruning or toolset optimization pipelines in production contexts.

Relationship to TokenSHAP and PixelSHAP

  • AgentSHAP is built on the same Monte Carlo Shapley formalism as TokenSHAP (token-level attribution) and PixelSHAP (vision-language region attribution).
  • All three frameworks are distinguished by black-box, model-agnostic operation and adherence to Shapley game-theoretic principles, with polynomial sampling tradeoffs supplanting exponential enumeration.

7. Comparative Summary

AgentSHAP represents the first dedicated approach to explaining LLM agent tool selection using formal, fair, and faithful attribution. Its principled reliance on Shapley values, efficient sampling, and empirical validation distinguish it within the explainable AI (XAI) landscape. The framework’s consistency across domains and robustness against irrelevant tool injection underscore its practical utility for transparent, agentic AI systems (Horovicz, 14 Dec 2025).
