AgentSHAP: LLM Tool Attribution
- AgentSHAP is a model-agnostic explainability framework that quantifies the individual contributions of external tools used by LLM agents via Shapley values.
- It employs a two-phase Monte Carlo estimation method—combining leave-one-out and random subset sampling—to efficiently approximate each tool’s marginal effect.
- Empirical evaluations on API-Bank demonstrate stable and faithful attributions, with significant semantic degradation when high-impact tools are removed.
AgentSHAP is a model-agnostic explainability framework for quantifying the importance of external tools used by LLM agents. It attributes the agent’s output to specific tools, providing principled, game-theoretic scores for each tool’s contribution. AgentSHAP extends the Shapley-based interpretability family (TokenSHAP, PixelSHAP) to the tool level, enabling robust debugging, trust calibration, and cost optimization in complex agentic workflows (Horovicz, 14 Dec 2025).
1. Formalization of Tool Shapley Values
AgentSHAP defines tool importance through the Shapley value, a unique, fair attribution mechanism from cooperative game theory. For a given prompt $x$, agent $A$, and tool set $T = \{t_1, \dots, t_n\}$, the framework computes, for each tool $t_i \in T$, a score $\phi_i$ reflecting its marginal contribution to the response.
Let $r(S)$ denote the agent’s output using subset $S \subseteq T$, and let $v(S)$ be the value function, typically the cosine similarity between the restricted-toolset response $r(S)$ and the full-toolset response $r(T)$. The Shapley value for tool $t_i$ is:

$$\phi_i = \sum_{S \subseteq T \setminus \{t_i\}} \frac{|S|!\,\left(|T|-|S|-1\right)!}{|T|!}\left[v\left(S \cup \{t_i\}\right) - v(S)\right]$$

This aggregates $t_i$’s marginal effect across all subsets, satisfying the efficiency, symmetry, null-player, and additivity Shapley axioms. AgentSHAP does not assume access to the agent’s parameters or gradients, and all attributions are black-box and model-agnostic.
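For concreteness, a minimal sketch of this exact computation is given below, assuming only a generic `value_fn` that implements $v(S)$ (e.g., by running the agent with toolset $S$ and comparing its response to the full-toolset response); the function name and signature are illustrative, not taken from the reference implementation.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List

def exact_tool_shapley(
    tools: List[str],
    value_fn: Callable[[FrozenSet[str]], float],  # v(S), e.g. similarity of r(S) to r(T)
) -> Dict[str, float]:
    """Exact Shapley values over all 2^n tool coalitions (feasible only for small n)."""
    n = len(tools)
    phi = {t: 0.0 for t in tools}
    for t in tools:
        others = [u for u in tools if u != t]
        for k in range(len(others) + 1):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                S = frozenset(subset)
                phi[t] += weight * (value_fn(S | {t}) - value_fn(S))
    return phi
```

Because the loop enumerates all $2^n$ coalitions, this is practical only for a handful of tools; the Monte Carlo scheme described next targets realistic toolset sizes.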
2. Monte Carlo Estimation Algorithm
Direct computation of Shapley values is intractable for nontrivial $n = |T|$, since it requires evaluating all $2^n$ tool subsets. AgentSHAP employs a two-phase Monte Carlo scheme, dramatically reducing sample complexity:
- Phase A (Leave-One-Out): For each tool, compute the marginal effect when it is individually omitted. This guarantees all first-order effects are represented explicitly.
- Phase B (Random Subsets): For a budget of random subsets set by the sampling ratio, sample a coalition $S \subseteq T$, evaluate $v(S)$, and update each candidate tool’s contribution according to the sampled ordering.
Formally, the number of model calls for AgentSHAP is $n$ for the leave-one-out phase plus the Monte Carlo budget set by the sampling ratio. With the $n = 8$ toolset used in the experiments, practical computation can be achieved in 130 calls versus $2^8 = 256$ for exact enumeration.
The framework combines estimates from both phases, weighted by the canonical Shapley coefficients, yielding accurate tool attributions under tight compute constraints.
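A sketch of this two-phase estimator is given below, under stated assumptions: Phase B is implemented with standard permutation sampling, `num_permutations` is an illustrative budget, and the two phases are merged by a simple average rather than the paper’s canonical Shapley weighting; the function name `agentshap_estimate` is hypothetical.

```python
import random
from typing import Callable, Dict, FrozenSet, List

def agentshap_estimate(
    tools: List[str],
    value_fn: Callable[[FrozenSet[str]], float],  # v(S), e.g. similarity of r(S) to r(T)
    num_permutations: int = 50,                   # Monte Carlo budget (illustrative default)
    seed: int = 0,
) -> Dict[str, float]:
    """Two-phase Shapley estimate: leave-one-out marginals plus permutation sampling."""
    rng = random.Random(seed)
    full = frozenset(tools)
    v_full = value_fn(full)

    # Phase A: leave-one-out marginal effect of every tool (all first-order effects covered).
    loo = {t: v_full - value_fn(full - {t}) for t in tools}

    # Phase B: marginal contributions along randomly sampled tool orderings.
    sums = {t: 0.0 for t in tools}
    for _ in range(num_permutations):
        order = list(tools)
        rng.shuffle(order)
        S: FrozenSet[str] = frozenset()
        v_prev = value_fn(S)
        for t in order:
            S = S | {t}
            v_curr = value_fn(S)
            sums[t] += v_curr - v_prev
            v_prev = v_curr
    mc = {t: sums[t] / num_permutations for t in tools}

    # Combine phases (simple average here; the paper weights by the canonical Shapley coefficients).
    return {t: 0.5 * (loo[t] + mc[t]) for t in tools}
```

Memoizing `value_fn` over previously seen coalitions (they are hashable frozensets) keeps the number of distinct agent calls close to the budgets quoted above.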
3. Model-Agnostic, Tool-Agnostic Operation
AgentSHAP operates solely with input–output access to the LLM agent, requiring neither gradient information nor model internals. It accommodates proprietary, API-based LLMs (e.g., GPT, Claude) and open-source models equally. Arbitrary external tool types (APIs, calculators, retrieval, code execution) can be incorporated, and the approach remains agnostic to the order or nature of tool invocation.
Value function computations are typically based on the cosine similarity of embedding representations (e.g., “text-embedding-3-large”), though alternative metrics (BLEU, ROUGE) are possible, albeit with reduced semantic sensitivity.
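A minimal sketch of such a value function, assuming the OpenAI embeddings endpoint and text-embedding-3-large, is shown below; in the estimators sketched earlier it would be composed with the agent call, i.e., $v(S) = \mathrm{sim}\bigl(r(S), r(T)\bigr)$.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str, model: str = "text-embedding-3-large") -> np.ndarray:
    """Embed a response string with the chosen embedding model."""
    resp = client.embeddings.create(model=model, input=text)
    return np.asarray(resp.data[0].embedding)

def response_similarity(restricted_response: str, full_response: str) -> float:
    """Cosine similarity between the restricted-toolset and full-toolset responses."""
    a, b = embed(restricted_response), embed(full_response)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```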
4. Experimental Evaluation on API-Bank
Empirical validation was conducted on the API-Bank benchmark, with APIs spanning Calculator, QueryStock, Wiki, AddAlarm, AddReminder, PlayMusic, BookHotel, and Translate, and prompts drawn from mathematical, financial, and general-knowledge domains.
- LLM Agent: GPT-4o-mini.
- Similarity Metric: text-embedding-3-large cosine similarity.
- Sampling configuration: the two-phase scheme of Section 2, with three Monte Carlo trials for reproducibility.
- Metrics (illustrated in the code sketch after this list):
  - Top-1 accuracy: whether the tool with the highest $\phi_i$ matches the ground-truth tool.
  - Stability: cosine similarity among the $\phi$ vectors obtained across repeated runs.
  - Quality drop (faithfulness): semantic loss when removing the highest- versus lowest-$\phi$ tool.
  - SHAP gap: difference in mean $\phi$ between relevant and irrelevant tools.
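The stability and SHAP-gap metrics reduce to simple vector computations over the estimated $\phi$ values; the sketch below restates those definitions directly (the data structures `phi_runs` and `relevant` are assumed for illustration, not taken from the paper’s evaluation code).

```python
from itertools import combinations
from typing import Dict, List, Set

import numpy as np

def stability(phi_runs: List[Dict[str, float]]) -> float:
    """Mean pairwise cosine similarity of the Shapley vectors across repeated runs."""
    tools = sorted(phi_runs[0])
    vecs = [np.array([run[t] for t in tools]) for run in phi_runs]
    sims = [
        float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
        for a, b in combinations(vecs, 2)
    ]
    return float(np.mean(sims))

def shap_gap(phi: Dict[str, float], relevant: Set[str]) -> float:
    """Difference in mean phi between relevant and irrelevant tools."""
    rel = [v for t, v in phi.items() if t in relevant]
    irr = [v for t, v in phi.items() if t not in relevant]
    return float(np.mean(rel) - np.mean(irr))
```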
Summary of results includes:
| Metric | Outcome |
|---|---|
| Consistency | Mean cosine similarity 0.945; Top-1 = 100% (9/9) |
| Faithfulness | Quality drop 0.67 (high-$\phi$) vs 0.05 (low-$\phi$) |
| Irrelevant Injection | Mean $\phi$ of relevant tools well above the $0.07$ of irrelevant tools; Top-1 = 86% |
| Cross-Domain Accuracy | Correct highest-$\phi$ tool selected 86% of the time |
The findings indicate that AgentSHAP yields stable and faithful tool-importance attributions: removal of tools with high Shapley scores results in significant semantic degradation, while irrelevant/distractor tools receive low $\phi$, even amid injected noise tools.
5. Hyperparameters and Architectural Choices
Key hyperparameters in AgentSHAP include:
- Sampling ratio: controls the Monte Carlo budget; a lower ratio reduces evaluation cost at the expense of increased variance.
- Value function $v$: the semantic similarity metric, with text-embedding-3-large as the recommended baseline. Lexical metrics are permissible but less robust.
- Monte Carlo repetitions: Multiple random seeds improve stability assessments.
- Leave-one-out inclusion: Always executed, guaranteeing direct marginal assessments for every tool.
The framework's flexibility allows adaptation to different agent architectures, tool inventories, and prompt types.
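As an illustration, these choices can be bundled into a single configuration object; the field names and defaults below are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class AgentSHAPConfig:
    """Illustrative hyperparameter bundle for the estimator sketched in Section 2."""
    sampling_ratio: float = 0.5                      # fraction of coalitions sampled (illustrative, not the paper's value)
    embedding_model: str = "text-embedding-3-large"  # backbone for the value function v
    num_trials: int = 3                              # Monte Carlo repetitions for stability estimates
    include_leave_one_out: bool = True               # Phase A is always executed
    random_seed: int = 0                             # for reproducible coalition sampling
```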
6. Scope, Limitations, and Relationship to Other Frameworks
Limitations
- AgentSHAP attributes only first-order (individual) tool importance and does not dissect higher-order (synergistic) tool interactions.
- The analysis is restricted to single-prompt, single-turn settings; dynamic importance across turns in multi-step dialogues is not addressed.
- Attribution is with respect to tool set inclusion, not the temporal sequence or call graph structure.
- Complexity remains exponential in the worst case; real-time deployment with high tool counts $n$ or low hardware budgets is challenging.
Extensions and Future Directions
- Potential enhancements include Shapley interaction indices for higher-order effects, streaming/online Shapley for conversational agents, and causal-chain tracing for sequential or graph-based tool usage.
- AgentSHAP scores can drive automated pruning or toolset optimization pipelines in production contexts.
Relationship to TokenSHAP and PixelSHAP
- AgentSHAP is built on the same Monte Carlo Shapley formalism as TokenSHAP (token-level attribution) and PixelSHAP (vision-language region attribution).
- All three frameworks are distinguished by black-box, model-agnostic operation and adherence to Shapley game-theoretic principles, with polynomial sampling tradeoffs supplanting exponential enumeration.
7. Comparative Summary
AgentSHAP represents the first dedicated approach to explaining LLM agent tool selection using formal, fair, and faithful attribution. Its principled reliance on Shapley values, efficient sampling, and empirical validation distinguish it within the explainable AI (XAI) landscape. The framework’s consistency across domains and robustness against irrelevant tool injection underscore its practical utility for transparent, agentic AI systems (Horovicz, 14 Dec 2025).