
AgentSHAP: LLM Tool Attribution

Updated 18 December 2025
  • AgentSHAP is a model-agnostic explainability framework that quantifies the individual contributions of external tools used by LLM agents via Shapley values.
  • It employs a two-phase Monte Carlo estimation method—combining leave-one-out and random subset sampling—to efficiently approximate each tool’s marginal effect.
  • Empirical evaluations on API-Bank demonstrate stable and faithful attributions, with significant semantic degradation when high-impact tools are removed.

AgentSHAP is a model-agnostic explainability framework for quantifying the importance of external tools used by LLM agents. It attributes the agent’s output to specific tools, providing principled, game-theoretic scores for each tool’s contribution. AgentSHAP extends the Shapley-based interpretability family (TokenSHAP, PixelSHAP) to the tool level, enabling robust debugging, trust calibration, and cost optimization in complex agentic workflows (Horovicz, 14 Dec 2025).

1. Formalization of Tool Shapley Values

AgentSHAP defines tool importance through the Shapley value, a unique, fair attribution mechanism from cooperative game theory. For a given prompt $p$, agent $\mathcal{A}$, and tool set $T = \{t_1, \dots, t_n\}$, the framework computes, for each tool $t_i$, a score $\phi_i$ reflecting its marginal contribution to the response.

Let $\mathcal{A}(p,S)$ denote the agent's output using the subset $S \subseteq T$, and let $v(S) = \mathrm{sim}(\mathcal{A}(p,S), \mathcal{A}(p,T))$ be the value function, typically the cosine similarity between the restricted-toolset response and the full-toolset response. The Shapley value for $t_i$ is

$$\phi_i = \sum_{S \subseteq T \setminus \{t_i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \bigl[ v(S \cup \{t_i\}) - v(S) \bigr].$$

This aggregates $t_i$'s marginal effect across all subsets and satisfies the efficiency, symmetry, null-player, and additivity Shapley axioms. AgentSHAP does not assume access to the agent's parameters or gradients; all attributions are black-box and model-agnostic.
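For small $n$, the definition above can be evaluated exactly by enumerating all coalitions. The following minimal sketch does so, with a toy value function standing in for the agent-based $v$:

```python
from itertools import combinations
from math import factorial

def exact_shapley(n, v):
    """Exact Shapley values phi_i for n tools under value function v,
    where v maps a frozenset of tool indices to a score."""
    phi = [0.0] * n
    for i in range(n):
        others = [t for t in range(n) if t != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # Canonical Shapley weight |S|!(n-|S|-1)!/n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (v(frozenset(S) | {i}) - v(frozenset(S)))
    return phi

# Toy value function: the response is "correct" only when tool 0 is available.
v = lambda S: 1.0 if 0 in S else 0.0
print(exact_shapley(3, v))  # tool 0 receives essentially all credit, ≈ [1, 0, 0]
```

The weights for one player sum to 1, so a tool that is always pivotal receives the full value, consistent with the efficiency axiom.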

2. Monte Carlo Estimation Algorithm

Direct computation of Shapley values is intractable for nontrivial $n$, since it requires $2^n$ subset evaluations. AgentSHAP employs a two-phase Monte Carlo scheme that dramatically reduces sample complexity:

  • Phase A (Leave-One-Out): For each tool, compute the marginal effect when it is individually omitted. This guarantees all first-order effects are represented explicitly.
  • Phase B (Random Subsets): For $m = \lfloor \rho(2^n - n - 1) \rfloor$ random subsets (typically $\rho = 0.5$), sample a coalition $S_j \subseteq T$, evaluate $\mathcal{A}(p, S_j)$, and update each candidate tool's contribution according to the sampled ordering.

Formally, AgentSHAP requires $n$ model calls for the leave-one-out phase plus $O(\rho\,2^n)$ for the Monte Carlo phase. With $n = 8$ and $\rho = 0.5$, practical computation takes roughly 130 calls versus 256 for exact enumeration.
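As a sanity check on the budget arithmetic (a trivial sketch, not code from the paper):

```python
def call_budget(n, rho):
    """Leave-one-out calls (n) plus Phase B samples, m = floor(rho*(2^n - n - 1))."""
    return n + int(rho * (2 ** n - n - 1))

print(call_budget(8, 0.5))  # 131 calls, versus 2**8 = 256 for exact enumeration
```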

The framework combines estimates from both phases, weighted by the canonical Shapley coefficients, yielding accurate tool attributions under tight compute constraints.
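A hedged sketch of such a two-phase estimator follows; the paper's exact weighting scheme may differ, and `v` stands in for an embedding-similarity value function wrapping real agent calls. Coalition values are cached so repeated subsets incur no extra model calls:

```python
import random

def two_phase_shapley(n, v, rho=0.5, seed=0):
    """Approximate tool Shapley values: leave-one-out marginals (Phase A)
    plus random-coalition marginals (Phase B). Illustrative estimator only."""
    rng = random.Random(seed)
    full = frozenset(range(n))
    cache = {}

    def val(S):
        S = frozenset(S)
        if S not in cache:  # each distinct coalition costs one agent call
            cache[S] = v(S)
        return cache[S]

    # Phase A: first-order effect of dropping each tool from the full set.
    samples = {i: [val(full) - val(full - {i})] for i in range(n)}

    # Phase B: m random coalitions, each yielding a marginal for every tool.
    m = int(rho * (2 ** n - n - 1))
    for _ in range(m):
        S = frozenset(t for t in range(n) if rng.random() < 0.5)
        for i in range(n):
            samples[i].append(val(S | {i}) - val(S - {i}))

    return [sum(s) / len(s) for s in samples.values()]
```

Caching matters in practice because each uncached coalition evaluation is a full agent run plus an embedding call.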

3. Model-Agnostic, Tool-Agnostic Operation

AgentSHAP operates solely with input–output access to the LLM agent, requiring neither gradient information nor model internals. It accommodates proprietary, API-based LLMs (e.g., GPT, Claude) and open-source models equally. Arbitrary external tool types (APIs, calculators, retrieval, code execution) can be incorporated, and the approach remains agnostic to the order or nature of tool invocation.

Value function computations are typically based on the cosine similarity of embedding representations (e.g., “text-embedding-3-large”), though alternative metrics (BLEU, ROUGE) are possible, albeit with reduced semantic sensitivity.
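A minimal sketch of such a value function; `embed_fn` and `agent_fn` are hypothetical stand-ins for an embedding model (e.g., text-embedding-3-large) and the LLM agent:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def make_value_fn(embed_fn, agent_fn, prompt, full_tools):
    """v(S) = cos(embed(A(p, S)), embed(A(p, T))), as defined in Section 1.
    The full-toolset reference embedding is computed once and reused."""
    ref = embed_fn(agent_fn(prompt, full_tools))
    return lambda S: cosine(embed_fn(agent_fn(prompt, S)), ref)
```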

4. Experimental Evaluation on API-Bank

Empirical validation was conducted on the API-Bank benchmark with $n = 8$ APIs (Calculator, QueryStock, Wiki, AddAlarm, AddReminder, PlayMusic, BookHotel, and Translate) and prompts drawn from mathematical, financial, and general-knowledge domains.

  • LLM Agent: GPT-4o-mini.
  • Similarity Metric: text-embedding-3-large cosine similarity.
  • Sampling configuration: $\rho = 0.5$; three Monte Carlo trials for reproducibility.
  • Metrics:
    • Top-1 accuracy—alignment of the highest $\phi_i$ with the ground-truth tool.
    • Cosine similarity among $\phi$ vectors across runs (stability).
    • Quality drop—semantic loss when the highest-$\phi$ tool is removed (faithfulness).
    • SHAP gap—difference in mean $\phi$ between relevant and irrelevant tools.
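The faithfulness and SHAP-gap probes can be sketched as follows (hypothetical helper functions, using a coalition value function `v` as defined in Section 1):

```python
def quality_drop(v, phi):
    """Semantic loss when the highest-phi tool is removed from the full set."""
    n = len(phi)
    full = frozenset(range(n))
    top = max(range(n), key=lambda i: phi[i])
    return v(full) - v(full - {top})

def shap_gap(phi, relevant):
    """Mean phi over relevant tools minus mean phi over irrelevant tools."""
    rel = [phi[i] for i in relevant]
    irr = [phi[i] for i in range(len(phi)) if i not in relevant]
    return sum(rel) / len(rel) - sum(irr) / len(irr)
```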

Summary of results includes:

| Metric | Outcome |
| --- | --- |
| Consistency | Mean cosine similarity 0.945; Top-1 = 100% (9/9) |
| Faithfulness | Quality drop 0.67 (high-$\phi$) vs 0.05 (low-$\phi$) |
| Irrelevant injection | Mean $\phi = 0.53$ (relevant) vs 0.07 (irrelevant); Top-1 = 86% |
| Cross-domain accuracy | Correct highest-impact tool selected 86% of the time |

The findings indicate that AgentSHAP yields stable and faithful tool-importance attributions: removing tools with high Shapley scores causes significant semantic degradation, while irrelevant distractor tools receive low $\phi$ even when injected as noise.

5. Hyperparameters and Architectural Choices

Key hyperparameters in AgentSHAP include:

  • Sampling ratio $\rho$: Controls the Monte Carlo budget. Lower $\rho$ reduces evaluation cost at the expense of increased variance.
  • Value function $\mathrm{sim}(\cdot,\cdot)$: Semantic similarity metric, with text-embedding-3-large as the recommended baseline. Lexical metrics (BLEU, ROUGE) are permissible but less robust.
  • Monte Carlo repetitions: Multiple random seeds improve stability assessments.
  • Leave-one-out inclusion: Always executed, guaranteeing direct marginal assessments for every tool.
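These knobs can be bundled into a single configuration object; the sketch below is a hypothetical arrangement, with defaults taken from the evaluation setup above:

```python
from dataclasses import dataclass

@dataclass
class AgentSHAPConfig:
    """Hypothetical configuration bundle for the hyperparameters listed above."""
    rho: float = 0.5                                 # Monte Carlo sampling ratio
    embedding_model: str = "text-embedding-3-large"  # value-function backbone
    num_trials: int = 3                              # MC repetitions for stability
    leave_one_out: bool = True                       # Phase A always runs
```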

The framework's flexibility allows adaptation to different agent architectures, tool inventories, and prompt types.

6. Scope, Limitations, and Relationship to Other Frameworks

Limitations

  • AgentSHAP attributes only first-order (individual) tool importance and does not dissect higher-order (synergistic) tool interactions.
  • The analysis is restricted to single-prompt, single-turn settings; dynamic importance across turns in multi-step dialogues is not addressed.
  • Attribution is with respect to tool set inclusion, not the temporal sequence or call graph structure.
  • Worst-case complexity remains exponential; real-time deployment at large $n$ or under tight compute budgets is challenging.

Extensions and Future Directions

  • Potential enhancements include Shapley interaction indices for higher-order effects, streaming/online Shapley for conversational agents, and causal-chain tracing for sequential or graph-based tool usage.
  • AgentSHAP scores can drive automated pruning or toolset optimization pipelines in production contexts.

Relationship to TokenSHAP and PixelSHAP

  • AgentSHAP is built on the same Monte Carlo Shapley formalism as TokenSHAP (token-level attribution) and PixelSHAP (vision-language region attribution).
  • All three frameworks are distinguished by black-box, model-agnostic operation and adherence to Shapley game-theoretic principles, with polynomial sampling tradeoffs supplanting exponential enumeration.

7. Comparative Summary

AgentSHAP represents the first dedicated approach to explaining LLM agent tool selection using formal, fair, and faithful attribution. Its principled reliance on Shapley values, efficient sampling, and empirical validation distinguish it within the explainable AI (XAI) landscape. The framework’s consistency across domains and robustness against irrelevant tool injection underscore its practical utility for transparent, agentic AI systems (Horovicz, 14 Dec 2025).
