SHAPLLM: Shapley Methods for LLMs
- SHAPLLM is a framework that integrates Shapley value-based strategies with LLMs to enable non-uniform pruning and generate interpretable predictions.
- It employs techniques like SV-NUP for adaptive layer pruning and hybrid pipelines that combine token-level attributions with LLM-generated rationales.
- Leveraging cooperative game theory and efficient approximation methods, SHAPLLM optimizes model performance while enabling human-in-the-loop decision support.
SHAPLLM denotes a collection of recent methodologies and frameworks at the intersection of Shapley value-based explainability and LLMs. This term encompasses Shapley Value-based Non-Uniform Pruning (SV-NUP) for LLM compression (Sun et al., 3 May 2025), hybrid SHAP-LLM explainability architectures for human-in-the-loop moderation in ML decision support (Ajayi et al., 25 Nov 2025), and principled extensions of Shapley feature attribution for stochastic LLM inference (Naudot et al., 3 Nov 2025). These approaches share the mathematical foundation of the Shapley value from cooperative game theory and are designed to provide either structural model optimization (via pruning and importance quantification) or transparent, mathematically grounded explanations for LLM predictions. Methods under the SHAPLLM umbrella demonstrate significant performance and interpretability improvements by leveraging theoretical guarantees and scalable approximation schemes.
1. Shapley Value Foundations and Adaptation to LLMs
The classical Shapley value assigns importance scores to features or components (e.g., transformer layers, input tokens) based on their marginal contributions to a cooperative “game”, which in this context is the model’s prediction or performance functional. For a model with $n$ elements $N = \{1, \dots, n\}$ and a value function $v : 2^N \to \mathbb{R}$, the Shapley value for element $i$ is:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr]$$

This allocation uniquely satisfies the axioms of efficiency ($\sum_{i \in N} \phi_i = v(N) - v(\emptyset)$), symmetry, dummy, and additivity.
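As a concrete reference point, the following minimal Python sketch computes exact Shapley values for a small player set. The value function `v` and the toy game are illustrative placeholders, not any paper's implementation.

```python
# Minimal sketch: exact Shapley values for a small set of "players"
# (layers or tokens), given a caller-supplied value function v(S).
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values; requires evaluating v on every coalition."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                S = frozenset(subset)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[p] += weight * (v(S | {p}) - v(S))
    return phi

# Toy value function: v(S) = |S|^2 (illustrative only). By symmetry each player
# receives 3.0, and the values sum to v(N) - v(empty set) = 9 (efficiency).
print(shapley_values([0, 1, 2], lambda S: len(S) ** 2))
```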
For transformer-based LLMs, layers can be treated as “players” ($N = \{1, \dots, L\}$). The value function for pruning is typically defined in terms of perplexity, e.g. $v(S) = -\mathrm{PPL}(S)$, with $\mathrm{PPL}(S)$ the perplexity when only the layers in $S$ are active (Sun et al., 3 May 2025). For explainability, input tokens act as players, and the output function $v(S)$ denotes the model’s prediction when only the tokens in $S$ are “present” (Ajayi et al., 25 Nov 2025, Naudot et al., 3 Nov 2025).
Adapting the Shapley value to LLMs necessitates consideration of stochastic model outputs due to sampling-based decoding. Monte Carlo approximations are employed: for each coalition $S$, the empirical payoff $\hat{v}(S)$ is estimated by averaging model outputs, yielding the expected Shapley value in the limit (Naudot et al., 3 Nov 2025). Under deterministic inference (e.g., temperature $T = 0$), the classic Shapley properties hold exactly.
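A sketch of this Monte Carlo adaptation follows; `score_with_tokens` is a hypothetical hook that runs the LLM with only the tokens in $S$ present and returns a scalar (e.g., a class probability or task score).

```python
# Under sampling-based decoding the payoff of a coalition S is itself random,
# so it is replaced by an empirical mean over repeated model calls.
import random

def monte_carlo_payoff(S, score_with_tokens, num_samples=10, seed=None):
    rng = random.Random(seed)
    draws = [score_with_tokens(S, seed=rng.randrange(2**31)) for _ in range(num_samples)]
    return sum(draws) / num_samples  # empirical estimate of v(S); converges to E[v(S)]
```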
2. Efficient Shapley Value-based Non-Uniform Pruning (SV-NUP)
SV-NUP targets pruning of large transformer models by using Shapley values to assign non-uniform sparsity ratios to each layer, in contrast to conventional uniform pruning (Sun et al., 3 May 2025). The SV-NUP pipeline comprises:
- Quantifying Layer Contributions: For transformer layer $\ell$, the exact Shapley value $\phi_\ell$ can only be computed via $2^{L}$ masked inferences over the $L$ layers, which is infeasible for large $L$.
- Sliding Window-based Approximation (SWSV): Restricts the calculation of each $\phi_\ell$ to a window of $w$ consecutive layers around $\ell$, reducing complexity from $O(2^{L})$ to $O(L \cdot 2^{w})$. The approximate value is:

$$\hat{\phi}_\ell = \sum_{S \subseteq W_\ell \setminus \{\ell\}} \frac{|S|!\,(|W_\ell| - |S| - 1)!}{|W_\ell|!}\,\bigl[v(S \cup \{\ell\}) - v(S)\bigr]$$

where $W_\ell$ is the window of layers around $\ell$ and $|W_\ell| = w$.
- Pruning Budget Assignment: Each layer’s pruning ratio $p_\ell$ is set inversely proportional to its estimated Shapley value $\hat{\phi}_\ell$, so that more important layers retain more parameters, then clipped and rescaled so the overall target sparsity is met without extreme per-layer allocations. A code sketch of both steps appears after this list.
The actual pruning is then performed with a one-shot method (e.g. SparseGPT) using these budgets.
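The following sketch illustrates the two steps above. `neg_ppl_with_layers` is a hypothetical hook returning negative perplexity when only the given layers are active; the clipping bounds and rescaling rule are simplified assumptions, intended only to show windowed estimation and inverse-importance budgeting rather than SV-NUP's exact procedure.

```python
from itertools import combinations
from math import factorial

def windowed_shapley(num_layers, neg_ppl_with_layers, window=7):
    """Sliding-window Shapley estimates of per-layer importance."""
    half = window // 2
    phi = []
    for layer in range(num_layers):
        W = list(range(max(0, layer - half), min(num_layers, layer + half + 1)))
        others = [l for l in W if l != layer]
        m, val = len(W), 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                S = set(subset)
                weight = factorial(len(S)) * factorial(m - len(S) - 1) / factorial(m)
                val += weight * (neg_ppl_with_layers(S | {layer}) - neg_ppl_with_layers(S))
        phi.append(val)
    return phi

def pruning_ratios(phi, target=0.7, min_ratio=0.5, max_ratio=0.9):
    """Assign per-layer sparsity inversely proportional to estimated importance."""
    inv = [1.0 / max(p, 1e-8) for p in phi]            # more important -> smaller ratio
    raw = [target * len(inv) * v / sum(inv) for v in inv]
    clipped = [min(max(r, min_ratio), max_ratio) for r in raw]
    scale = target * len(clipped) / sum(clipped)        # renormalize toward the target mean
    return [min(max(r * scale, min_ratio), max_ratio) for r in clipped]
```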
Empirical Results: On LLaMA-7B and LLaMA-13B at 70% sparsity, SV-NUP achieves 18.01% and 19.55% relative perplexity (PPL) reductions, respectively, compared to uniform SparseGPT. Zero-shot task accuracy is improved or maintained. Sliding-window sizes around 7 achieve the best trade-off between approximation quality and compute (Sun et al., 3 May 2025).
| Model (sparsity) | Uniform SparseGPT PPL | SV-NUP PPL | Relative PPL change |
|---|---|---|---|
| LLaMA-7B (70%) | 18.42 | 15.10 | −18.01% |
| LLaMA-13B (70%) | 13.74 | 11.06 | −19.55% |
3. SHAP-LLM Hybrid Explainability for Human-in-the-Loop Moderation
The SHAP-LLM framework couples local SHAP (kernel-based) feature attributions with LLM-generated free-form rationales to provide interpretable, actionable model predictions for tasks such as mental health and cyberbullying classification (Ajayi et al., 25 Nov 2025). The pipeline consists of:
- A text classifier (e.g., a fine-tuned MentalBERT transformer).
- A SHAP explainer computing per-token attributions $\phi_j$ for input $x$.
- An LLM acting as a narrative rationale generator, conditioned on extracted SHAP attributions and classifier output.
Algorithmic Overview:
Given a post $x$:
- Predict class probabilities $p(y \mid x)$ and the predicted class $\hat{y} = \arg\max_y p(y \mid x)$.
- Use SHAP to compute (token, attribution) pairs $(t_j, \phi_j)$.
- Select the top-$k$ attributions and assemble a prompt containing the tokens and their $\phi_j$ values.
- Pass the prompt to an LLM (e.g., GPT-OSS-20B) to obtain a narrative rationale $r$.
- Render results in a dashboard: highlight the top-$k$ tokens, display the predicted label, confidence, and a disclaimer, and present the LLM-generated rationale. A code sketch of this loop follows the list.
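A condensed sketch of the loop above, assuming hypothetical `classifier`, `shap_explainer`, and `llm_generate` interfaces; these stand in for the paper's fine-tuned classifier, SHAP explainer, and LLM client and are not its actual API.

```python
# Classify, attribute, then ask an LLM for a narrative rationale grounded in
# the top-k SHAP attributions.
def explain_post(post, classifier, shap_explainer, llm_generate, k=5):
    probs = classifier.predict_proba([post])[0]          # class probabilities
    label = int(probs.argmax())                          # predicted class
    tokens, attributions = shap_explainer(post)          # per-token SHAP values
    top = sorted(zip(tokens, attributions), key=lambda t: abs(t[1]), reverse=True)[:k]

    prompt = (
        f"A classifier labeled the post below as class {label} "
        f"(confidence {probs[label]:.2f}).\n"
        "Most influential tokens (SHAP): "
        + ", ".join(f"{tok} ({val:+.3f})" for tok, val in top)
        + f"\nPost: {post}\n"
        "Explain, in plain language, why the model may have made this decision."
    )
    rationale = llm_generate(prompt)                      # free-form narrative rationale
    return {"label": label, "confidence": float(probs[label]),
            "top_tokens": top, "rationale": rationale}
```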
Mathematical Details:
Token attributions $\phi_j$ are normalized to weights $w_j$ (e.g., $w_j = |\phi_j| / \sum_k |\phi_k|$). An optional embedding aggregation forms a weighted representation $z = \sum_j w_j e_j$ of the token embeddings $e_j$.
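A minimal sketch of this normalization and pooling, assuming arrays `phi` (per-token attributions) and `emb` (token embeddings of shape `(num_tokens, d)`); both names are illustrative.

```python
import numpy as np

def normalize_and_aggregate(phi, emb):
    phi = np.asarray(phi, dtype=float)
    weights = np.abs(phi) / (np.abs(phi).sum() + 1e-12)  # w_j = |phi_j| / sum_k |phi_k|
    pooled = weights @ np.asarray(emb)                   # weighted sum of token embeddings
    return weights, pooled
```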
Qualitative Effects:
The interface provides both quantitative (token-level) and qualitative (natural language) explanations, supporting rapid high-risk content triage and enhanced moderator trust (Ajayi et al., 25 Nov 2025). Pseudocode is provided in the original work.
4. Shapley Attribution for Stochastic LLM Decision Support
llmSHAP extends Shapley feature attribution to stochastic LLM inference, investigating the extent to which classical axiomatic properties (efficiency, symmetry, dummy, additivity) hold under non-deterministic model outputs (Naudot et al., 3 Nov 2025). Key insights include:
- Monte Carlo Adaptation: Replace $v(S)$ in the Shapley formula by the empirical mean payoff $\hat{v}(S)$ over repeated LLM outputs from coalition $S$.
- Implementation Variants:
- Exact: every marginal contribution is recomputed; the number of model calls is exponential in the number of features $n$.
- Cached: coalition outputs are memoized and reused across players, reducing the number of model calls and ensuring that repeated coalition terms cancel exactly.
- Sliding-window: coalitions are restricted to a window of $w$ features around each feature; cost scales with $n \cdot 2^{w}$ rather than $2^{n}$.
- Counterfactual (leave-one-out): one full-coalition call plus one call per removed feature, i.e., $O(n)$ (a code sketch of this variant appears at the end of this section).
- Axiomatic Satisfaction:
| Method | Efficiency | Symmetry | Dummy |
|---|---|---|---|
| Exact | ✗ | ✓ | ✓ |
| Cached | ✓ | ✓ | ✓ |
| Sliding-window | ✗ | ✗ | ✓ |
| Leave-one-out | ✗ | ✓ | ✓ |
Efficiency can fail under stochastic redraws since intermediate terms (coalition outputs) may not cancel. Symmetry and dummy hold conditionally. Monte Carlo stabilization, i.e., averaging each coalition payoff over repeated samples, is recommended for reliability (Naudot et al., 3 Nov 2025).
- Empirical Evaluation:
Sliding-window variants with moderate window sizes balance attribution fidelity ($0.85$–$0.90$ cosine similarity to the gold-standard attributions) against tractability (roughly one fifth the cost of the fully cached Shapley computation).
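As referenced above, here is a hedged sketch of the leave-one-out (counterfactual) variant together with a check of the efficiency gap. `payoff` stands in for a Monte Carlo-stabilized coalition score (e.g., the `monte_carlo_payoff` sketch earlier) and is an assumption, not llmSHAP's API.

```python
def leave_one_out(features, payoff):
    """Counterfactual attribution: drop in score when each feature is removed."""
    full = frozenset(features)
    base = payoff(full)
    return {f: base - payoff(full - {f}) for f in features}  # O(n) coalition calls

def efficiency_gap(scores, payoff, features):
    # Under stochastic redraws the attributions need not sum to v(N) - v(empty set),
    # which is why the efficiency axiom can fail for non-cached variants.
    return sum(scores.values()) - (payoff(frozenset(features)) - payoff(frozenset()))
```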
5. Interpretability and Practical Implications
All SHAPLLM methodologies derive significant interpretability benefits from their Shapley-theoretic grounding:
- Structured Importance Ranking: For SV-NUP, the estimated Shapley values $\hat{\phi}_\ell$ yield a direct layer ranking, supporting targeted layer ablation, budget assignment for pruning/quantization, or structured removal at finer granularity (e.g., attention heads, submodules) (Sun et al., 3 May 2025). A short ranking sketch appears at the end of this section.
- Post-hoc Model Understanding: Token- or layer-level attributions can inform model debugging, dynamic inference policies (e.g., early exit), or user-facing explainability modules (Ajayi et al., 25 Nov 2025).
- Human-in-the-Loop Workflows: Narrative rationales generated conditionally on SHAP attributions ensure that flagged decisions remain actionable and interpretable for moderators and domain experts (Ajayi et al., 25 Nov 2025).
Scalability is achieved via sliding-window or leave-one-out strategies, trading off axiomatic exactness for practical deployment in large-scale LLMs (Sun et al., 3 May 2025, Naudot et al., 3 Nov 2025). All frameworks are compatible with arbitrary pruning or classification criteria, encompassing both activation- and gradient-based techniques.
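The ranking use case noted above can be illustrated with a few lines; `phi` is assumed to come from a windowed estimator such as the earlier sketch, and the keep-ratio is an arbitrary example value, not a recommendation from the cited works.

```python
# Illustrative use of layer-level Shapley estimates for targeted ablation:
# keep the highest-scoring layers and flag the rest for removal or early exit.
def layers_to_ablate(phi, keep_ratio=0.8):
    order = sorted(range(len(phi)), key=lambda l: phi[l], reverse=True)
    keep = set(order[: int(len(phi) * keep_ratio)])
    return [l for l in range(len(phi)) if l not in keep]  # candidates for ablation
```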
6. Future Directions and Research Opportunities
Future extensions proposed in primary SHAPLLM works include:
- Joint Pruning and Quantization: Integration of Shapley-based budget allocation with quantization pipelines for end-to-end compression (Sun et al., 3 May 2025).
- Fine-grained Attribution: Application of Shapley analysis at the head, neuron, or even sub-token level, supporting more precise structured compression and interpretability (Sun et al., 3 May 2025, Naudot et al., 3 Nov 2025).
- Ablation and Early-Exit Studies: Use approximated Shapley values for controlled ablation, dynamic inference, and resource–accuracy trade-offs (Sun et al., 3 May 2025).
- Improved Human-AI Collaboration: Coupling rigorous attribution with LLM-generated rationales, deployed in workflows where interpretability is safety- or efficacy-critical (Ajayi et al., 25 Nov 2025).
- Theoretically Principled Sampling and Approximation: Investigation of more efficient or mathematically justified approximations (e.g., stratified sampling, adaptive windowing) to scale attributions to ever-larger LLMs (Naudot et al., 3 Nov 2025).
SHAPLLM represents a unifying formalism for both pruning and explainability in the LLM domain, balancing the theoretical soundness of the Shapley value with application-driven tractability and transparency.