SHAPLLM Explainability Framework

Updated 2 December 2025
  • SHAPLLM is an interpretability framework that combines SHAP value computation with LLM-generated natural language narratives to produce accessible and numerically accurate explanations.
  • It employs a three-stage pipeline—SHAP computation using XGBoost and the shap library, structured formatting into JSON-like objects, and LLM-based narrative generation—to ensure clarity and fidelity.
  • Evaluations indicate that SHAPLLM improves user satisfaction and readability while maintaining near-perfect fidelity in feature importance ranking for high-stakes model interpretations.

SHAPLLM is an interpretability framework that integrates SHAP (SHapley Additive exPlanations) value computation with LLM-generated natural language narratives. It is designed to increase the usability and clarity of feature attribution explanations produced by complex machine learning models, particularly for end users lacking technical expertise, while maintaining strict numeric fidelity to the underlying SHAP values (Zeng, 2024). The framework leverages the precise, game-theoretic underpinnings of SHAP and the generative fluency of modern LLMs to bridge the gap between technical model explanations and non-technical interpretability needs.

1. SHAP Value Foundations

SHAP values provide an additive, axiomatic mechanism for attributing a model's prediction to individual input features, derived from the Shapley value concept in cooperative game theory. For an input feature set $N$ and a model output $f(x)$, the SHAP value $\phi_i$ assigned to feature $i$ is

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\left[ f_{S\cup\{i\}}\left(x_{S\cup\{i\}}\right) - f_S\left(x_S\right) \right]$$

where $S$ ranges over all subsets of features excluding $i$, and $f_S(x_S)$ is the model's output with only the features in $S$ "present" (others marginalized or set to baseline). The sum of all SHAP values plus the expected model output reconstructs the actual prediction:

$$f(x) = \mathbb{E}[f(x)] + \sum_{i=1}^{|N|} \phi_i$$

Thus, each SHAP value explicitly quantifies the contribution of one feature to the deviation of the instance’s output from baseline (Zeng, 2024).
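
The additivity identity above can be verified directly with the shap library. The following is a minimal sketch (not taken from the paper), assuming an XGBoost regressor fit on synthetic data:

```python
# Minimal sketch: check the SHAP additivity property
# f(x) = E[f(x)] + sum_i phi_i for a tree model.
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=6, random_state=0)
model = xgb.XGBRegressor(n_estimators=100, max_depth=3).fit(X, y)

explainer = shap.TreeExplainer(model)   # exact Shapley values for tree ensembles
phi = explainer.shap_values(X[:1])      # local attributions for one instance

# Baseline (expected model output) plus the attributions reconstructs f(x)
reconstructed = explainer.expected_value + phi.sum()
print(np.isclose(reconstructed, model.predict(X[:1])[0], atol=1e-3))  # True
```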

2. Pipeline Architecture and Workflow

SHAPLLM operationalizes model interpretability via a three-stage pipeline:

  1. SHAP Computation: An XGBoost model is trained and the Python “shap” library is used to compute instance-specific local SHAP values $\{\phi_1, \dots, \phi_n\}$. The model’s expected baseline $\mathbb{E}[f(x)]$ and output $f(x)$ are also recorded.
  2. Structured Formatting: Each feature-contribution tuple is encapsulated as a JSON-like object for LLM consumption, e.g., {"feature": "Age", "value": 24, "shap": -0.15}. All values are rounded to three decimal places to balance accuracy and readability.
  3. LLM-Based Narrative Generation: Structured tuples, together with a baseline/prediction summary block, are incorporated into a prompt. The prompt task is to generate "plain-language explanations" that are numerically faithful to the SHAP attributions. Prompts are engineered with templates requiring the LLM to explicitly reference the feature, its value, the direction of effect, and the SHAP value magnitude.

This architecture ensures that the LLM produces explanations with preserved quantitative details of the original SHAP decomposition (Zeng, 2024).
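
To make the second and third stages concrete, the sketch below (an illustrative assumption, not the reference implementation) rounds SHAP values to three decimal places, sorts them by absolute magnitude, and assembles the JSON-like objects and baseline/prediction summary into a prompt string:

```python
# Hedged sketch of stages 2-3: structured formatting and prompt assembly.
# Feature names/values here reuse the Titanic instance discussed in Section 4.
import json

def build_prompt(feature_names, feature_values, shap_values, baseline, prediction):
    """Return a prompt containing JSON-like records sorted by |SHAP|."""
    records = [
        {"feature": n, "value": v, "shap": round(float(s), 3)}
        for n, v, s in zip(feature_names, feature_values, shap_values)
    ]
    records.sort(key=lambda r: abs(r["shap"]), reverse=True)

    summary = f"Baseline E[f(x)] = {baseline:.3f}; prediction f(x) = {prediction:.3f}."
    body = "\n".join(json.dumps(r) for r in records)
    instruction = ("Explain each feature's value, the direction of its effect, "
                   "and its SHAP magnitude in plain language, keeping every "
                   "number exactly as given.")
    return f"{summary}\n{body}\n{instruction}"

prompt = build_prompt(
    ["Sex", "Pclass", "Age", "Fare"],
    ["female", 1, 38, 71.28],
    [0.45, 0.32, -0.12, 0.08],
    baseline=0.38,
    prediction=0.81,
)
print(prompt)
```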

3. LLM Integration

SHAPLLM employs a pre-trained Mistral 7B LLM, deployed locally via Hugging Face Transformers. The model is guided by prompts structured as follows:

  • The prompt block begins with a sentence specifying the model baseline $\mathbb{E}[f(x)]$ and the prediction $f(x)$.
  • Individual feature-summary entries are listed by descending $|\phi_i|$.
  • Prompt templates reinforce narrative clarity, requiring that each feature’s contribution direction (positive/negative), the feature value, and the SHAP magnitude all be described explicitly.

No domain-specific fine-tuning is performed; model behavior is instead steered through decoding settings (temperature 0.2–0.7, 150–300 output tokens) and a set of hand-crafted prompt–output pairs supplied as in-context examples (Zeng, 2024).
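
Under these settings, local deployment could look like the following sketch; the checkpoint name, the in-context example, and the exact decoding values (chosen from the stated ranges) are assumptions rather than the paper's configuration:

```python
# Hedged sketch: local narrative generation with a Mistral-7B instruct model
# via Hugging Face Transformers. The checkpoint name below is an assumption.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed 7B checkpoint
    device_map="auto",
)

# One hand-crafted prompt/output pair serves as an in-context example,
# followed by the structured block produced by the formatting stage.
few_shot = (
    "Example:\n"
    "Baseline E[f(x)] = 0.500; prediction f(x) = 0.620.\n"
    '{"feature": "Income", "value": 42000, "shap": 0.12}\n'
    "Explanation: An income of 42000 raises the prediction by 0.12.\n\n"
)
structured_block = (
    "Baseline E[f(x)] = 0.380; prediction f(x) = 0.810.\n"
    '{"feature": "Sex", "value": "female", "shap": 0.45}\n'
    '{"feature": "Pclass", "value": 1, "shap": 0.32}\n'
    '{"feature": "Age", "value": 38, "shap": -0.12}\n'
    '{"feature": "Fare", "value": 71.28, "shap": 0.08}\n'
    "Explain each feature's value, direction of effect, and SHAP magnitude "
    "in plain language, keeping every number exactly as given."
)

result = generator(
    few_shot + structured_block,
    max_new_tokens=300,   # within the stated 150-300 range
    temperature=0.2,      # within the stated 0.2-0.7 range
    do_sample=True,
    return_full_text=False,
)
print(result[0]["generated_text"])
```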

4. Illustrative Example: Application to the Titanic Dataset

An applied demonstration on the Titanic dataset proceeds as follows:

  • Model baseline: $0.38$
  • Model prediction: $0.81$
  • SHAP decomposition:
    • $\phi_{\mathrm{Sex}}$ ("female") $= +0.45$
    • $\phi_{\mathrm{Pclass}}$ (1) $= +0.32$
    • $\phi_{\mathrm{Age}}$ (38) $= -0.12$
    • $\phi_{\mathrm{Fare}}$ (71.28) $= +0.08$

The LLM prompt specifies the full numerical context and enumerates features and their SHAP values. The resulting LLM-generated explanation is:

“The prediction is substantially raised because the passenger is female (+0.45), reflecting historically higher survival rates for women. Traveling in first class further boosts the chance (+0.32). Being 38 years old subtracts a small amount (–0.12), as older age slightly reduced survival probability. Paying a high fare adds a modest increase (+0.08), consistent with wealthier passengers having better access to lifeboats.”

This output preserves quantitative attribution and narrative interpretability (Zeng, 2024).
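
Numeric fidelity of such a narrative can be spot-checked mechanically. The sketch below (an illustration, not part of the framework) extracts every signed value quoted in the generated text and compares the set against the SHAP decomposition above:

```python
# Hedged sketch: verify that the narrative quotes the SHAP values verbatim.
import re

narrative = (
    "The prediction is substantially raised because the passenger is female "
    "(+0.45), reflecting historically higher survival rates for women. "
    "Traveling in first class further boosts the chance (+0.32). Being 38 "
    "years old subtracts a small amount (-0.12), as older age slightly "
    "reduced survival probability. Paying a high fare adds a modest increase "
    "(+0.08), consistent with wealthier passengers having better access to "
    "lifeboats."
)
expected = {"Sex": 0.45, "Pclass": 0.32, "Age": -0.12, "Fare": 0.08}

quoted = [float(m) for m in re.findall(r"[+-]\d+\.\d+", narrative)]
print(sorted(quoted) == sorted(expected.values()))  # True: every value appears
```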

5. Evaluation Methodology and Performance

Three criteria were used to evaluate SHAPLLM outputs against raw SHAP tables:

Metric                              Raw SHAP Output    LLM Narrative
Readability (Flesch-Kincaid grade)  15.2               9.1
Fidelity (Spearman ρ)               ≈1.00              0.98
User Satisfaction (5-point scale)   2.1                4.3
  • Readability: The Flesch-Kincaid grade level dropped from 15.2 to 9.1, indicating that the LLM narratives are substantially easier to read than raw SHAP tables.
  • Fidelity: Preservation of ranked feature importance, as measured by Spearman correlation, was near-perfect (ρ = 0.98).
  • User Satisfaction: A study with 20 non-technical users reported substantially higher satisfaction (2.1 → 4.3 on a 5-point scale).

This demonstrates that SHAPLLM can significantly increase explanation accessibility without sacrificing numerical trustworthiness (Zeng, 2024).
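
For reproducibility, the two automatic metrics can be approximated as in the sketch below; the `textstat` dependency and the hand-extracted mention order are assumptions, not the paper's exact protocol:

```python
# Hedged sketch of the automatic metrics: Flesch-Kincaid grade level
# (readability) and Spearman correlation between the SHAP importance ranking
# and the order in which features are mentioned in the narrative (fidelity).
import textstat
from scipy.stats import spearmanr

narrative = (
    "The prediction is substantially raised because the passenger is female "
    "(+0.45). Traveling in first class further boosts the chance (+0.32). "
    "Being 38 years old subtracts a small amount (-0.12). Paying a high fare "
    "adds a modest increase (+0.08)."
)
print(textstat.flesch_kincaid_grade(narrative))  # lower grade = easier to read

# Ranks by |phi| for Sex, Pclass, Age, Fare vs. their order of mention.
shap_rank = [1, 2, 3, 4]
mention_rank = [1, 2, 3, 4]   # extracted by hand from the narrative above
rho, _ = spearmanr(shap_rank, mention_rank)
print(rho)  # 1.0 when the narrative preserves the SHAP importance ordering
```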

6. Limitations, Risks, and Future Directions

Several limitations are identified:

  • Prompt Sensitivity: Explanation quality depends heavily on precise prompt template design.
  • Compute Demand: Running a 7B-parameter model locally, even for moderate tasks, requires substantial GPU resources.
  • Over-Reliance: There exists a risk that non-technical users might accept the LLM’s narrative uncritically, missing underlying model biases.

Future development targets include:

  1. Multimodal Explanations: Augmenting LLM text with visual representations such as bar charts or feature-importance heatmaps.
  2. Interactive User Feedback: Mechanisms for users to rate or correct explanations, providing signals for iterative prompt template refinement or LLM fine-tuning.
  3. Domain-Specific Fine-Tuning: Systematically collecting paired SHAP–explanation datasets in critical deployment contexts (e.g., healthcare) to enhance contextual relevance.
  4. New Usability Metrics: Proposing composite metrics integrating readability, fidelity, and decision-making accuracy for future benchmarking.

A plausible implication is that SHAPLLM serves as a template for integrating principled feature attribution (SHAP) with modern generative models, supporting transparent deployment in high-stakes environments (Zeng, 2024).
