SHAP-Based Explainability in AI

Updated 31 December 2025
  • SHAP-based explainability is a method that leverages Shapley values to decompose model predictions into additive feature contributions.
  • The method is highly sensitive to feature representation and encoding choices, which can significantly alter the magnitude and ranking of feature importances.
  • Robust practices such as auditing feature engineering and using invariant explanation pipelines are essential to mitigate adversarial risks and ensure reliable model assessments.

SHAP-based explainability refers to a family of methods grounded in cooperative game theory, specifically the Shapley value, for attributing model predictions to individual input features. By averaging the marginal contributions of each feature across all possible coalitions, SHAP provides a unique, additive decomposition of a model’s output into feature “importances,” supporting both local (instance-level) and global (dataset-level) explanation. In recent years, the SHAP framework has become central to eXplainable Artificial Intelligence (XAI), but new research demonstrates that explanation quality and integrity are deeply sensitive to choices in feature representation, encoding, and data engineering (Hwang et al., 13 May 2025). This property poses significant challenges and even security risks for model auditing, fairness assessment, and reliable deployment.

1. Mathematical Foundations of SHAP

The SHAP value for feature $i$ in a prediction $f(x)$ is formalized as:

$$\phi_i(x; f) = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S(x_S) \right]$$

where:

  • $F$ is the full set of interpretable features,
  • $f_S(x_S)$ is the expected output of the model when only the features in $S$ are held at their observed values (the others are marginalized, typically over a background dataset),
  • the weighting term ensures a fair average over all feature inclusion orders.

The efficiency property ensures that $\sum_{i} \phi_i = f(x) - E[f(x')]$, so the sum of the feature attributions exactly recovers the model’s deviation from a reference baseline. This formalism guarantees local accuracy (efficiency), symmetry, dummy, and additivity, making SHAP the unique attribution satisfying these axioms (Salih et al., 2023).
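The definition can be reproduced directly for a small model by enumerating coalitions. The sketch below is illustrative only (the toy model, background sample, and helper names are assumptions, not from the cited work); it computes exact Shapley values for a three-feature function, approximating $f_S$ by averaging the model over a background sample for the features outside $S$, and then checks the efficiency property numerically.

```python
# Minimal sketch (not the shap library): exact Shapley values for a 3-feature
# toy model by enumerating all coalitions.
import itertools
import math
import numpy as np

def toy_model(X):
    # Arbitrary nonlinear scoring function over three features.
    return X[:, 0] + 2.0 * X[:, 1] * X[:, 2]

def f_S(model, x, S, background):
    # Expected model output with features in S fixed at x, the rest marginalized
    # over the background sample.
    Xb = background.copy()
    if S:
        Xb[:, list(S)] = x[list(S)]
    return model(Xb).mean()

def shapley_values(model, x, background):
    n = x.shape[0]
    phi = np.zeros(n)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for r in range(len(rest) + 1):
            for S in itertools.combinations(rest, r):
                weight = math.factorial(len(S)) * math.factorial(n - len(S) - 1) / math.factorial(n)
                phi[i] += weight * (f_S(model, x, S + (i,), background) - f_S(model, x, S, background))
    return phi

rng = np.random.default_rng(0)
background = rng.normal(size=(256, 3))
x = np.array([1.0, 0.5, -2.0])

phi = shapley_values(toy_model, x, background)
# Efficiency check: attributions sum to f(x) - E[f(x')] up to floating-point error.
print(phi, phi.sum(), toy_model(x[None, :])[0] - toy_model(background).mean())
```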

2. The Impact of Feature Representation

SHAP explanations are not invariant to the choice of feature representation. Any preprocessing, encoding, or transformation of input features alters the “interpretable feature” space on which SHAP is computed. Examples include the following (a brief encoding sketch follows this list):

  • Continuous features: “Age” can be used raw, discretized into equi-width or equi-depth bins, or one-hot encoded across intervals.
  • Categorical features: “Race” can be one-hot encoded per category, merged into coarse groups, ordinally coded, or target encoded.
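As a concrete illustration of these choices, the sketch below builds several alternative representations of hypothetical “age” and “race” columns with pandas; the column names, bin counts, and merge groups are arbitrary examples, not the paper's configuration.

```python
# Illustrative only: alternative representations of the same two raw columns.
import pandas as pd

df = pd.DataFrame({
    "age": [23, 37, 58, 41, 65],
    "race": ["A", "B", "C", "B", "D"],
})

# Continuous "age": raw, equi-width bins, equi-depth (quantile) bins, one-hot intervals.
age_raw = df["age"]
age_equiwidth = pd.cut(df["age"], bins=4, labels=False)
age_equidepth = pd.qcut(df["age"], q=4, labels=False)
age_onehot = pd.get_dummies(pd.cut(df["age"], bins=4), prefix="age")

# Categorical "race": per-category one-hot, merged coarse groups, ordinal codes.
race_onehot = pd.get_dummies(df["race"], prefix="race")
race_merged = pd.get_dummies(
    df["race"].map({"A": "A", "B": "other", "C": "other", "D": "other"}),
    prefix="race",
)
race_ordinal = df["race"].astype("category").cat.codes
```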

This affects both the shape of $x$ and the definition of $F$, directly modifying the coalitions $S$ considered in the Shapley sum. Empirically, such changes can shift the magnitude and even the sign of marginal contributions. For instance, converting age from continuous to buckets can shrink $|\phi_{\text{age}}|$ or flip its importance ranking (Hwang et al., 13 May 2025). In categorical cases, merging race categories can systematically diminish or obscure the apparent effect of race.

Concretely, suppose that for a raw “age=30” the marginal contribution $f_{\{\text{edu},\,\text{age}\}}(x) - f_{\{\text{edu}\}}(x) = -0.3$, but for its bucketed version “age_bin=3” the same contribution is only $-0.02$; the aggregation over coalitions then amplifies such shifts across the explanation.
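A minimal end-to-end sketch of this effect is shown below, assuming xgboost and shap are installed. The data are synthetic, so the resulting numbers will not reproduce the census figures reported by Hwang et al., but the raw-versus-binned comparison pattern is the same.

```python
# Sketch: how one encoding change (raw vs. 4-bin "age") can shift TreeSHAP
# attributions. Synthetic data; not the paper's census setup or numbers.
import numpy as np
import pandas as pd
import shap
import xgboost

rng = np.random.default_rng(0)
n = 5000
age = rng.integers(18, 80, size=n)
edu = rng.integers(0, 16, size=n)
y = ((0.04 * age + 0.10 * edu + rng.normal(scale=0.5, size=n)) > 3.5).astype(int)

def mean_abs_shap(X, y):
    # Train a small XGBoost classifier and return per-column mean |phi|.
    model = xgboost.XGBClassifier(n_estimators=100, max_depth=3, verbosity=0).fit(X, y)
    phi = shap.TreeExplainer(model).shap_values(X)
    return pd.Series(np.abs(phi).mean(axis=0), index=X.columns)

X_raw = pd.DataFrame({"age": age, "edu": edu})
X_binned = X_raw.assign(age=pd.cut(X_raw["age"], bins=4, labels=False))

print("raw age encoding:\n", mean_abs_shap(X_raw, y))
print("4-bin age encoding:\n", mean_abs_shap(X_binned, y))
```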

3. Empirical Sensitivity and Adversarial Vulnerability

Experimental work demonstrates that SHAP explanations, produced via TreeSHAP on XGBoost classifiers, are substantially influenced by feature encoding. Key findings on census datasets include (Hwang et al., 13 May 2025):

  • Continuous (Age): As the number of buckets $K$ increases, both the mean absolute attribution $|\phi_{\text{age}}|$ and the importance rank of age rise. With $K = 12$, “age” often becomes the top-ranked feature, while bucketizing demotes it in up to 60% of instances where it was previously top-ranked. Rank shifts of up to $\pm 20$ positions are observed depending on the binning.
  • Categorical (Race): Under the base one-hot race encoding, “race” is the top feature in 12% of cases; merging categories can drop this to under 1%. The corresponding mean $|\phi_{\text{race}}|$ values may fall by an order of magnitude (a sketch for aggregating such per-feature magnitudes and ranks follows this list).
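The aggregation referenced above can be done as sketched here; grouping one-hot columns back to their parent feature by name prefix is an assumption of this sketch, not necessarily the paper's procedure, and the example numbers are arbitrary.

```python
# Sketch: per-feature mean |phi| and importance ranks, grouping one-hot columns
# (e.g. "race_A", "race_B") back to a single "race" feature by name prefix.
import numpy as np
import pandas as pd

def feature_importance(shap_values, columns, groups):
    # shap_values: (n_samples, n_columns); groups maps each column to a feature name.
    per_column = pd.Series(np.abs(np.asarray(shap_values)).mean(axis=0), index=columns)
    per_feature = per_column.groupby(groups).sum()
    return pd.DataFrame({"mean_abs_shap": per_feature,
                         "rank": per_feature.rank(ascending=False)})

columns = ["age", "race_A", "race_B", "race_C"]
groups = {c: ("race" if c.startswith("race_") else c) for c in columns}
shap_values = np.array([[0.40, 0.05, 0.10, 0.02],
                        [0.30, 0.02, 0.08, 0.01]])
print(feature_importance(shap_values, columns, groups))
```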

This sensitivity can be exploited. A malicious actor needing to pass a model audit can re-encode features (e.g., bucketize age or aggregate race groups) to minimize the apparent importance of protected or sensitive attributes without retraining; for example, a Bayesian-optimized 4-bin adversarial encoding reduces the SHAP-based rank of “age” from $2.3$ to $7.1$ while maintaining over 90% fidelity in the explanation (Hwang et al., 13 May 2025).
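A simplified sketch of this attack surface is shown below. Where the paper searches bin boundaries with Bayesian optimization, this stand-in merely sweeps a grid of bin counts for a single column; the data, model settings, and helper names are synthetic assumptions.

```python
# Sketch: sweep candidate re-encodings of a sensitive column and keep the one
# that pushes its SHAP rank furthest down.
import numpy as np
import pandas as pd
import shap
import xgboost

def shap_rank(X, y, feature):
    # Rank of `feature` by mean |phi| under a freshly trained XGBoost model.
    model = xgboost.XGBClassifier(n_estimators=100, max_depth=3, verbosity=0).fit(X, y)
    phi = np.abs(shap.TreeExplainer(model).shap_values(X)).mean(axis=0)
    return pd.Series(phi, index=X.columns).rank(ascending=False)[feature]

rng = np.random.default_rng(0)
n = 5000
X = pd.DataFrame({"age": rng.integers(18, 80, n),
                  "edu": rng.integers(0, 16, n),
                  "hours": rng.integers(20, 60, n)})
y = ((0.04 * X["age"] + 0.10 * X["edu"] + 0.02 * X["hours"]
      + rng.normal(scale=0.5, size=n)) > 4.5).astype(int)

results = {"raw": shap_rank(X, y, "age")}
for k in (3, 4, 6, 8):
    X_k = X.assign(age=pd.cut(X["age"], bins=k, labels=False))
    results[f"{k} bins"] = shap_rank(X_k, y, "age")
print(results)  # the adversary reports only the most favorable encoding
```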

4. Recommendations and Best Practices for Reliable SHAP

Given the non-invariance of SHAP to feature representation, practitioners should:

  • Audit feature engineering: Treat the transformation pipeline as an explicit artifact in model and explanation audits; all encoding choices must be logged and justified before generating explanations.
  • Report across encodings: Show SHAP results under multiple reasonable transformations (e.g., both raw and discretized age) to detect instability or manipulation (see the sketch after this list).
  • Prefer invariant or causal methods: Where feasible, use explanation frameworks or causal attributions that do not depend on arbitrary discretizations.
  • Immutable explainer pipelines: For regulated domains, enforce locked explainability workflows from raw data to explanation, safeguarding against downstream tampering.
  • Robust SHAP variants: Develop versions where $\phi_i$ is provably stable under small representation perturbations (e.g., by regularizing for Lipschitz continuity).
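A minimal sketch of the “report across encodings” recommendation: given per-feature importance tables computed under several candidate encodings (for example, with the helpers sketched earlier), flag features whose rank moves by more than a chosen threshold. The tables, threshold, and function name here are illustrative assumptions.

```python
# Sketch of a cross-encoding stability report over per-feature mean |phi| tables.
import pandas as pd

def encoding_stability_report(importances, max_rank_spread=2):
    # importances: {encoding_name: Series of mean |phi| indexed by feature name}.
    ranks = pd.DataFrame({name: s.rank(ascending=False) for name, s in importances.items()})
    spread = ranks.max(axis=1) - ranks.min(axis=1)
    return (ranks.assign(rank_spread=spread, unstable=spread > max_rank_spread)
                 .sort_values("rank_spread", ascending=False))

importances = {
    "raw":         pd.Series({"age": 0.41, "edu": 0.30, "hours": 0.12, "race": 0.08}),
    "4-bin age":   pd.Series({"age": 0.05, "edu": 0.31, "hours": 0.13, "race": 0.08}),
    "merged race": pd.Series({"age": 0.40, "edu": 0.29, "hours": 0.12, "race": 0.01}),
}
print(encoding_stability_report(importances))
```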

In sum, the integrity of SHAP explanations depends not only on the predictive model but critically on the pre-explanation feature engineering. Routine encoding steps—histogram binning, category merging—can amplify, mask, or even reverse the apparent importance of key features. Consequently, feature representation must be a first-class object in audits, compliance reviews, and research into robust and trustworthy XAI (Hwang et al., 13 May 2025).

5. Broader Implications and Future Directions

The sensitivity of SHAP to interpretable representation foregrounds several ongoing research questions:

  • Theoretical generalizations: Understanding the mathematical conditions under which SHAP can be made representation-invariant or when it can be coupled with causal inference frameworks.
  • Defending against attacks: Developing algorithmic or statistical bounds for SHAP stability, and diagnostic tests for adversarial feature engineering.
  • Fairness and transparency: Ensuring that manipulation of apparent feature importance via encoding cannot be used to evade regulatory scrutiny, especially for discrimination auditing in algorithmic decision-making.
  • Standardization: Codifying best practices for encoding choices and explanation pipelines in open-source libraries and regulatory standards.

Current evidence makes clear that robust explainability cannot rely solely on the output of feature attribution methods—review of feature representation and transformation is essential to maintain fidelity, fairness, and trust in SHAP-based XAI methods.
