
Permutation SHAP: Sampling-Based Attribution

Updated 13 May 2026
  • Permutation SHAP is a sampling-based strategy that approximates Shapley value attributions with theoretical guarantees and explicit permutation sampling.
  • It employs methods like antithetic sampling, DOE schemes, and quasi-Monte Carlo techniques to reduce estimator variance and improve efficiency.
  • The approach enables robust global feature selection and sequential attribution, enhancing model-agnostic interpretability in complex scenarios.

Permutation Sampling, also known in the SHAP literature as Permutation SHAP or PermutationSHAP, refers to a family of sampling-based approximations and theoretical refinements of Shapley-value feature importance estimators that rely on explicit sampling from permutations, either over features or over data entries. The technique underlies both theoretical guarantees about global feature importance and practical improvements in estimation efficiency for model-agnostic and black-box attribution frameworks.

1. Mathematical Formulation and Soundness

The basis of Permutation SHAP is the permutation-based definition of the Shapley value, which assigns to feature $i$ the mean marginal contribution of including $i$ across all possible orderings of input features:

$$\phi_i = \frac{1}{d!} \sum_{\pi \in S_d} \bigl[ f(x_{S_{\pi,i} \cup \{i\}}) - f(x_{S_{\pi,i}}) \bigr],$$

where $S_{\pi,i}$ is the set of features preceding $i$ in permutation $\pi$ and $d$ is the number of features (Yang et al., 2023; Mitchell et al., 2021).
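For a handful of features, the definition above can be evaluated directly by enumerating all $d!$ orderings. The following is a minimal sketch, not a reference implementation. It assumes that $f(x_S)$ is obtained by taking the features in $S$ from $x$ and all others from a fixed `baseline` vector; that masking convention, the toy `model`, and the function name `exact_shapley` are illustrative assumptions rather than part of the cited works.

```python
import itertools
import math

import numpy as np


def exact_shapley(model, x, baseline):
    """Average marginal contributions over all d! feature orderings.

    Assumption: f(x_S) is computed by keeping features in S from `x`
    and filling the remaining features from `baseline`.
    """
    d = len(x)
    phi = np.zeros(d)
    for perm in itertools.permutations(range(d)):
        mask = np.zeros(d, dtype=bool)            # no features present yet
        prev = model(np.where(mask, x, baseline))
        for i in perm:
            mask[i] = True                        # add feature i to the coalition
            cur = model(np.where(mask, x, baseline))
            phi[i] += cur - prev                  # marginal contribution of i
            prev = cur
    return phi / math.factorial(d)


# Toy model (hypothetical): one additive term plus one interaction.
model = lambda z: 2.0 * z[0] + z[1] * z[2]
x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = exact_shapley(model, x, baseline)
print(phi)  # the interaction term is split equally between features 1 and 2
```

Because each ordering telescopes from `model(baseline)` to `model(x)`, the attributions sum to `model(x) - model(baseline)`. The factorial cost is what motivates the sampling schemes discussed in the rest of the article.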

In classical SHAP implementations, feature contributions are aggregated over observed data, that is, over samples $x \sim \mu$ drawn from the joint data distribution. However, this can produce unsound global feature importance: aggregate SHAP values can be small for features that the function genuinely depends on, due to the influence of points lying outside the data manifold (Bhattacharjee et al., 29 Mar 2025).

Permutation SHAP corrects this by using the extended distribution $\mu^* = \mu_1 \times \dots \times \mu_d$, obtained as the product of feature marginals. By independently permuting each column of the data matrix, one samples from $\mu^*$. Aggregating absolute SHAP values over $\mu^*$ yields the extended-support aggregate SHAP. Theoretical results show that

  • $f$ is independent of feature $i$ on the support of $\mu^*$ if and only if the SHAP attribution of feature $i$ vanishes throughout that support (exact soundness).
  • If the extended-support aggregate SHAP value of feature $i$ is small, then $f$ can be approximated by a function that is independent of feature $i$, with an error controlled by that aggregate value.

These theorems ensure that a small aggregate Permutation SHAP value justifies safe feature elimination, a guarantee that classical SHAP does not possess (Bhattacharjee et al., 29 Mar 2025).

2. Permutation Sampling Algorithms

Permutation SHAP estimation consists of two principal sampling operations, depending on the attribution goal:

A. Extended-Support Aggregation (Permutation over Data):

  1. Given the data matrix, independently permute each column, forming a column-permuted matrix whose rows follow $\mu^*$.
  2. Compute SHAP (e.g., KernelSHAP) on the permuted matrix, aggregating absolute values per feature.
  3. The result approximates the extended-support aggregate SHAP and provides sound global feature importance (Bhattacharjee et al., 29 Mar 2025).
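Step 1 and the aggregation in step 2 can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions: the helper names `permute_columns` and `aggregate_abs` are hypothetical, the synthetic data is made up for the demonstration, and the per-row attributions passed to `aggregate_abs` would come from whichever SHAP explainer is applied to the permuted rows.

```python
import numpy as np


def permute_columns(X, rng):
    """Shuffle every column independently.

    Each column keeps its empirical marginal, but the dependence between
    columns is broken, so rows behave like draws from the product of marginals.
    """
    X_perm = np.empty_like(X)
    for j in range(X.shape[1]):
        X_perm[:, j] = rng.permutation(X[:, j])
    return X_perm


def aggregate_abs(attributions):
    """Mean absolute attribution per feature (rows = samples, columns = features)."""
    return np.abs(attributions).mean(axis=0)


rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
# Two strongly correlated features (synthetic data for illustration only).
X = np.column_stack([x0, x0 + 0.1 * rng.normal(size=n)])
X_perm = permute_columns(X, rng)

print(np.corrcoef(X.T)[0, 1])       # close to 1 on the original data
print(np.corrcoef(X_perm.T)[0, 1])  # close to 0 after column-wise permutation
```

The per-feature output of `aggregate_abs`, computed on attributions for the rows of `X_perm`, plays the role of the extended-support aggregate in step 3.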

B. PermutationSHAP for Exact/Approximate Shapley Values (Permutation over Features):

  • Draw a fixed number of random permutations of the feature set.
  • For each permutation $\pi$ and each feature $i$, compute the marginal contribution $f(x_{S_{\pi,i} \cup \{i\}}) - f(x_{S_{\pi,i}})$.
  • Aggregate over permutations to estimate $\phi_i$ (Mayer et al., 18 Aug 2025; Yang et al., 2023; Mitchell et al., 2021).
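The three steps above amount to a plain Monte Carlo estimator. The sketch below is one possible reading, assuming baseline masking for $f(x_S)$ (features outside $S$ are replaced by a fixed `baseline` vector). The function name `permutation_shap`, the parameter `n_perms`, and the toy model are hypothetical.

```python
import numpy as np


def permutation_shap(model, x, baseline, n_perms, rng):
    """Monte Carlo Shapley estimate from uniformly random feature orderings."""
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_perms):
        perm = rng.permutation(d)                 # one random ordering
        mask = np.zeros(d, dtype=bool)
        prev = model(np.where(mask, x, baseline))
        for i in perm:
            mask[i] = True
            cur = model(np.where(mask, x, baseline))
            phi[i] += cur - prev                  # marginal contribution of i
            prev = cur
    return phi / n_perms


# Toy model (hypothetical): a three-way interaction plus an additive term.
model = lambda z: z[0] * z[1] * z[2] + z[3]
x = np.array([1.0, 2.0, 3.0, 4.0])
baseline = np.zeros(4)
rng = np.random.default_rng(0)
phi_hat = permutation_shap(model, x, baseline, n_perms=2000, rng=rng)
print(phi_hat)
```

For this toy model the exact values are 2 for each of the three interacting features and 4 for the additive one. The additive feature is recovered exactly by every permutation, while the interacting features carry sampling noise that shrinks as `n_perms` grows. Each permutation costs $d + 1$ model evaluations.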

Advanced variants employ antithetic or paired sampling (using reverse permutations) for variance reduction and exact recovery in certain models (bilinear interactions, additive decompositions) (Mayer et al., 18 Aug 2025).
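The exact-recovery property for bilinear models can be checked numerically. The sketch below pairs one random permutation with its reversal and compares the result with the closed-form Shapley values of a model containing only main effects and pairwise interactions. The coefficients `A` and `B`, the zero baseline, and all function names are assumptions made for illustration.

```python
import numpy as np


def marginal_contributions(model, x, baseline, perm):
    """Marginal contribution of each feature along a single ordering."""
    d = len(x)
    contrib = np.zeros(d)
    mask = np.zeros(d, dtype=bool)
    prev = model(np.where(mask, x, baseline))
    for i in perm:
        mask[i] = True
        cur = model(np.where(mask, x, baseline))
        contrib[i] = cur - prev
        prev = cur
    return contrib


def paired_permutation_shap(model, x, baseline, n_pairs, rng):
    """Antithetic estimator: each sampled ordering is used with its reversal."""
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_pairs):
        perm = rng.permutation(d)
        phi += marginal_contributions(model, x, baseline, perm)
        phi += marginal_contributions(model, x, baseline, perm[::-1])
    return phi / (2 * n_pairs)


# Bilinear toy model: main effects A and pairwise interactions B (upper triangular).
A = np.array([1.0, -2.0, 0.5])
B = np.array([[0.0, 1.5, -1.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])
model = lambda z: A @ z + z @ B @ z
x = np.array([1.0, 2.0, -1.0])
baseline = np.zeros(3)

# Closed form with a zero baseline: each pairwise term is split equally.
exact = A * x + 0.5 * x * ((B + B.T) @ x)
one_pair = paired_permutation_shap(model, x, baseline, n_pairs=1,
                                   rng=np.random.default_rng(0))
print(exact, one_pair)
```

The recovery is exact because, for any feature, every other feature precedes it in exactly one of the two orderings, so each pairwise interaction is credited once out of two passes.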

Fractional factorial (DOE) methods such as Component Orthogonal Arrays (COA) or Latin Squares achieve structured, balanced coverage over permutation space, resulting in unbiased, lower-variance estimators compared to simple Monte Carlo sampling (Yang et al., 2023).

3. Variance Reduction and Quasi-Monte Carlo Techniques

Exploiting structure in the space of permutations leads to significantly improved convergence rates and estimator variance:

  • Paired sampling: Pair each permutation $\pi$ with its reversal. For bilinear or additive value functions, a single paired sample recovers the Shapley values exactly. For general functions, variance is halved compared to unpaired sampling (Mayer et al., 18 Aug 2025).
  • Order-of-addition designs (DOE): Structured designs (COA, Latin square) ensure each feature occurs equally often in each position, which can reduce variance by factors of 2–5, and sometimes to zero for symmetric or position-only games (Yang et al., 2023).
  • Kernel quadrature and herding: Functions of permutations can be embedded in a Mallows-RKHS; kernel herding and sequential Bayesian quadrature provide residual error bounds significantly outperforming IID Monte Carlo for small to moderate dimensions (Mitchell et al., 2021).
  • Sobol and orthogonal spherical codes: By mapping permutations to equally spaced points on a hypersphere and leveraging low-discrepancy sets, quasi-Monte Carlo approximations ensure uniform coverage and rapid error decay for high-dimensional regimes (Mitchell et al., 2021).

These methods yield more precise estimates per model evaluation and avoid the variance spikes that arise in plain Monte Carlo sampling over random permutations.
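The position-balance property of order-of-addition designs can be illustrated with the simplest Latin square, a cyclic one. This is a sketch only: the cyclic construction stands in for the COA and Latin square designs of the cited work, the random relabeling of features is added here so that every row is a uniformly random ordering, and the function names and the toy game are hypothetical.

```python
import numpy as np


def cyclic_latin_square(d):
    """d orderings in which row r places feature (r + c) % d at position c."""
    return np.array([[(r + c) % d for c in range(d)] for r in range(d)])


def latin_square_shap(model, x, baseline, rng):
    """Shapley estimate from d orderings that balance features over positions."""
    d = len(x)
    relabel = rng.permutation(d)                  # random relabeling of features
    design = relabel[cyclic_latin_square(d)]      # still a Latin square
    phi = np.zeros(d)
    for perm in design:
        mask = np.zeros(d, dtype=bool)
        prev = model(np.where(mask, x, baseline))
        for i in perm:
            mask[i] = True
            cur = model(np.where(mask, x, baseline))
            phi[i] += cur - prev
            prev = cur
    return phi / d, design


# Position-only toy game: the value depends only on how many features are present.
model = lambda z: z.sum() ** 2
x = np.ones(5)
baseline = np.zeros(5)
phi_ls, design = latin_square_shap(model, x, baseline, np.random.default_rng(0))
print(phi_ls)
```

Since every feature occupies every position exactly once across the design, the estimate for this position-only game equals the exact Shapley value (total value divided by $d$) with zero variance, using only $d$ orderings.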

4. Theoretical Properties and Algebraic Structure

Permutation SHAP admits a rigorous operator-theoretic characterization:

  • The SHAP operator associated with feature $i$, acting on a space of measurable functions of the inputs, maps $f$ to zero if and only if $f$ is independent of feature $i$ (on the extended support) (Bhattacharjee et al., 29 Mar 2025).
  • The algebra generated by the value operators, termed the Shapley Lie algebra, is solvable and can be triangularized. This leads to explicit invertibility and approximation arguments underlying the soundness of Permutation SHAP (Bhattacharjee et al., 29 Mar 2025).

Robustness bounds establish that if the aggregate Permutation SHAP value of a feature is small, there exists a surrogate for $f$ that does not depend on that feature, with approximation error controlled by the aggregate value.

5. Extensions to Sequential and Non-i.i.d. Settings

Permutation SHAP has been adapted for sequential or position-sensitive models (e.g., natural language, time series) via algorithms such as OrdShap (Hill et al., 16 Jul 2025):

  • OrdShap introduces a matrix, indexed by feature and position, that captures both value and position effects by averaging marginal contributions over all subsets and permutations conditioned on a given feature occupying a given position.
  • The average over positions recovers traditional value importance, while a linear fit over positions yields position importance.
  • The OrdShap permutation sampling scheme draws random subsets and random position assignments, using matching and masking to estimate the contribution of feature value and order.

This extension enables attribution methods to distinguish between the informativeness of feature values and their locations, an axis conflated in classical SHAP or permutation averaging approaches.
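The two summaries described above (an average over positions and a linear fit over positions) can be sketched as follows. The sketch assumes that a feature-by-position matrix of contributions has already been estimated. The matrix `gamma` used here is synthetic, and the function name and exact normalization are illustrative assumptions rather than the OrdShap authors' definitions.

```python
import numpy as np


def summarize_position_matrix(gamma):
    """Summarize a (features x positions) contribution matrix.

    Returns the per-feature mean over positions (value importance) and the
    per-feature least-squares slope over positions (position importance).
    """
    n_positions = gamma.shape[1]
    positions = np.arange(n_positions)
    value_importance = gamma.mean(axis=1)
    # np.polyfit fits every column of gamma.T at once; row 0 holds the slopes.
    position_importance = np.polyfit(positions, gamma.T, deg=1)[0]
    return value_importance, position_importance


# Synthetic matrix: feature 0 gains importance later in the sequence,
# feature 1 is irrelevant, feature 2 loses importance with position.
offsets = np.array([1.0, 0.0, -2.0])
slopes = np.array([0.5, 0.0, -0.25])
gamma = offsets[:, None] + slopes[:, None] * np.arange(4)[None, :]

value_imp, pos_imp = summarize_position_matrix(gamma)
print(value_imp, pos_imp)
```

A feature can therefore have a large value importance with a near-zero slope (informative wherever it appears) or a modest average with a steep slope (informative mainly at certain positions).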

6. Practical Implementation and Guidelines

Empirical studies and theoretical analyses yield clear practical recommendations:

  • For global feature selection and interpretability, permute each feature (column-wise) independently across data rows and apply KernelSHAP or a similar method on the scrambled matrix to obtain a robust aggregate SHAP. Retain features whose aggregate value is significantly greater than zero (Bhattacharjee et al., 29 Mar 2025).
  • For sample complexity, the cost and convergence of Permutation SHAP match those of ordinary KernelSHAP, including in the worst case, while offering stronger guarantees on interpretability and invariance.
  • For high-dimensional problems or when model evaluation is expensive, leverage COA or Latin Square DOE schemes to maximize the information gained per permutation and reduce estimator variance (Yang et al., 2023).
  • In applications to sequential data, employ OrdShap to attribute both value and positional importance, disentangling effects that standard Permutation SHAP conflates (Hill et al., 16 Jul 2025).

7. Summary Table of Main Permutation SHAP Methods

| Method Class | Key Feature | Main Reference |
| --- | --- | --- |
| Permuted data matrix | Column-wise permutation ($\mu^*$) | (Bhattacharjee et al., 29 Mar 2025) |
| Monte Carlo (feature perm.) | Random/unpaired, antithetic, paired | (Mayer et al., 18 Aug 2025; Mitchell et al., 2021) |
| DOE (COA/Latin Square) | Structured permutation sampling | (Yang et al., 2023) |
| Kernel Herding/SBQ | Mallows RKHS quadrature | (Mitchell et al., 2021) |
| OrdShap | Value vs. position in sequences | (Hill et al., 16 Jul 2025) |

Permutation SHAP thus underpins both recent theoretical advances in global attribution soundness and a suite of principled, flexible estimation strategies that address the computational and statistical challenges inherent to black-box feature importance in modern machine learning.
