Sparse Isotonic Shapley Regression (SISR)

Updated 4 December 2025
  • SISR is a framework that restores additivity in Shapley values through a learned monotonic transformation of coalition payoffs.
  • It alternates between isotonic regression and normalized hard-thresholding to enforce exact ℓ0 sparsity in attributions.
  • Empirical evaluations show that SISR delivers stable and interpretable explanations even with heavy-tailed, dependent features.

Sparse Isotonic Shapley Regression (SISR) addresses two foundational obstacles in Shapley-based model explainability: the violation of the canonical additivity assumption in real-world payoff constructions, and the need for sparse, interpretable attributions in high-dimensional feature spaces. The method provides a unified framework that simultaneously estimates a monotonic, data-driven transformation to restore additivity of the worth function and enforces an $\ell_0$ sparsity constraint on the Shapley vector, thereby yielding more consistent and computationally efficient explanations in settings plagued by non-Gaussianity, heavy tails, feature dependence, or idiosyncratic loss scales (She, 2 Dec 2025).

1. Model Formulation and Objectives

SISR considers a family of coalition values $\{\nu_A : A \subseteq \{1,\ldots,p\}\}$ and seeks to jointly: (i) learn a strictly increasing scalar function $f:\mathbb{R}\to\mathbb{R}$ with $f(0)=0$, such that $f(\nu_A)$ becomes additive in the underlying Shapley scores; (ii) enforce sparsity (an $\ell_0$ constraint) on the Shapley vector $\varphi$, aligning with the hypothesis that only a small subset of features is truly relevant.

The model assumes that for all coalitions $A$,

$$f(\nu_A) \approx \sum_{j\in A} \varphi_j^* + \mathrm{noise}_A, \quad \mathrm{noise}_A \sim N(0, \sigma_A^2)$$

Enforcing $\|\varphi\|_0 \le s$ directly achieves exact sparsity in attributions, bypassing the shrinkage bias inherent to $\ell_1$ penalization. The primary SISR objective is:

$$\min_{\varphi\in\mathbb{R}^p,\, f\in\mathcal{M}} \; \tfrac{1}{2} \sum_{A\subseteq[p]} w_{\mathrm{SH}}(A) \left[f(\nu_A) - \sum_{j\in A} \varphi_j \right]^2 \quad \text{subject to } \|\varphi\|_2 = 1,\; \|\varphi\|_0 \le s,\; f \text{ strictly increasing}$$

Reparameterizing with $\gamma_j = \varphi_j$ and $t_i = f(\nu_{A_i})$ (a discretized transform), and introducing a coalition-membership matrix $Z$ whose rows index coalitions, yields an equivalent quadratic problem with order restrictions on $t$.
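
To make this design concrete, here is a minimal sketch of constructing $Z$ and the regression weights. It is illustrative only: it assumes the classical Shapley kernel weight $w_{\mathrm{SH}}(A) = (p-1)/\big(\binom{p}{|A|}\,|A|\,(p-|A|)\big)$ for interior coalitions, and follows the practical recipe of Section 5 by replacing the divergent weights of $\emptyset$ and $[p]$ with a large multiple of the largest finite weight; `coalition_design` and `cap` are hypothetical names.

```python
import itertools
import math

import numpy as np

def coalition_design(p, cap=100.0):
    """Build the coalition-membership matrix Z (one row per coalition A,
    Z[i, j] = 1 iff feature j is in A) and Shapley regression weights.

    Assumption: the classical kernel weight
        w_SH(A) = (p - 1) / (C(p, |A|) * |A| * (p - |A|)),
    with the divergent weights of the empty and full coalitions replaced
    by `cap` times the largest finite weight."""
    coalitions = [A for r in range(p + 1)
                  for A in itertools.combinations(range(p), r)]
    Z = np.zeros((len(coalitions), p))
    w = np.zeros(len(coalitions))
    for i, A in enumerate(coalitions):
        Z[i, list(A)] = 1.0
        k = len(A)
        if 0 < k < p:
            w[i] = (p - 1) / (math.comb(p, k) * k * (p - k))
    w[w == 0.0] = cap * w.max()  # finite stand-ins for w_SH(empty), w_SH([p])
    return Z, w, coalitions
```

For moderate $p$ the full $N = 2^p$ enumeration above is feasible; for larger $p$ one would subsample coalitions, as noted in the text.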

2. Optimization Algorithm

SISR employs alternating minimization over $\gamma$ (the sparse Shapley vector) and $t$ (the discretized monotonic transform):

A. Monotonic Transformation via Isotonic Regression:

With $\gamma$ fixed, update $t$ by solving a weighted isotonic regression problem:

$$\min_{t} \; \tfrac{1}{2}\| t - Z\gamma \|_W^2 \quad \text{subject to } t_i \le t_j \text{ whenever } \nu_i \le \nu_j$$

This is solved exactly by the Pool-Adjacent-Violators (PAV) algorithm with computational complexity $O(N)$, for $N = 2^p$ (often reduced via subsampling of coalitions). The output vector $t$ gives the transformed payoffs at the observed coalition values.
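
The PAV step admits a compact implementation. Below is a minimal sketch of weighted isotonic regression; `weighted_pav` is an illustrative name, and the input is assumed already sorted by the corresponding payoff values $\nu$.

```python
import numpy as np

def weighted_pav(y, w):
    """Weighted isotonic (nondecreasing) regression by Pool-Adjacent-Violators:
    minimizes sum_i w_i * (t_i - y_i)^2 subject to t_1 <= ... <= t_n."""
    vals, wts, sizes = [], [], []
    for yi, wi in zip(y, w):
        vals.append(float(yi)); wts.append(float(wi)); sizes.append(1)
        # pool adjacent blocks while they violate monotonicity
        while len(vals) > 1 and vals[-2] > vals[-1]:
            tot = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / tot
            wts[-2] = tot
            sizes[-2] += sizes[-1]
            vals.pop(); wts.pop(); sizes.pop()
    return np.repeat(vals, sizes)  # expand block means back to full length
```

In the SISR $t$-update, `y` would be the entries of $Z\gamma$ sorted by $\nu$ and `w` the corresponding Shapley regression weights.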

B. Sparsity via Normalized Hard-Thresholding:

With $t$ fixed, update $\gamma$ by minimizing the objective under the joint $\ell_2/\ell_0$ constraints. The step utilizes a quadratic surrogate $g(\gamma;\gamma^{-})$ and a normalized hard-thresholding operator $H^{\circ}(y;s) = H(y;s)/\|H(y;s)\|_2$, where $H$ zeros all but the $s$ largest $|y_j|$. The global minimizer takes the form:

$$\gamma^+ = H^{\circ}\!\left(\gamma^- - \tfrac{1}{\rho}\,\nabla l(\gamma^-);\; s\right)$$

where $l(\gamma) = \tfrac{1}{2}\|Z\gamma - t\|_W^2$ and $\rho \ge \|Z^\top W Z\|_2$.
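
A minimal sketch of this update, treating $W$ as a diagonal weight matrix stored as a 1-D array (`normalized_hard_threshold` and `gamma_update` are illustrative names):

```python
import numpy as np

def normalized_hard_threshold(y, s):
    """H°(y; s): keep the s largest-magnitude entries, zero the rest,
    then rescale to unit l2 norm."""
    out = np.zeros_like(y)
    keep = np.argsort(np.abs(y))[-s:]
    out[keep] = y[keep]
    return out / np.linalg.norm(out)

def gamma_update(gamma, Z, w, t, s, rho=None):
    """One SISR gamma-step: gradient of l(gamma) = 0.5 * ||Z gamma - t||_W^2,
    scaled by 1/rho with rho >= ||Z^T W Z||_2, then projected by H°."""
    WZ = w[:, None] * Z
    if rho is None:
        rho = np.linalg.norm(Z.T @ WZ, 2)  # spectral norm of Z^T W Z
    grad = WZ.T @ (Z @ gamma - t)          # gradient of the weighted loss
    return normalized_hard_threshold(gamma - grad / rho, s)
```

The output is, by construction, a unit-norm vector with at most $s$ nonzero entries, matching the $\|\gamma\|_2 = 1$, $\|\gamma\|_0 \le s$ constraints.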

Algorithm Sketch:

  1. Initialize $\gamma$ as a unit-norm $s$-sparse vector and set $t = C\nu$; precompute $\rho$.
  2. Alternate: (a) update $\gamma$ by normalized hard-thresholding; (b) update $t$ with weighted PAVA.
  3. Continue until the objective stabilizes.

Convergence properties are established: each $\gamma$-update decreases the objective, the iterate sequence converges, and successive gaps vanish when $\rho$ is strictly greater than $\|Z^\top W Z\|_2$.
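
The alternation sketched above can be combined into a compact, self-contained illustration. Names are hypothetical, ties in $\nu$ are broken by an arbitrary fixed sort order (a simplification of the exact order restrictions), and the weights `w` are taken as given:

```python
import itertools
import numpy as np

def pav(y, w):
    """Weighted nondecreasing isotonic regression (Pool-Adjacent-Violators)."""
    vals, wts, sizes = [], [], []
    for yi, wi in zip(y, w):
        vals.append(float(yi)); wts.append(float(wi)); sizes.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            tot = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / tot
            wts[-2] = tot; sizes[-2] += sizes[-1]
            vals.pop(); wts.pop(); sizes.pop()
    return np.repeat(vals, sizes)

def hard_threshold_unit(y, s):
    """Keep the s largest |y_j|, zero the rest, rescale to unit l2 norm."""
    out = np.zeros_like(y)
    idx = np.argsort(np.abs(y))[-s:]
    out[idx] = y[idx]
    return out / np.linalg.norm(out)

def sisr(nu, Z, w, s, iters=50, C=100.0):
    """Alternating SISR sketch: gamma-step (normalized hard-thresholding),
    then t-step (weighted isotonic regression in the order of nu)."""
    order = np.argsort(nu)                          # isotonicity w.r.t. nu
    t = C * np.asarray(nu, dtype=float)             # t^(0) = C * nu
    WZ = w[:, None] * Z
    rho = np.linalg.norm(Z.T @ WZ, 2)               # rho >= ||Z^T W Z||_2
    gamma = hard_threshold_unit(Z.T @ (w * t), s)   # top-s initialization
    for _ in range(iters):
        gamma = hard_threshold_unit(gamma - WZ.T @ (Z @ gamma - t) / rho, s)
        fit = Z @ gamma
        t = np.empty_like(fit)
        t[order] = pav(fit[order], w[order])        # monotone in nu
    return gamma, t
```

On noiseless synthetic payoffs generated from an additive model passed through a monotone distortion (e.g., a cube root), the returned `gamma` is unit-norm and $s$-sparse and `t` is monotone in $\nu$, as the constraints require.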

3. Theoretical Guarantees and Analysis

Under the generative model $f^*(\nu_A) = \sum_{j\in A} \varphi_j^* + \varepsilon_A$, with an $\ell_0$-sparse $\varphi^*$ and $\varepsilon_A \sim N(0, \sigma_A^2)$, the following results are obtained:

  • Consistency: As noise vanishes or the number of sampled coalition values increases, the isotonic regression step recovers the true monotonic transformation $f^*$ up to negligible error.
  • Support Recovery: Provided the minimum nonzero $|\varphi_j^*|$ is substantially larger than the noise level and the signal-to-noise ratio $\|\varphi^*\|_2/\sigma$ is large, the normalized hard-thresholding step identifies the correct support with high probability.
  • Feature Dependence and Heavy-Tailed Payoffs: SISR learns a transform $f$ that absorbs the nonlinearities, restoring additivity even for payoff structures arising from feature dependence, heavy tails, or domain-specific loss constructions (e.g., $R^2$ under correlated regressors). The canonical linear-Gaussian assumption underlying standard Shapley values can otherwise suffer severe violations, leading to distorted attributions.

4. Empirical Performance

Comprehensive experiments on synthetic and real data characterize SISR performance:

  • Synthetic Recovery: For a battery of toy monotonic transforms (e.g., square root, fifth root, logarithm, exponential, tanh-like, Gaussian CDF-odds), SISR accurately fits $f^*$ (correlation $r > 0.99$).
  • Sparse Support Recovery: For synthetic $\varphi^*$ with $s^* = 3$, $p$ up to 25, and noise up to $\sigma_0 = 0.2$, both support agreement and the affinity $\langle \hat{\varphi}, \varphi^* \rangle$ remain near 1. Recovery degrades gracefully with noise; e.g., for $p = 25$, $\sigma_0 = 0.1$, the true support is recovered approximately 90% of the time.
  • $R^2$ Payoff Scenarios: Simulations with an $R^2$ payoff under feature correlation/irrelevance demonstrate that raw $R^2$ is not additive; SISR fits a correction and delivers stable attributions, while standard Shapley responses become highly nonrobust.
  • Real Data Applications:
    • Prostate dataset: Shapley values exaggerate the importance of “svi”; SISR zeros it out, aligning with AIC/BIC and expert knowledge.
    • Boston housing: With the robust payoff $\exp(-\mathrm{MSE})$, baseline Shapley values flip the signs of important variables; SISR’s log-like transform yields stable rankings.
    • Credit and Pima diabetes: Across diverse payoff schemes, SISR provides consistent sparse attributions (selected via RIC), whereas standard SAGE/Shapley attributions are unstable.

5. Computational and Practical Considerations

  • Complexity: Weighted PAVA is $O(N)$ per iteration. With sparse storage for $Z$ (each coalition row has $\approx p/2$ nonzeros), $\gamma$-updates cost $O(N \cdot s)$ per iteration. Convergence is typically reached within tens of alternations.
  • Sparsity Level Selection: If $s^*$ is unknown, the Risk Inflation Criterion (RIC) can be used to choose $s$. Empirical evidence indicates reliable selection.
  • Initialization: Taking $\gamma^{(0)}$ as the top-$s$ entries of $Z^\top W t^{(0)}$ (normalized) and $t^{(0)} = C\nu$ (with large $C$ for numerical stability) is recommended.
  • Infinite Weights for Special Coalitions: To ensure $f(0) = 0$ and the efficiency constraint, the theoretical Shapley weights $w_{\mathrm{SH}}(\emptyset)$ and $w_{\mathrm{SH}}([p])$ diverge; in practice, substituting large multiples (e.g., $\times 10$ or $\times 100$) of the maximum finite weight is effective.
  • Extensibility: Alternating PAVA plus hard-thresholding can extend to other convex loss functions (e.g., GLMs with canonical links) by substituting the quadratic surrogate with a suitable local approximation.
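
As a hedged illustration of the sparsity-level selection noted above, the sketch below applies the classical Foster–George Risk Inflation penalty of $2\log p$ per selected coefficient to the final SISR objective at each candidate $s$; the paper's exact criterion may differ, and `select_s_by_ric` and `sigma2` (a noise-variance estimate) are hypothetical names.

```python
import numpy as np

def select_s_by_ric(losses, p, sigma2):
    """Pick the sparsity level minimizing losses[s-1]/sigma2 + 2*s*log(p),
    where losses[s-1] is the final weighted SISR objective at sparsity s.
    This mirrors the classical Risk Inflation Criterion penalty of
    2*log(p) per selected coefficient."""
    s_grid = np.arange(1, len(losses) + 1)
    crit = np.asarray(losses, dtype=float) / sigma2 + 2.0 * s_grid * np.log(p)
    return int(s_grid[np.argmin(crit)])
```

In practice one would run SISR over a small grid of $s$ values, record the final objective for each, and feed that vector of losses to the criterion.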

6. Significance and Implications in Explainable AI

SISR advances the Shapley explanation paradigm by jointly resolving two stated limitations: First, it restores additivity in the presence of nonadditive, domain-specific, or statistically problematic payoff functions via a learned monotonic transformation. Second, it provides exact $\ell_0$-sparse attributions in high-dimensional settings, overcoming the inconsistency and computational burden of post hoc thresholding.

Empirical evidence demonstrates that SISR explanations are both more stable and faithful to domain structure—eliminating attributions to irrelevant features and correcting severe rank/sign distortions observed in conventional Shapley or SAGE values under nonlinear or unstable payoffs.

A plausible implication is that SISR is well-positioned as a general-purpose attribution framework for scenarios involving heavy tails, feature dependence, or overarching nonlinearity in payoff structure, where standard linear Shapley values are provably unreliable (She, 2 Dec 2025).
