Sparse Isotonic Shapley Regression (SISR)
- SISR is a framework that restores additivity in Shapley values through a learned monotonic transformation of coalition payoffs.
- It alternates between isotonic regression and normalized hard-thresholding to enforce exact ℓ0 sparsity in attributions.
- Empirical evaluations show that SISR delivers stable and interpretable explanations even with heavy-tailed, dependent features.
Sparse Isotonic Shapley Regression (SISR) addresses two foundational obstacles in Shapley-based model explainability: the violation of the canonical additivity assumption in real-world payoff constructions, and the need for sparse, interpretable attributions in high-dimensional feature spaces. The method provides a unified framework that simultaneously estimates a monotonic, data-driven transformation to restore additivity of the worth function and enforces an ℓ0 sparsity constraint on the Shapley vector, thereby yielding more consistent and computationally efficient explanations in settings plagued by non-Gaussianity, heavy tails, feature dependence, or idiosyncratic loss scales (She, 2 Dec 2025).
1. Model Formulation and Objectives
SISR considers a family of coalition values v(S), S ⊆ {1, …, d}, and seeks to jointly: (i) learn a strictly increasing scalar transformation g such that the transformed worth g(v(S)) becomes additive in the underlying Shapley scores; (ii) enforce an ℓ0 sparsity constraint ‖β‖₀ ≤ q on the Shapley vector β, aligning with the hypothesis that only a small subset of features is truly relevant.
The model assumes that for all coalitions S,

g(v(S)) = Σ_{j∈S} β_j + ε_S,

for noise terms ε_S. Enforcing ‖β‖₀ ≤ q directly achieves exact sparsity in attributions, bypassing the shrinkage bias inherent to ℓ1 penalization. The primary SISR objective is:

min over strictly increasing g and β with ‖β‖₀ ≤ q of Σ_S w_S ( g(v(S)) − Σ_{j∈S} β_j )²,

with coalition weights w_S.
Reparameterizing with θ = (g(v(S)))_S, the vector of (discrete) transform values at the observed payoffs, and introducing the 0/1 coalition membership matrix X (rows index coalitions, columns index features), yields the equivalent weighted quadratic problem min ‖θ − Xβ‖²_W subject to ‖β‖₀ ≤ q, with order restrictions on θ (θ must be nondecreasing in the observed payoffs).
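As a concrete sketch of this reparameterization, the membership matrix X and coalition weights can be enumerated as follows. The weighting shown is the familiar Shapley kernel (as in KernelSHAP); the paper's exact weight scheme is an assumption here.

```python
from itertools import combinations
from math import comb
import numpy as np

def coalition_design(d):
    """Enumerate all proper, nonempty coalitions of d features as a 0/1
    membership matrix X (rows index coalitions) plus kernel weights w."""
    rows, weights = [], []
    for size in range(1, d):                    # skip empty and grand coalitions
        for S in combinations(range(d), size):
            z = np.zeros(d)
            z[list(S)] = 1.0
            rows.append(z)
            # Shapley-kernel-style weight; assumed, the paper may use another scheme
            weights.append((d - 1) / (comb(d, size) * size * (d - size)))
    return np.array(rows), np.array(weights)

X, w = coalition_design(4)
print(X.shape)   # (14, 4): the 2^4 - 2 proper, nonempty coalitions
```

For moderate d the full enumeration above is feasible; for larger d the coalitions are typically subsampled, which is why the iteration costs below are stated in terms of the number of sampled coalitions.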
2. Optimization Algorithm
SISR employs alternating minimization over (sparse Shapley vector) and (discretized monotonic transform):
A. Monotonic Transformation via Isotonic Regression:
With β fixed, update g by solving a weighted isotonic regression: minimize Σ_S w_S (θ_S − (Xβ)_S)² over θ nondecreasing in v(S).
This is executed exactly using the Pool-Adjacent-Violators (PAV) algorithm, in linear time once the n coalition payoffs are sorted (n is often reduced via subsampling). The output vector θ provides the transformed payoffs at observed coalition values.
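A minimal weighted PAV routine (names and interface are illustrative, not from the paper) can be written in a few lines:

```python
import numpy as np

def weighted_pav(y, w):
    """Pool-Adjacent-Violators: weighted L2 projection of y onto the
    nondecreasing cone. Inputs are assumed already ordered by the
    predictor (here, the raw payoff v); linear time in len(y)."""
    blocks = []  # each block stores [weighted mean, total weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # merge adjacent blocks while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    out = []
    for m, _, n in blocks:
        out.extend([m] * n)
    return np.array(out)

fit = weighted_pav(np.array([1.0, 3.0, 2.0, 4.0]), np.ones(4))
# pooling the violating pair (3, 2) gives [1, 2.5, 2.5, 4]
```

With unequal weights, the pooled value is the weight-averaged mean of the merged blocks, which is exactly what the weighted isotonic step requires.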
B. Sparsity via Normalized Hard-Thresholding:
With g (equivalently θ) fixed, update β by minimizing the weighted quadratic objective under the ℓ0 and unit-norm constraints. The step utilizes a quadratic surrogate and a hard-thresholding operator Θ#(·; q), which zeros all but the q largest-magnitude entries. The global minimizer of the surrogate takes the form:

β⁺ = Θ#( β − ρ⁻¹ Xᵀ W (Xβ − θ); q ), rescaled to unit norm,

where W = diag(w_S) collects the coalition weights and ρ is a surrogate constant at least as large as ‖XᵀWX‖₂.
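The β-step can be sketched as follows; passing the weights as a vector and the exact form of the surrogate constant ρ are assumptions consistent with the description above.

```python
import numpy as np

def hard_threshold(z, q):
    """Theta#(z; q): keep the q largest-magnitude entries of z, zero the rest."""
    out = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-q:]
    out[keep] = z[keep]
    return out

def normalized_ht_step(beta, X, w, theta, rho, q):
    """One surrogate step: gradient move on the weighted quadratic loss,
    hard-threshold to q entries, then rescale to unit norm (the rescaling
    is assumed to resolve the scale ambiguity between g and beta)."""
    grad = X.T @ (w * (X @ beta - theta))   # w is the vector of coalition weights
    b = hard_threshold(beta - grad / rho, q)
    return b / np.linalg.norm(b)

# Tiny illustration: with X = I and rho = 1 the step reduces to thresholding theta
b = normalized_ht_step(np.array([1.0, 0.0, 0.0]), np.eye(3), np.ones(3),
                       np.array([1.0, 0.2, 0.1]), rho=1.0, q=1)
```

The output is q-sparse and unit-norm by construction, so the sparsity level is exact at every iteration rather than approached via shrinkage.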
Algorithm Sketch:
- Initialize β⁽⁰⁾ as a unit-norm q-sparse vector; precompute the surrogate constant ρ.
- Alternate: (a) update β by normalized hard-thresholding; (b) update θ (the transform values) with weighted PAVA.
- Continue until the objective stabilizes.
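Putting the two updates together, a self-contained toy run of the alternating scheme might look like the following. The cube-root payoff transform, unit coalition weights, and problem sizes are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

def pav(y):
    """Unweighted PAV projection of y onto the nondecreasing cone."""
    blocks = []
    for yi in y:
        blocks.append([yi, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return np.concatenate([[m] * n for m, n in blocks])

rng = np.random.default_rng(0)
d, q, n = 8, 2, 200
X = rng.integers(0, 2, size=(n, d)).astype(float)       # stand-in coalition rows
beta_true = np.zeros(d)
beta_true[:q] = [0.8, 0.6]
v = np.cbrt(X @ beta_true + 0.01 * rng.standard_normal(n))  # payoff = g^{-1}(additive part)

order = np.argsort(v)                                   # PAV needs payoff order
inv = np.argsort(order)
beta = np.zeros(d)
beta[-q:] = 1.0 / np.sqrt(q)                            # unit-norm q-sparse start
rho = np.linalg.norm(X.T @ X, 2)                        # surrogate constant
for _ in range(100):
    theta = pav((X @ beta)[order])[inv]                 # g-update: isotonic fit in v
    z = beta - X.T @ (X @ beta - theta) / rho           # beta-update: gradient step,
    keep = np.argsort(np.abs(z))[-q:]                   # hard-threshold to q entries,
    b = np.zeros(d)
    b[keep] = z[keep]
    beta = b / np.linalg.norm(b)                        # then renormalize
print(np.nonzero(beta)[0])                              # support of the fitted attribution
```

Each pass keeps β exactly q-sparse and unit-norm while θ tracks the monotone transform, matching the alternation described above.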
Convergence properties are established: each β-update reduces the objective, the objective sequence converges, and successive-iterate gaps vanish provided ρ is strictly greater than ‖XᵀWX‖₂.
3. Theoretical Guarantees and Analysis
Under the generative model g*(v(S)) = Σ_{j∈S} β*_j + ε_S, with q-sparse β* and suitably controlled noise ε_S, the following results are obtained:
- Consistency: As the noise vanishes or the number of sampled coalition values increases, the isotonic regression step recovers the true monotonic transformation up to negligible error.
- Support Recovery: Provided the minimum nonzero |β*_j| is substantially larger than the noise level, i.e., the signal-to-noise ratio is large, the normalized hard-thresholding step identifies the correct support with high probability.
- Feature Dependence and Heavy-Tailed Payoffs: SISR learns a transform that absorbs the nonlinearities, restoring additivity even for payoff structures arising from feature dependence, heavy tails, or domain-specific loss constructions (e.g., under correlated regressors). The canonical linear-Gaussian assumption underlying standard Shapley can otherwise be severely violated, leading to distorted attributions.
4. Empirical Performance
Comprehensive experiments on synthetic and real data characterize SISR performance:
- Synthetic Recovery: For a battery of toy monotonic transforms (e.g., square root, fifth root, logarithm, exponential, tanh-like, Gaussian CDF-odds), SISR fits the ground-truth transform accurately, with fitted-versus-true correlation remaining high.
- Sparse Support Recovery: For synthetic β with sparsity levels up to 25 and increasing noise, both support agreement and affinity with the truth remain near 1. Recovery degrades gracefully with noise; at the largest noise level tested, the true support is still recovered approximately 90% of the time.
- Payoff Scenarios: Simulations with nonlinear payoffs under feature correlation and irrelevant features demonstrate that the raw worth function is not additive; SISR fits a monotone correction, delivering stable attributions, while standard Shapley responses become highly nonrobust.
- Real Data Applications:
- Prostate dataset: Shapley values exaggerate the importance of “svi”; SISR zeros it out, aligning with AIC/BIC and expert knowledge.
- Boston housing: With a robust (outlier-resistant) payoff, baseline Shapley values flip the signs of important variables; SISR's log-like transform yields stable rankings.
- Credit and Pima diabetes: Across diverse payoff schemes, SISR provides consistent sparse attributions (selected via RIC), whereas standard SAGE/Shapley attributions are unstable.
5. Computational and Practical Considerations
- Complexity: Weighted PAVA runs in linear time per iteration once payoffs are sorted. When utilizing sparse storage for X (each coalition row has only |S| nonzeros), β-updates cost on the order of the nonzero count of X per iteration. Convergence is typically reached within tens of alternations.
- Sparsity Level Selection: If q is unknown, the Risk Inflation Criterion (RIC) can be used to choose it. Empirical evidence indicates reliable selection.
- Initialization: Taking β⁽⁰⁾ as the (normalized) top-q entries of an unconstrained pilot estimate, with a large surrogate constant for numerical stability, is recommended.
- Infinite Weights for Special Coalitions: To pin down the baseline value and enforce the "efficiency" constraint at the grand coalition, the theoretical Shapley weights for the empty and full coalitions diverge; in practice, substituting large multiples (e.g., 10 or 100 times) of the maximum finite weight is effective.
- Extensibility: Alternating PAVA plus hard-thresholding can extend to other convex loss functions (e.g., GLMs with canonical links) by substituting the quadratic surrogate with a suitable local approximation.
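The RIC-based choice of q mentioned above can be sketched as follows, using the classical Foster-George penalty of 2σ²·q·log p per fit; the paper's exact variant (and how σ² is estimated) may differ.

```python
import numpy as np

def ric_select(rss, q_values, p, sigma2):
    """Choose the sparsity level minimizing RIC(q) = RSS(q) + 2*sigma^2*q*log(p),
    where RSS(q) is the residual sum of squares of the SISR fit at sparsity q."""
    scores = [r + 2.0 * sigma2 * q * np.log(p) for r, q in zip(rss, q_values)]
    return q_values[int(np.argmin(scores))]

# Residual sums from refitting at q = 1, 2, 3 (illustrative numbers only)
best_q = ric_select([30.0, 4.0, 3.9], [1, 2, 3], p=50, sigma2=1.0)
```

Here the large drop in residual error from q = 1 to q = 2 outweighs the extra log-penalty, while the marginal gain at q = 3 does not, so the criterion settles on q = 2.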
6. Significance and Implications in Explainable AI
SISR advances the Shapley explanation paradigm by jointly resolving two stated limitations: First, it restores additivity in the presence of nonadditive, domain-specific, or statistically problematic payoff functions via a learned monotonic transformation. Second, it provides exact ℓ0-sparse attributions in high-dimensional settings, overcoming the inconsistency and computational burden of post hoc thresholding.
Empirical evidence demonstrates that SISR explanations are both more stable and faithful to domain structure—eliminating attributions to irrelevant features and correcting severe rank/sign distortions observed in conventional Shapley or SAGE values under nonlinear or unstable payoffs.
A plausible implication is that SISR is well-positioned as a general-purpose attribution framework for scenarios involving heavy tails, feature dependence, or overarching nonlinearity in payoff structure, where standard linear Shapley values are provably unreliable (She, 2 Dec 2025).