Value-Based Explainability in AI
- Value-based explainability is a paradigm that assigns clear, axiomatic contributions to model components based on predefined metrics such as fairness, accuracy, and economic utility.
- It leverages game-theoretic Shapley value concepts with efficient approximations like DerSHAP and Taylor surrogates to generate actionable attributions across diverse models.
- This approach extends to structured domains including vision, graphs, and language models, enabling fairness audits and practical insights for model and business optimization.
Value-based explainability is a principled paradigm in interpretable machine learning and AI, in which explanations for model behavior are given with respect to a predefined "value function" that quantifies an aspect of interest such as predictive accuracy, fairness, or economic utility. Canonically exemplified by Shapley-value decompositions from cooperative game theory, value-based explainability formally attributes the total value (e.g., model output, group unfairness, or inference confidence) across input features, coalitions, or graph substructures, so that each component's explanatory share is precisely defined by its marginal contribution under a set of rigorous axioms (efficiency, symmetry, dummy, additivity). The framework has extended from classical tabular feature attribution to deep image models, probabilistic logic, graph neural networks (GNNs), LLMs, and fairness audits, with specialized adaptations to address issues of tractability, data manifold consistency, stochastic inference, and domain-specific KPIs.
1. Shapley Value Formalism and Axioms
At the core of value-based explainability is the Shapley value, originally formulated for fair allocation problems in cooperative games. Let $v : 2^N \to \mathbb{R}$ assign a real-valued payoff $v(S)$ to each subset $S \subseteq N$ of features (or "players") from a ground set $N = \{1, \dots, n\}$. For feature $i \in N$, the Shapley value is

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \,\bigl[v(S \cup \{i\}) - v(S)\bigr].$$
This allocation satisfies:
- Efficiency: $\sum_{i \in N} \phi_i(v) = v(N) - v(\emptyset)$.
- Dummy: Features that never affect $v$ receive zero attribution.
- Symmetry: Identically contributing features receive equal shares.
- Additivity: Attributions respect linear combinations of games.
In machine learning, $v(S)$ is typically an expectation over perturbed model predictions in which only the features in $S$ are held fixed, while the remaining features are marginalized or conditionally sampled, depending on the explainability protocol (Duan et al., 2023; Naudot et al., 3 Nov 2025; Frye et al., 2020).
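To make the formalism concrete, the following is a minimal sketch that computes exact Shapley values for a small toy model, using marginal imputation over a background sample as the value function; the names `predict`, `background`, and `x` are illustrative placeholders, not taken from the cited papers.

```python
from itertools import combinations
from math import factorial
import numpy as np

def exact_shapley(predict, x, background, n_features):
    """Exact Shapley values for one instance x.

    v(S) is estimated by holding the features in S fixed at x and replacing the
    remaining features with rows from a background sample (marginal imputation),
    then averaging the model's predictions. Cost is O(2^n) evaluations of v.
    """
    def value(S):
        X = background.copy()
        X[:, list(S)] = x[list(S)]          # coalition features pinned to the instance
        return predict(X).mean()            # others vary over the background

    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = factorial(len(S)) * factorial(n_features - len(S) - 1) / factorial(n_features)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi

# Toy check: for a linear model, phi_i should equal w_i * (x_i - mean(background_i)).
rng = np.random.default_rng(0)
w = np.array([2.0, -1.0, 0.5])
predict = lambda X: X @ w
background = rng.normal(size=(200, 3))
x = np.ones(3)
print(exact_shapley(predict, x, background, n_features=3))
```

Summing the returned attributions recovers $v(N) - v(\emptyset)$, i.e., the prediction at `x` minus the mean background prediction, which is the efficiency axiom in action.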
2. Computational Strategies: From Classic Shapley to Derivative and Taylor Approaches
Computing Shapley values exactly is infeasible even for moderately sized feature sets ($2^n$ model queries). Approximations and specialized value proxies have emerged:
- Black-box Sampling Methods (SHAP, KernelSHAP): Estimate $\phi_i$ via Monte Carlo sampling of coalitions, possibly fitting linear surrogates for tractability; a minimal sampling sketch follows this list. KernelSHAP robustly approximates the value function but remains exponential in worst-case complexity (Duan et al., 2023).
- Derivative-based Shapley (DerSHAP): Approximates each feature's Shapley attribution using only first-order derivatives of the model and their empirical covariances, yielding linear complexity in both dimension and sample size. This enables Shapley-consistent global sensitivity attribution and interaction-aware explainability for differentiable models (Duan et al., 2023).
- Second-order Taylor Surrogates for CAMs (ShapleyCAM): Model the cooperative game over spatial activations by expanding the class score in a second-order Taylor series around the activation map; the closed-form per-pixel Shapley value incorporates both gradients and Hessian-vector products, bridging heuristic CAMs and value-based attribution at negligible computational overhead compared to standard backpropagation (Cai, 9 Jan 2025).
- Logic-based Value Attribution: In symbolic domains, value-based explanations are obtained by maximizing the inferred probability of the target outcome over bounded-size feature coalitions using probabilistic logic and linear programming, bypassing Shapley additivity but identifying the most decisive value-based subsets. Alignment with SHAP is empirically high on both real and synthetic tasks (Fan et al., 2020).
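As a concrete counterpart to the black-box sampling bullet above, here is a minimal permutation-sampling Shapley estimator; it is a simplified cousin of SHAP-style samplers rather than the KernelSHAP weighted regression itself, and it reuses the hypothetical `predict`/`background` conventions from the sketch in Section 1.

```python
import numpy as np

def sampled_shapley(predict, x, background, n_features, n_permutations=200, seed=0):
    """Monte Carlo Shapley estimate via random feature orderings.

    For each sampled permutation, features join the coalition one at a time and
    each feature is credited with the resulting marginal change in the value
    function: O(n_permutations * n_features) evaluations instead of O(2^n).
    """
    rng = np.random.default_rng(seed)

    def value(S):
        X = background.copy()
        X[:, S] = x[S]
        return predict(X).mean()

    phi = np.zeros(n_features)
    for _ in range(n_permutations):
        order = rng.permutation(n_features)
        coalition, prev = [], value([])
        for i in order:
            coalition.append(int(i))
            cur = value(coalition)
            phi[i] += cur - prev
            prev = cur
    return phi / n_permutations
```

Because the marginal contributions along each permutation telescope to $v(N) - v(\emptyset)$, the estimate satisfies efficiency exactly at any sample size; only the per-feature split carries Monte Carlo error.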
3. Extensions to Structured Domains: Graphs, Vision, and LLMs
Graph Neural Networks
Value-based explainability in GNNs treats edges as players in a game where utility is the node-level prediction. Aggregated Shapley attributions, computed via linear surrogates regressed from random masking experiments, yield signed edge scores for graph sparsification:
- Edges with strongly positive attributions are retained; negative edges are pruned.
- This approach preserves predictive accuracy and produces sparser, more efficient inference graphs than gradient- or perturbation-based explainers, especially at high sparsity regimes (Akkas et al., 28 Jul 2025).
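A minimal sketch of the masking-and-regression recipe above, assuming a generic `predict_with_edges(mask)` callable that returns the target node's class score under a binary edge mask; the linear surrogate's coefficients serve as signed edge attributions, and edges with non-positive scores are pruned.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def edge_attribution_surrogate(predict_with_edges, n_edges, n_samples=500, keep_prob=0.5, seed=0):
    """Signed edge scores from random edge-masking experiments.

    Random binary masks over edges are scored by the GNN's node-level output,
    and a linear surrogate regressed on (mask -> score) pairs yields one
    coefficient per edge, interpreted as its aggregated contribution.
    """
    rng = np.random.default_rng(seed)
    masks = (rng.random((n_samples, n_edges)) < keep_prob).astype(float)
    scores = np.array([predict_with_edges(m) for m in masks])
    surrogate = LinearRegression().fit(masks, scores)
    return surrogate.coef_

def sparsify(edge_index, edge_scores):
    """Keep edges with positive attribution; edge_index has shape (2, n_edges)."""
    return edge_index[:, edge_scores > 0]
```

The linear surrogate mirrors the value-based framing: each coefficient approximates the average marginal effect of keeping an edge across random coalitions of the remaining edges.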
Deep Vision Models
Content Reserved Game-theoretic (CRG) explainers and ShapleyCAM extend Shapley value logic to pixel- or activation-level attribution:
- Players are spatial locations; utility is the class-specific output for masked activations.
- Second-order Taylor expansions yield closed-form attributions per location.
- ShapleyCAM consistently outperforms or matches leading CAM variants on insertion/deletion and localization metrics across a wide range of CNN architectures (Cai, 9 Jan 2025).
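The sketch below isolates the second-order ingredient: gradients and a Hessian-vector product of the class score with respect to the activation map, combined into a per-location attribution. It is a generic second-order Taylor attribution under these assumptions (PyTorch, pre-softmax score, activations of shape (C, H, W)), not the exact ShapleyCAM weighting from the cited paper.

```python
import torch

def second_order_attribution(class_score_fn, activations):
    """Per-location attribution from a second-order Taylor expansion.

    class_score_fn maps an activation tensor A of shape (C, H, W) to a scalar
    class score. The contribution of each activation combines the first-order
    term g * A with half of the Hessian-vector-product term (H A) * A, obtained
    via double backpropagation rather than forming the Hessian explicitly.
    """
    A = activations.detach().requires_grad_(True)
    score = class_score_fn(A)
    grad = torch.autograd.grad(score, A, create_graph=True)[0]
    hvp = torch.autograd.grad((grad * A.detach()).sum(), A)[0]   # H @ A via double backprop
    contrib = (grad.detach() + 0.5 * hvp) * A.detach()
    cam = contrib.sum(dim=0).clamp(min=0)                        # aggregate channels, keep positive evidence
    return cam / (cam.max() + 1e-8)
```

In practice `class_score_fn` would run the network head on the supplied activations and return the logit of the target class; the final normalization is only for visualization.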
LLMs
When applied to LLM outputs, value-based explainability (llmSHAP) must address stochastic inference:
- Only cache-based schemes guarantee the efficiency, symmetry, and dummy axioms in full (a caching sketch follows this list).
- In uncached (stochastic) settings, efficiency is violated due to variance in coalition payoffs.
- Windowed or counterfactual variants trade off faithfulness against computational cost and axiom satisfaction.
- In disease-symptom reasoning, cache-based Shapley variants maintain high fidelity with the gold-standard but require exponential compute for large feature sets (Naudot et al., 3 Nov 2025).
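A minimal illustration of the caching argument above (not the llmSHAP implementation itself), assuming a hypothetical `llm_value(coalition)` that queries an LLM with only the features in the coalition (e.g., a symptom subset) and returns a noisy scalar payoff: memoizing one payoff per coalition means every marginal contribution reuses identical numbers, so the attributions telescope exactly to $v(N) - v(\emptyset)$, which repeated uncached queries cannot guarantee.

```python
from itertools import permutations
from math import factorial

def cached_shapley(llm_value, features):
    """Exact Shapley attribution over cached (memoized) coalition payoffs.

    llm_value takes a frozenset of feature names and returns a scalar payoff,
    possibly stochastic. Caching pins down a single payoff per coalition,
    restoring the efficiency, symmetry, and dummy axioms, at the cost of
    evaluating all 2^n coalitions (small n only).
    """
    cache = {}

    def value(coalition):
        key = frozenset(coalition)
        if key not in cache:
            cache[key] = llm_value(key)   # one (possibly stochastic) LLM call per coalition
        return cache[key]

    n = len(features)
    phi = {f: 0.0 for f in features}
    for order in permutations(features):  # n! orderings; exponential, illustration only
        coalition = frozenset()
        prev = value(coalition)
        for f in order:
            coalition = coalition | {f}
            cur = value(coalition)
            phi[f] += cur - prev
            prev = cur
    return {f: total / factorial(n) for f, total in phi.items()}
```

Dropping the cache (calling `llm_value` afresh on every lookup) makes the per-permutation payoffs inconsistent, which is exactly the efficiency violation reported for uncached stochastic settings.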
4. On-Manifold and Fairness-Aware Value-based Explanations
On-Manifold Shapley
Classic Shapley explainers often use feature-independent imputations, which can violate the data manifold and produce misleading or unintelligible attributions:
- On-manifold approaches employ generative models (VAE-based) or directly learned surrogates to estimate the conditional distribution $p(x_{\bar{S}} \mid x_S)$ of excluded features given the retained coalition, ensuring that only in-distribution samples are used.
- On synthetic, tabular, and vision tasks, on-manifold value-based explanations outperform standard off-manifold Shapley in accuracy, stability, fairness, and alignment with ground-truth attributions (Frye et al., 2020).
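A minimal on-manifold value function under a multivariate Gaussian assumption, as a stand-in for the VAE-based conditionals in the cited work: off-coalition features are drawn from the analytic conditional $p(x_{\bar{S}} \mid x_S)$ rather than from unconditional marginals, so perturbed inputs stay on (this simple model of) the data manifold.

```python
import numpy as np

class GaussianConditionalValue:
    """On-manifold v(S): off-coalition features sampled from p(x_notS | x_S)."""

    def __init__(self, predict, X, n_samples=256, seed=0):
        self.predict = predict
        self.mu = X.mean(axis=0)
        self.cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        self.n_samples = n_samples
        self.rng = np.random.default_rng(seed)

    def __call__(self, x, S):
        d, S = len(self.mu), list(S)
        notS = [j for j in range(d) if j not in S]
        if not notS:
            return float(self.predict(x[None, :]).mean())
        if not S:
            samples = self.rng.multivariate_normal(self.mu, self.cov, size=self.n_samples)
        else:
            # Gaussian conditional: mean and covariance of x_notS given x_S = x[S].
            C_SS = self.cov[np.ix_(S, S)]
            C_nS = self.cov[np.ix_(notS, S)]
            C_nn = self.cov[np.ix_(notS, notS)]
            mu_c = self.mu[notS] + C_nS @ np.linalg.solve(C_SS, x[S] - self.mu[S])
            cov_c = C_nn - C_nS @ np.linalg.solve(C_SS, C_nS.T)
            samples = np.tile(x, (self.n_samples, 1))
            samples[:, notS] = self.rng.multivariate_normal(mu_c, cov_c, size=self.n_samples)
        samples[:, S] = x[S]                 # coalition features pinned to the instance
        return float(self.predict(samples).mean())
```

Plugging this value function into either Shapley routine from Section 2 yields on-manifold attributions; replacing the Gaussian with a learned conditional model (e.g., a VAE surrogate) changes only the sampling step.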
Fairness Attribution
Shapley decompositions can attribute a model's group-level unfairness (e.g., demographic parity gap) to input features:
- The "unfairness value function" computes feature-level contributions to the disparity.
- Additivity allows dissection of fairness gains/losses under additive or perturbative fairness interventions.
- This framework quantifies the accuracy–fairness trade-off directly at the feature level and resists manipulation via proxy features—contrasting with standard importance methods that can "hide" unfairness (Begley et al., 2020).
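A minimal sketch of this decomposition, assuming small feature sets, a binary protected attribute, and a value function that scores each coalition by the demographic parity gap when only that coalition is informative (off-coalition features marginalized over a background sample); the helper names are illustrative, not the cited paper's API.

```python
import numpy as np
from itertools import combinations
from math import factorial

def dp_gap_value(predict, X, group, S, background):
    """v(S): demographic parity gap when only features in S carry information."""
    S = list(S)
    rates = []
    for g in (0, 1):
        preds = []
        for x in X[group == g]:
            B = background.copy()
            B[:, S] = x[S]                     # marginal imputation of off-coalition features
            preds.append(predict(B).mean())    # expected positive score for this individual
        rates.append(np.mean(preds))
    return rates[1] - rates[0]

def fairness_shapley(predict, X, group, background):
    """Exact Shapley decomposition of the parity gap across features (small d only)."""
    d = X.shape[1]
    value = lambda S: dp_gap_value(predict, X, group, S, background)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi   # sums to the model's full gap minus the feature-free baseline gap
```

By efficiency, the attributions sum to the model's realized disparity (relative to the empty-coalition baseline), so a feature acting as a proxy for the protected attribute cannot hide: its share of the gap is accounted for explicitly.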
5. Second-order and Dataset-level Value-based Explainability
Standard value-based explainers focus on instance-level or local attributions. Second-order explainable AI (SOXAI) extends this paradigm to the dataset level:
- Per-instance explanations (e.g., heatmaps) are projected into a latent space; clusters (concepts) are identified, and their prevalence and task relevance are quantified.
- Spurious or overrepresented concepts (e.g., background artifacts, annotation biases) are flagged for dataset or model refinement.
- Direct interventions (removal or down-weighting of samples, input masking) based on SOXAI increase both accuracy and fairness, as empirically demonstrated in both classification and segmentation tasks (Zeng et al., 2023).
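A simplified sketch of the dataset-level workflow, assuming per-instance heatmaps are already available as arrays: embed them, cluster, and report each cluster's prevalence and mean task relevance so that prevalent but low-relevance concepts can be flagged for curation. The embedding and clustering choices (PCA, k-means) are generic stand-ins, not necessarily those of the cited work.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def dataset_level_concepts(heatmaps, relevance, n_concepts=8, seed=0):
    """Cluster per-instance explanations into dataset-level 'concepts'.

    heatmaps:  array (n_samples, H, W) of per-instance attribution maps.
    relevance: array (n_samples,) of task-relevance scores per instance
               (e.g., attribution mass falling on the labeled object).
    Returns one record per concept with its prevalence and mean relevance.
    """
    Z = heatmaps.reshape(len(heatmaps), -1)
    Z = PCA(n_components=min(32, *Z.shape), random_state=seed).fit_transform(Z)
    labels = KMeans(n_clusters=n_concepts, random_state=seed, n_init=10).fit_predict(Z)

    report = []
    for c in range(n_concepts):
        idx = labels == c
        report.append({
            "concept": c,
            "prevalence": float(idx.mean()),
            "mean_relevance": float(relevance[idx].mean()),
        })
    return sorted(report, key=lambda r: r["mean_relevance"])
```

Concepts near the top of the returned list (lowest mean relevance) are natural candidates for the removal, down-weighting, or masking interventions described above.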
6. Practical Applications and Business Integration
Value-based explainability is not purely technical—it is designed to deliver measurable benefits in operational, regulatory, and business contexts:
- Stakeholder trust, regulatory compliance, and operational efficiency are increased by explanations that tie directly to business KPIs and human-meaningful levers.
- Quantitative metrics (fidelity, complexity, stability, counterfactual distance) assess explanation utility.
- Structured recipes integrate value-based explanation into the full ML lifecycle, from data curation through in-processing model selection, regularization, and post-hoc audit, to real-time deployment with explainability infrastructure (e.g., SHAP, LIME, EXPO).
- Use cases span credit approvals, insurance, marketing, pricing, and healthcare risk, demonstrating concrete improvements in adoption, ROI, audit speed, model trust, and error reduction (Chomiak et al., 2021).
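As one concrete instance of the post-hoc audit tooling named above, a minimal usage sketch with the `shap` package's high-level `Explainer` interface on a synthetic stand-in for a credit-style feature table (model choice, column names, and sample sizes are all illustrative):

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a production feature table.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 4)), columns=["income", "age", "tenure", "region_code"])
y = (X["income"] + 0.5 * X["tenure"] + rng.normal(scale=0.5, size=500) > 0).astype(int)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Post-hoc audit: attribute the positive-class probability to input features.
explainer = shap.Explainer(lambda Z: model.predict_proba(Z)[:, 1], X.sample(100, random_state=0))
explanation = explainer(X.head(200))

# Mean |SHAP value| per feature as a simple global-importance summary for the audit report.
global_importance = np.abs(explanation.values).mean(axis=0)
print(dict(zip(X.columns, np.round(global_importance, 4))))
```

The same attributions can feed the quantitative checks listed above (fidelity, stability, counterfactual distance) before a model is promoted to deployment.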
7. Limitations, Trade-offs, and Open Challenges
Value-based explainability methods confront computational and theoretical constraints:
- Exact Shapley value computation remains infeasible for large feature sets; surrogate and sampling methods must balance trade-offs between fidelity and scalability.
- Off-manifold imputations can induce bias or mask true dependencies—on-manifold construction is essential but poses modeling and estimation challenges in complex domains.
- In structured domains (images, graphs, sequences), the design of utility functions and coalition spaces raises domain-specific issues regarding interpretability and explanation granularity.
- Stochasticity in model inference (e.g., LLMs) may break Shapley efficiency or symmetry, requiring careful design of caching and evaluation protocols.
Future research is directed toward more scalable, high-dimensional value-based explainers, the integration of causal inference, development of human-centered evaluation frameworks, and tighter coupling of explainability with fairness, privacy, and accountability.
References:
- Derivative-based Shapley: (Duan et al., 2023)
- LLMs: (Naudot et al., 3 Nov 2025)
- Probabilistic Logic XAI: (Fan et al., 2020)
- On-Manifold Shapley: (Frye et al., 2020)
- CAMs and ShapleyCAM: (Cai, 9 Jan 2025)
- GNN Shapley Sparsification: (Akkas et al., 28 Jul 2025)
- SOXAI and Dataset-level XAI: (Zeng et al., 2023)
- Fairness via Shapley: (Begley et al., 2020)
- XAI in business: (Chomiak et al., 2021)