Value-Based Explainability in AI
- Value-based explainability is a paradigm that assigns clear, axiomatic contributions to model components based on predefined metrics such as fairness, accuracy, and economic utility.
- It leverages game-theoretic Shapley value concepts with efficient approximations like DerSHAP and Taylor surrogates to generate actionable attributions across diverse models.
- This approach extends to structured domains including vision, graphs, and language models, enabling fairness audits and practical insights for model and business optimization.
Value-based explainability is a principled paradigm in interpretable machine learning and AI, in which explanations for model behavior are given with respect to a predefined "value function" that quantifies an aspect of interest such as predictive accuracy, fairness, or economic utility. Canonically exemplified by Shapley-value decompositions from cooperative game theory, value-based explainability formally attributes the total value (e.g., model output, group unfairness, or inference confidence) across input features, coalitions, or graph substructures, so that each component's explanatory share is precisely defined by its marginal contribution under a set of rigorous axioms (efficiency, symmetry, dummy, additivity). The framework has extended from classical tabular feature attribution to deep image models, probabilistic logic, graph neural networks (GNNs), LLMs, and fairness audits, with specialized adaptations to address issues of tractability, data manifold consistency, stochastic inference, and domain-specific KPIs.
1. Shapley Value Formalism and Axioms
At the core of value-based explainability is the Shapley value, originally formulated for fair allocation problems in cooperative games. Let $v : 2^N \to \mathbb{R}$ assign a real-valued payoff $v(S)$ to each subset $S \subseteq N$ of features (or "players") from a ground set $N = \{1, \dots, n\}$. For feature $i \in N$, the Shapley value is

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \,\bigl[v(S \cup \{i\}) - v(S)\bigr].$$
This allocation satisfies:
- Efficiency: $\sum_{i \in N} \phi_i(v) = v(N) - v(\emptyset)$.
- Dummy: Features that never affect $v$ receive zero attribution.
- Symmetry: Identically contributing features receive equal shares.
- Additivity: Attributions respect linear combinations of games.
In machine learning, $v(S)$ is typically an expectation over perturbed model predictions in which only the features in $S$ are held fixed, while the remaining features are marginalized or conditionally sampled, depending on the explainability protocol (Duan et al., 2023; Naudot et al., 3 Nov 2025; Frye et al., 2020).
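To make the formalism concrete, the following is a minimal sketch that computes exact Shapley values for a small toy model, using marginal imputation over a background sample as the value function; the names `predict`, `background`, and `x` are illustrative placeholders, not taken from the cited papers.

```python
from itertools import combinations
from math import factorial
import numpy as np

def exact_shapley(predict, x, background, n_features):
    """Exact Shapley values for one instance x.

    v(S) is estimated by holding the features in S fixed at x and replacing the
    remaining features with rows from a background sample (marginal imputation),
    then averaging the model's predictions. Cost is O(2^n) evaluations of v.
    """
    def value(S):
        X = background.copy()
        X[:, list(S)] = x[list(S)]          # coalition features pinned to the instance
        return predict(X).mean()            # others vary over the background

    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = factorial(len(S)) * factorial(n_features - len(S) - 1) / factorial(n_features)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi

# Toy check: for a linear model, phi_i should equal w_i * (x_i - mean(background_i)).
rng = np.random.default_rng(0)
w = np.array([2.0, -1.0, 0.5])
predict = lambda X: X @ w
background = rng.normal(size=(200, 3))
x = np.ones(3)
print(exact_shapley(predict, x, background, n_features=3))
```

Summing the returned attributions recovers $v(N) - v(\emptyset)$, i.e., the prediction at `x` minus the mean background prediction, which is the efficiency axiom in action.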
2. Computational Strategies: From Classic Shapley to Derivative and Taylor Approaches
Computing Shapley values exactly is infeasible even for moderately sized feature sets ($2^n$ model queries). Approximations and specialized value proxies have emerged:
- Black-box Sampling Methods (SHAP, KernelSHAP): Estimate $\phi_i$ via Monte Carlo sampling of coalitions, possibly fitting linear surrogates for tractability; a minimal sampling sketch follows this list. KernelSHAP robustly approximates the value function but remains exponential in worst-case complexity (Duan et al., 2023).
- Derivative-based Shapley (DerSHAP): Approximates each feature's Shapley attribution using only first-order derivatives of the model and their empirical covariances, yielding linear complexity in both dimension and sample size. This enables Shapley-consistent global sensitivity attribution and interaction-aware explainability for differentiable models (Duan et al., 2023).
- Second-order Taylor Surrogates for CAMs (ShapleyCAM): Model the cooperative game over spatial activations by expanding the class score in a second-order Taylor series around the activation map; the closed-form per-pixel Shapley value incorporates both gradients and Hessian-vector products, bridging heuristic CAMs and value-based attribution at negligible computational overhead compared to standard backpropagation (Cai, 9 Jan 2025).
- Logic-based Value Attribution: In symbolic domains, value-based explanations are obtained by maximizing the inferred probability of the target outcome over bounded-size feature coalitions using probabilistic logic and linear programming, bypassing Shapley additivity but identifying the most decisive value-based subsets. Alignment with SHAP is empirically high on both real and synthetic tasks (Fan et al., 2020).
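As a concrete counterpart to the black-box sampling bullet above, here is a minimal permutation-sampling Shapley estimator; it is a simplified cousin of SHAP-style samplers rather than the KernelSHAP weighted regression itself, and it reuses the hypothetical `predict`/`background` conventions from the sketch in Section 1.

```python
import numpy as np

def sampled_shapley(predict, x, background, n_features, n_permutations=200, seed=0):
    """Monte Carlo Shapley estimate via random feature orderings.

    For each sampled permutation, features join the coalition one at a time and
    each feature is credited with the resulting marginal change in the value
    function: O(n_permutations * n_features) evaluations instead of O(2^n).
    """
    rng = np.random.default_rng(seed)

    def value(S):
        X = background.copy()
        X[:, S] = x[S]
        return predict(X).mean()

    phi = np.zeros(n_features)
    for _ in range(n_permutations):
        order = rng.permutation(n_features)
        coalition, prev = [], value([])
        for i in order:
            coalition.append(int(i))
            cur = value(coalition)
            phi[i] += cur - prev
            prev = cur
    return phi / n_permutations
```

Because the marginal contributions along each permutation telescope to $v(N) - v(\emptyset)$, the estimate satisfies efficiency exactly at any sample size; only the per-feature split carries Monte Carlo error.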
3. Extensions to Structured Domains: Graphs, Vision, and LLMs
Graph Neural Networks
Value-based explainability in GNNs treats edges as players in a game where utility is the node-level prediction. Aggregated Shapley attributions, computed via linear surrogates regressed from random masking experiments, yield signed edge scores for graph sparsification:
- Edges with strongly positive attributions are retained; negative edges are pruned.
- This approach preserves predictive accuracy and produces sparser, more efficient inference graphs than gradient- or perturbation-based explainers, especially at high sparsity regimes (Akkas et al., 28 Jul 2025).
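A minimal sketch of the masking-and-regression recipe above, assuming a generic `predict_with_edges(mask)` callable that returns the target node's class score under a binary edge mask; the linear surrogate's coefficients serve as signed edge attributions, and edges with non-positive scores are pruned.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def edge_attribution_surrogate(predict_with_edges, n_edges, n_samples=500, keep_prob=0.5, seed=0):
    """Signed edge scores from random edge-masking experiments.

    Random binary masks over edges are scored by the GNN's node-level output,
    and a linear surrogate regressed on (mask -> score) pairs yields one
    coefficient per edge, interpreted as its aggregated contribution.
    """
    rng = np.random.default_rng(seed)
    masks = (rng.random((n_samples, n_edges)) < keep_prob).astype(float)
    scores = np.array([predict_with_edges(m) for m in masks])
    surrogate = LinearRegression().fit(masks, scores)
    return surrogate.coef_

def sparsify(edge_index, edge_scores):
    """Keep edges with positive attribution; edge_index has shape (2, n_edges)."""
    return edge_index[:, edge_scores > 0]
```

The linear surrogate mirrors the value-based framing: each coefficient approximates the average marginal effect of keeping an edge across random coalitions of the remaining edges.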
Deep Vision Models
Content Reserved Game-theoretic (CRG) explainers and ShapleyCAM extend Shapley value logic to pixel- or activation-level attribution:
- Players are spatial locations; utility is the class-specific output for masked activations.
- Second-order Taylor expansions yield closed-form attributions per location.
- ShapleyCAM consistently outperforms or matches leading CAM variants on insertion/deletion and localization metrics across a wide range of CNN architectures (Cai, 9 Jan 2025).
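The sketch below isolates the second-order ingredient: gradients and a Hessian-vector product of the class score with respect to the activation map, combined into a per-location attribution. It is a generic second-order Taylor attribution under these assumptions (PyTorch, pre-softmax score, activations of shape (C, H, W)), not the exact ShapleyCAM weighting from the cited paper.

```python
import torch

def second_order_attribution(class_score_fn, activations):
    """Per-location attribution from a second-order Taylor expansion.

    class_score_fn maps an activation tensor A of shape (C, H, W) to a scalar
    class score. The contribution of each activation combines the first-order
    term g * A with half of the Hessian-vector-product term (H A) * A, obtained
    via double backpropagation rather than forming the Hessian explicitly.
    """
    A = activations.detach().requires_grad_(True)
    score = class_score_fn(A)
    grad = torch.autograd.grad(score, A, create_graph=True)[0]
    hvp = torch.autograd.grad((grad * A.detach()).sum(), A)[0]   # H @ A via double backprop
    contrib = (grad.detach() + 0.5 * hvp) * A.detach()
    cam = contrib.sum(dim=0).clamp(min=0)                        # aggregate channels, keep positive evidence
    return cam / (cam.max() + 1e-8)
```

In practice `class_score_fn` would run the network head on the supplied activations and return the logit of the target class; the final normalization is only for visualization.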
LLMs
When applied to LLM outputs, value-based explainability (llmSHAP) must address stochastic inference:
- Only cache-based schemes guarantee the efficiency, symmetry, and dummy axioms in full (a caching sketch follows this list).
- In uncached (stochastic) settings, efficiency is violated due to variance in coalition payoffs.
- Windowed or counterfactual variants trade off faithfulness against computational cost and axiom satisfaction.
- In disease-symptom reasoning, cache-based Shapley variants maintain high fidelity with the gold-standard but require exponential compute for large feature sets (Naudot et al., 3 Nov 2025).
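A minimal illustration of the caching argument above (not the llmSHAP implementation itself), assuming a hypothetical `llm_value(coalition)` that queries an LLM with only the features in the coalition (e.g., a symptom subset) and returns a noisy scalar payoff: memoizing one payoff per coalition means every marginal contribution reuses identical numbers, so the attributions telescope exactly to $v(N) - v(\emptyset)$, which repeated uncached queries cannot guarantee.

```python
from itertools import permutations
from math import factorial

def cached_shapley(llm_value, features):
    """Exact Shapley attribution over cached (memoized) coalition payoffs.

    llm_value takes a frozenset of feature names and returns a scalar payoff,
    possibly stochastic. Caching pins down a single payoff per coalition,
    restoring the efficiency, symmetry, and dummy axioms, at the cost of
    evaluating all 2^n coalitions (small n only).
    """
    cache = {}

    def value(coalition):
        key = frozenset(coalition)
        if key not in cache:
            cache[key] = llm_value(key)   # one (possibly stochastic) LLM call per coalition
        return cache[key]

    n = len(features)
    phi = {f: 0.0 for f in features}
    for order in permutations(features):  # n! orderings; exponential, illustration only
        coalition = frozenset()
        prev = value(coalition)
        for f in order:
            coalition = coalition | {f}
            cur = value(coalition)
            phi[f] += cur - prev
            prev = cur
    return {f: total / factorial(n) for f, total in phi.items()}
```

Dropping the cache (calling `llm_value` afresh on every lookup) makes the per-permutation payoffs inconsistent, which is exactly the efficiency violation reported for uncached stochastic settings.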
4. On-Manifold and Fairness-Aware Value-based Explanations
On-Manifold Shapley
Classic Shapley explainers often use feature-independent imputations, which can violate the data manifold and produce misleading or unintelligible attributions:
- On-manifold approaches employ generative models (VAE-based) or directly learned surrogates to estimate the conditional distribution $p(x_{\bar{S}} \mid x_S)$ of excluded features given the retained coalition, ensuring that only in-distribution samples are used.
- On synthetic, tabular, and vision tasks, on-manifold value-based explanations outperform standard off-manifold Shapley in accuracy, stability, fairness, and alignment with ground-truth attributions (Frye et al., 2020).
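A minimal on-manifold value function under a multivariate Gaussian assumption, as a stand-in for the VAE-based conditionals in the cited work: off-coalition features are drawn from the analytic conditional $p(x_{\bar{S}} \mid x_S)$ rather than from unconditional marginals, so perturbed inputs stay on (this simple model of) the data manifold.

```python
import numpy as np

class GaussianConditionalValue:
    """On-manifold v(S): off-coalition features sampled from p(x_notS | x_S)."""

    def __init__(self, predict, X, n_samples=256, seed=0):
        self.predict = predict
        self.mu = X.mean(axis=0)
        self.cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        self.n_samples = n_samples
        self.rng = np.random.default_rng(seed)

    def __call__(self, x, S):
        d, S = len(self.mu), list(S)
        notS = [j for j in range(d) if j not in S]
        if not notS:
            return float(self.predict(x[None, :]).mean())
        if not S:
            samples = self.rng.multivariate_normal(self.mu, self.cov, size=self.n_samples)
        else:
            # Gaussian conditional: mean and covariance of x_notS given x_S = x[S].
            C_SS = self.cov[np.ix_(S, S)]
            C_nS = self.cov[np.ix_(notS, S)]
            C_nn = self.cov[np.ix_(notS, notS)]
            mu_c = self.mu[notS] + C_nS @ np.linalg.solve(C_SS, x[S] - self.mu[S])
            cov_c = C_nn - C_nS @ np.linalg.solve(C_SS, C_nS.T)
            samples = np.tile(x, (self.n_samples, 1))
            samples[:, notS] = self.rng.multivariate_normal(mu_c, cov_c, size=self.n_samples)
        samples[:, S] = x[S]                 # coalition features pinned to the instance
        return float(self.predict(samples).mean())
```

Plugging this value function into either Shapley routine from Section 2 yields on-manifold attributions; replacing the Gaussian with a learned conditional model (e.g., a VAE surrogate) changes only the sampling step.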
Fairness Attribution
Shapley decompositions can attribute a model's group-level unfairness (e.g., demographic parity gap) to input features:
- The "unfairness value function" computes feature-level contributions to the disparity.
- Additivity allows dissection of fairness gains/losses under additive or perturbative fairness interventions.
- This framework quantifies the accuracy–fairness trade-off directly at the feature level and resists manipulation via proxy features—contrasting with standard importance methods that can "hide" unfairness (Begley et al., 2020).
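A minimal sketch of this decomposition, assuming small feature sets, a binary protected attribute, and a value function that scores each coalition by the demographic parity gap when only that coalition is informative (off-coalition features marginalized over a background sample); the helper names are illustrative, not the cited paper's API.

```python
import numpy as np
from itertools import combinations
from math import factorial

def dp_gap_value(predict, X, group, S, background):
    """v(S): demographic parity gap when only features in S carry information."""
    S = list(S)
    rates = []
    for g in (0, 1):
        preds = []
        for x in X[group == g]:
            B = background.copy()
            B[:, S] = x[S]                     # marginal imputation of off-coalition features
            preds.append(predict(B).mean())    # expected positive score for this individual
        rates.append(np.mean(preds))
    return rates[1] - rates[0]

def fairness_shapley(predict, X, group, background):
    """Exact Shapley decomposition of the parity gap across features (small d only)."""
    d = X.shape[1]
    value = lambda S: dp_gap_value(predict, X, group, S, background)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi   # sums to the model's full gap minus the feature-free baseline gap
```

By efficiency, the attributions sum to the model's realized disparity (relative to the empty-coalition baseline), so a feature acting as a proxy for the protected attribute cannot hide: its share of the gap is accounted for explicitly.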
5. Second-order and Dataset-level Value-based Explainability
Standard value-based explainers focus on instance-level or local attributions. Second-order explainable AI (SOXAI) extends this paradigm to the dataset level:
- Per-instance explanations (e.g., heatmaps) are projected into a latent space; clusters (concepts) are identified, and their prevalence and task relevance are quantified.
- Spurious or overrepresented concepts (e.g., background artifacts, annotation biases) are flagged for dataset or model refinement.
- Direct interventions (removal or down-weighting of samples, input masking) based on SOXAI increase both accuracy and fairness, as empirically demonstrated in both classification and segmentation tasks (Zeng et al., 2023).
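A simplified sketch of the dataset-level workflow, assuming per-instance heatmaps are already available as arrays: embed them, cluster, and report each cluster's prevalence and mean task relevance so that prevalent but low-relevance concepts can be flagged for curation. The embedding and clustering choices (PCA, k-means) are generic stand-ins, not necessarily those of the cited work.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def dataset_level_concepts(heatmaps, relevance, n_concepts=8, seed=0):
    """Cluster per-instance explanations into dataset-level 'concepts'.

    heatmaps:  array (n_samples, H, W) of per-instance attribution maps.
    relevance: array (n_samples,) of task-relevance scores per instance
               (e.g., attribution mass falling on the labeled object).
    Returns one record per concept with its prevalence and mean relevance.
    """
    Z = heatmaps.reshape(len(heatmaps), -1)
    Z = PCA(n_components=min(32, *Z.shape), random_state=seed).fit_transform(Z)
    labels = KMeans(n_clusters=n_concepts, random_state=seed, n_init=10).fit_predict(Z)

    report = []
    for c in range(n_concepts):
        idx = labels == c
        report.append({
            "concept": c,
            "prevalence": float(idx.mean()),
            "mean_relevance": float(relevance[idx].mean()),
        })
    return sorted(report, key=lambda r: r["mean_relevance"])
```

Concepts near the top of the returned list (lowest mean relevance) are natural candidates for the removal, down-weighting, or masking interventions described above.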
6. Practical Applications and Business Integration
Value-based explainability is not purely technical—it is designed to deliver measurable benefits in operational, regulatory, and business contexts:
- Stakeholder trust, regulatory compliance, and operational efficiency are increased by explanations that tie directly to business KPIs and human-meaningful levers.
- Quantitative metrics (fidelity, complexity, stability, counterfactual distance) assess explanation utility.
- Structured recipes integrate value-based explanation into the full ML lifecycle, from data curation through in-processing model selection, regularization, and post-hoc audit, to real-time deployment with explainability infrastructure (e.g., SHAP, LIME, EXPO).
- Use cases span credit approvals, insurance, marketing, pricing, and healthcare risk, demonstrating concrete improvements in adoption, ROI, audit speed, model trust, and error reduction (Chomiak et al., 2021).
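As one concrete instance of the post-hoc audit tooling named above, a minimal usage sketch with the `shap` package's high-level `Explainer` interface on a synthetic stand-in for a credit-style feature table (model choice, column names, and sample sizes are all illustrative):

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a production feature table.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 4)), columns=["income", "age", "tenure", "region_code"])
y = (X["income"] + 0.5 * X["tenure"] + rng.normal(scale=0.5, size=500) > 0).astype(int)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Post-hoc audit: attribute the positive-class probability to input features.
explainer = shap.Explainer(lambda Z: model.predict_proba(Z)[:, 1], X.sample(100, random_state=0))
explanation = explainer(X.head(200))

# Mean |SHAP value| per feature as a simple global-importance summary for the audit report.
global_importance = np.abs(explanation.values).mean(axis=0)
print(dict(zip(X.columns, np.round(global_importance, 4))))
```

The same attributions can feed the quantitative checks listed above (fidelity, stability, counterfactual distance) before a model is promoted to deployment.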
7. Limitations, Trade-offs, and Open Challenges
Value-based explainability methods confront computational and theoretical constraints:
- Exact Shapley value computation remains infeasible for large feature sets; surrogate and sampling methods must balance trade-offs between fidelity and scalability.
- Off-manifold imputations can induce bias or mask true dependencies—on-manifold construction is essential but poses modeling and estimation challenges in complex domains.
- In structured domains (images, graphs, sequences), the design of utility functions and coalition spaces raises domain-specific issues regarding interpretability and explanation granularity.
- Stochasticity in model inference (e.g., LLMs) may break Shapley efficiency or symmetry, requiring careful design of caching and evaluation protocols.
Future research is directed toward more scalable, high-dimensional value-based explainers, the integration of causal inference, development of human-centered evaluation frameworks, and tighter coupling of explainability with fairness, privacy, and accountability.
References:
- Derivative-based Shapley: (Duan et al., 2023)
- LLMs: (Naudot et al., 3 Nov 2025)
- Probabilistic Logic XAI: (Fan et al., 2020)
- On-Manifold Shapley: (Frye et al., 2020)
- CAMs and ShapleyCAM: (Cai, 9 Jan 2025)
- GNN Shapley Sparsification: (Akkas et al., 28 Jul 2025)
- SOXAI and Dataset-level XAI: (Zeng et al., 2023)
- Fairness via Shapley: (Begley et al., 2020)
- XAI in business: (Chomiak et al., 2021)