- The paper demonstrates that traditional SHAP scores often misattribute feature importance by failing to incorporate logic-based sufficiency.
- It introduces a corrected characteristic function that ensures only relevant features receive nonzero scores through rigorous logic-based definitions.
- Empirical results validate nuSHAP’s efficacy and competitive performance across various models and datasets.
Rigorous Approaches to Feature Attribution in Explainable AI
Introduction
The paper "Towards Rigorous Explainability by Feature Attribution" (2604.15898) systematically examines the theoretical and practical shortcomings of SHAP, the widely adopted feature attribution method in explainable AI (XAI). It pays particular attention to how SHAP can misrank the relative importance of features. The authors present a rigorous alternative grounded in logic-based definitions of explanation, and they provide both a theoretical foundation and an empirical evaluation of the resulting corrected SHAP scores. This essay analyzes the technical arguments, the methodological innovations, and the broader implications for feature attribution and interpretability in machine learning (ML).
Theoretical Flaws in SHAP Scores
A central claim of the paper is that the standard SHAP framework has critical deficiencies. These stem from its characteristic function, which assigns to each coalition of features the expected model output when those features are fixed to their values in the instance being explained. The paper works through examples spanning classification and regression, including continuous and Lipschitz-continuous models. It shows that SHAP scores can systematically assign nonzero importance to irrelevant features and zero importance to relevant ones. In specific cases, the feature ranking induced by SHAP is demonstrably misleading for human decision-makers. This violates the requirement that attributions comply with feature relevancy [msh-cacm24, hms-ijar24, lhms-corr24b].
These flaws are not confined to models with discrete-valued features. The paper constructs explicit regression models over real-valued domains, with continuous (even differentiable and Lipschitz-continuous) mappings, that exhibit analogous failures. The inadequacy is therefore structural. It lies in the original characteristic function used for SHAP in XAI, which is not value-independent and ignores logic-based sufficiency requirements.
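The following sketch makes the failure mode concrete. It computes exact SHAP scores with the expected-value characteristic function for a toy classifier of our own choosing, not an example taken from the paper: κ(x1, x2) = x1 ∨ ¬x2 over boolean features, explained at the instance (1, 1) under an assumed uniform input distribution. Fixing x1 = 1 alone guarantees the prediction 1, while fixing x2 = 1 alone does not (κ(0, 1) = 0). So {x1} is the only AXp, and x2 is irrelevant. The function names are hypothetical.

```python
from itertools import combinations, product
from math import factorial

N_FEATURES = 2
INSTANCE = (1, 1)  # kappa(INSTANCE) == 1


def kappa(x):
    # Hypothetical toy classifier: x1 OR (NOT x2), features indexed 0 and 1.
    return int(x[0] == 1 or x[1] == 0)


def expected_value(S):
    # Expected-value characteristic function (uniform distribution assumed):
    # average kappa over all points that agree with INSTANCE on the features in S.
    points = [x for x in product((0, 1), repeat=N_FEATURES)
              if all(x[i] == INSTANCE[i] for i in S)]
    return sum(kappa(x) for x in points) / len(points)


def shapley(v, n):
    # Exact Shapley values: weighted marginal contributions of feature i
    # over every coalition S that excludes i.
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                total += weight * (v(set(S) | {i}) - v(set(S)))
        scores.append(total)
    return scores


print(shapley(expected_value, N_FEATURES))  # [0.375, -0.125]
```

Running it prints `[0.375, -0.125]`. The irrelevant feature x2 receives a nonzero score, even though the scores still sum to v(F) − v(∅) = 1 − 0.75.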
Logic-Based Explainability and Corrected SHAP Scores
The authors propose an alternative XAI game in which the characteristic function is reformulated to align with logic-based explainability, specifically the notion of abductive explanations (AXps). The logic-based function does not use an expected prediction value. Instead, it assigns each coalition a value, typically 1 if fixing the coalition's features to their values in the target instance suffices to guarantee that instance's prediction, and 0 otherwise. This definition inherently enforces compliance with feature relevancy, value independence, and numerical neutrality.
Formally, the corrected characteristic function is given by:
$$
a(S;E) =
\begin{cases}
1 & \text{if fixing } S \text{ results in a weak abductive explanation (WAXp)} \\
0 & \text{otherwise}
\end{cases}
$$
Because the characteristic function is logic-based, the corrected SHAP scores are guaranteed to respect key requirements that prior formulations violate. Notably, only relevant features receive nonzero importance.
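The corrected characteristic function can be sketched on a toy problem by replacing the expected value with a brute-force WAXp test. The sketch makes several assumptions:

- boolean features, few enough to enumerate;
- a hypothetical toy classifier κ(x1, x2) = x1 ∨ ¬x2, explained at the instance (1, 1);
- a(S; E) taken to be exactly the 0/1 definition given above.

The paper's own implementation may differ in its details.

```python
from itertools import combinations, product
from math import factorial

N_FEATURES = 2
INSTANCE = (1, 1)


def kappa(x):
    # Hypothetical toy classifier: x1 OR (NOT x2), features indexed 0 and 1.
    return int(x[0] == 1 or x[1] == 0)


TARGET = kappa(INSTANCE)


def is_waxp(S):
    # S is a weak abductive explanation (WAXp) if every point that agrees
    # with INSTANCE on the features in S gets the same prediction as INSTANCE.
    return all(kappa(x) == TARGET
               for x in product((0, 1), repeat=N_FEATURES)
               if all(x[i] == INSTANCE[i] for i in S))


def logic_value(S):
    # Corrected characteristic function a(S; E): 1 if S is a WAXp, else 0.
    return 1 if is_waxp(S) else 0


def shapley(v, n):
    # Exact Shapley values: weighted marginal contributions of feature i
    # over every coalition S that excludes i.
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                total += weight * (v(set(S) | {i}) - v(set(S)))
        scores.append(total)
    return scores


print(shapley(logic_value, N_FEATURES))  # [1.0, 0.0]
```

It prints `[1.0, 0.0]`. Here x1, the only feature in an AXp, receives all of the importance, and the irrelevant feature x2 receives zero.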
Computation of Corrected SHAP Scores
Evaluating the WAXp predicate inside the characteristic function is computationally nontrivial for general ML models. Tractable classifier families include decision trees, monotonic classifiers, and certain graphical models. For these, polynomial-time algorithms exist for deciding whether a set of features is a WAXp and for computing AXps, and hence for evaluating the characteristic function [ms-rw22, hiims-kr21, msgcin-icml21]. For black-box or complex models, the paper relies on rigorous model-agnostic approaches. These restrict the universal quantification in the WAXp definition to the observed samples rather than the entire feature space, which makes computation efficient [cooper-ecai23, msllm-ijcai25].
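Decision trees illustrate why some model families are tractable. A set S is a WAXp for a tree exactly when every root-to-leaf path consistent with the instance's values on S ends in the predicted class. The sketch below assumes boolean features and a hypothetical tuple encoding of trees; this is not the encoding used in the cited works. The check runs in time polynomial in the size of the tree.

```python
# Hypothetical tree encoding over boolean features:
#   a leaf is an int (the predicted class);
#   an internal node is a tuple (feature, subtree_if_0, subtree_if_1).


def predict(tree, x):
    # Follow the unique path selected by the values in x.
    node = tree
    while not isinstance(node, int):
        feature, if_zero, if_one = node
        node = if_one if x[feature] == 1 else if_zero
    return node


def dt_is_waxp(tree, instance, S):
    """True iff fixing the features in S to their values in `instance`
    guarantees the tree's prediction for `instance`."""
    target = predict(tree, instance)

    def all_leaves_agree(node, path):
        # `path` records values chosen for free features on the current path,
        # so a feature tested twice on one path is followed consistently.
        if isinstance(node, int):
            return node == target
        feature, if_zero, if_one = node
        if feature in S:
            value = instance[feature]
        elif feature in path:
            value = path[feature]
        else:
            # Free, not-yet-tested feature: both branches are reachable.
            return (all_leaves_agree(if_zero, {**path, feature: 0})
                    and all_leaves_agree(if_one, {**path, feature: 1}))
        return all_leaves_agree(if_one if value == 1 else if_zero, path)

    return all_leaves_agree(tree, {})


# Toy tree for x1 OR (NOT x2), features indexed 0 and 1.
TREE = (0, (1, 1, 0), 1)
print(dt_is_waxp(TREE, (1, 1), {0}), dt_is_waxp(TREE, (1, 1), {1}))  # True False
```

Each node is reached from a single parent, so the traversal visits every node at most once. Tree classifiers with thresholds over real-valued features would need interval tracking in place of the `path` dictionary.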
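The nuSHAP implementation approximates corrected SHAP scores efficiently by building on the CGT sampling algorithm of Castro, Gómez, and Tejada [tejada-cor09]. That algorithm estimates Shapley values by averaging marginal contributions over randomly sampled feature orderings, and it comes with guarantees on convergence and sample complexity. The distinction between model-aware and model-agnostic corrected SHAP scores is operationalized through different methods for evaluating the WAXp predicate, depending on whether the internal structure of the model is accessible.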
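A model-agnostic variant can be sketched by combining a sample-restricted WAXp test with permutation sampling in the spirit of the CGT algorithm. Everything here is an illustrative assumption: the function names, the data layout (tuples of feature values), and the toy model. The sketch also omits the convergence and sample-complexity machinery the paper relies on, and it is not the nuSHAP API.

```python
import random


def sample_waxp_value(predict, data, instance, S):
    """Hypothetical model-agnostic characteristic function: 1 if every
    observed sample that agrees with `instance` on the features in S gets
    the same prediction as `instance`, else 0. The universal quantifier
    ranges over the rows of `data`, not over the whole feature space."""
    target = predict(instance)
    for row in data:
        if all(row[i] == instance[i] for i in S) and predict(row) != target:
            return 0
    return 1


def permutation_shapley(v, n, num_permutations, seed=0):
    """Estimate Shapley values by averaging each feature's marginal
    contribution over uniformly random feature orderings."""
    rng = random.Random(seed)
    totals = [0.0] * n
    order = list(range(n))
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = set()
        previous = v(coalition)
        for i in order:
            coalition.add(i)
            current = v(coalition)
            totals[i] += current - previous
            previous = current
    return [t / num_permutations for t in totals]


def model(x):
    # Hypothetical toy classifier: x1 OR (NOT x2), features indexed 0 and 1.
    return int(x[0] == 1 or x[1] == 0)


DATA = [(0, 0), (0, 1), (1, 0), (1, 1)]  # here the sample covers every point
INSTANCE = (1, 1)


def v(S):
    return sample_waxp_value(model, DATA, INSTANCE, S)


print(permutation_shapley(v, 2, num_permutations=100))  # [1.0, 0.0]
```

Here the data covers all four boolean points, so every sampled ordering yields the same marginal contributions and the estimate is `[1.0, 0.0]` for any seed. With a sparser sample, the test quantifies over fewer points and can accept coalitions that are not WAXps over the full feature space.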
Experimental Evaluation
A comparative study evaluates nuSHAP (corrected characteristic function) against SHAP across multiple classifiers (LR, DT, kNN, BT, CNN) and datasets (UCI, MNIST, PMLB). The results show negligible agreement between the feature importance rankings produced by nuSHAP and by SHAP. In particular, mean rank-biased overlap (RBO) values between the two rankings are consistently low, often near zero for MNIST and other datasets. This indicates that SHAP rankings do not coincide with the theoretically justified, logic-based importance scores.
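Rank-biased overlap compares two rankings while placing geometrically decreasing weight on deeper positions, so agreement at the top counts most. Below is a minimal sketch of the extrapolated variant from Webber et al. (2010) for equal-length rankings without ties. Using this variant and setting the persistence parameter to p = 0.9 are both assumptions, since the exact RBO settings used in the paper are not given here.

```python
def rbo_ext(ranking_a, ranking_b, p=0.9):
    """Extrapolated rank-biased overlap for two equal-length rankings
    without ties. Identical rankings score 1.0; rankings whose prefixes
    never share an element score 0.0."""
    if len(ranking_a) != len(ranking_b):
        raise ValueError("this sketch assumes equal-length rankings")
    k = len(ranking_a)
    seen_a, seen_b = set(), set()
    overlap = 0  # |A[:d] ∩ B[:d]|, updated incrementally
    weighted_sum = 0.0
    for d in range(1, k + 1):
        a, b = ranking_a[d - 1], ranking_b[d - 1]
        if a == b:
            overlap += 1
        else:
            overlap += (a in seen_b) + (b in seen_a)
        seen_a.add(a)
        seen_b.add(b)
        # Agreement at depth d, weighted by p ** d.
        weighted_sum += (overlap / d) * p ** d
    return (overlap / k) * p ** k + ((1 - p) / p) * weighted_sum


print(rbo_ext([0, 1, 2], [0, 2, 1]))  # close to 1: the top item agrees
print(rbo_ext([0, 1, 2], [2, 1, 0]))  # lower: the top item differs
```

To compare two attributions, one would first sort the feature indices by score (for example by absolute value, which is another assumption) and then pass the two orderings to `rbo_ext`.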
Runtime comparisons show that nuSHAP is computationally competitive, with average evaluation times comparable to those of SHAP across datasets. This suggests that rigorous approaches need not be prohibitively slow.
Implications and Future Directions
The paper provides conclusive evidence that the SHAP methodology, as implemented and conceptualized in XAI, is not theoretically sound as a general-purpose feature attribution framework. The authors assert that results, conclusions, and scientific discoveries predicated on SHAP-based attribution should be critically re-examined. The corrected approach directly ties game-theoretic attribution to logic-based sufficiency, thereby addressing compliance failures and aligning attribution with formal interpretability.
Practically, the nuSHAP software and methodology provide model-agnostic and model-aware alternatives usable for rigorous feature ranking in high-stakes applications. Theoretically, the work bridges cooperative game theory and formal methods, suggesting avenues for leveraging classical logic, abduction, and contrastive explanations within XAI.
Open questions remain regarding the development of exact algorithms for corrected SHAP score computation in general models, potentially involving compilation techniques or symbolic encodings as studied in recent literature [steffen-tmf25, barcelo-jmlr23].
Conclusion
"Towards Rigorous Explainability by Feature Attribution" (2604.15898) delivers a comprehensive refutation of classical SHAP scores for feature attribution in XAI, substantiates the superiority of a logic-based approach, and provides an operationally feasible alternative via nuSHAP. The paper’s strong numerical evidence and formal claims challenge the XAI community to reconsider reliance on SHAP and to adopt rigorously justified frameworks for feature importance. The implications for AI adoption in safety-critical domains and fundamental interpretability research are significant, and future developments are likely to increasingly integrate formal, symbolic, and sample-based methods for trustworthy explanation.