Shapley Explanation Networks: Integrating Shapley Values into Deep Learning Models
The research presented in this paper addresses the integration of Shapley values into the deep learning framework to improve feature attribution in machine learning. Traditionally, Shapley values have been used as a post-hoc explanation technique, and computing them exactly requires effort that grows exponentially in the number of features. This paper instead embeds Shapley values directly into the model itself, yielding intrinsic explanations that are efficient to compute and can improve training through explanation regularization.
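To make that cost concrete, the exact Shapley value of a feature averages its marginal contribution over every coalition of the remaining features, so exact post-hoc computation requires on the order of 2^d model evaluations for d features. The snippet below is a minimal illustration of that enumeration, not the paper's code; the toy model `f` and the zero baseline used to "remove" features are assumptions for the example.

```python
from itertools import combinations
from math import comb

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to `baseline`.

    Enumerates all coalitions of the other features for each feature, so the
    cost is exponential in d -- the expense that motivates building Shapley
    values into the model rather than computing them post hoc.
    """
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            for S in combinations(others, size):
                # Coalition without feature i, then with feature i added.
                x_S = baseline.copy()
                x_S[list(S)] = x[list(S)]
                x_Si = x_S.copy()
                x_Si[i] = x[i]
                weight = 1.0 / (d * comb(d - 1, size))
                phi[i] += weight * (f(x_Si) - f(x_S))
    return phi


# Toy usage with a hypothetical model: the attributions sum to f(x) - f(baseline).
f = lambda z: z[0] * z[1] + 2.0 * z[2]
x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
print(exact_shapley(f, x, baseline), f(x) - f(baseline))
```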
Overview of Contributions
The paper makes several key contributions to the field of interpretable machine learning:
- Shapley Transform: The paper defines the Shapley transform, which maps an input representation to the Shapley values of a learned function applied to it, so that the Shapley values themselves become the network's latent representation. The transform preserves the core properties of Shapley values, remains fast to compute, and, when stacked, yields layer-wise explanations.
- Network Architecture: The authors introduce two network types, Shallow Shapley Explanation Networks (Shallow ShapNets) and Deep Shapley Explanation Networks (Deep ShapNets). Both are built from Shapley modules that compute exact Shapley values over small groups of features; Shallow ShapNets produce exact Shapley values for the overall model, while Deep ShapNets stack modules to handle higher-dimensional inputs and maintain key Shapley properties, generating layer-wise explanations by construction (a minimal sketch of a Shapley module follows this list).
- Explanation Regularization: Because Shapley values serve as latent representations, explanations can be regularized during training, for example to encourage sparsity or smoothness in feature importance and to align model explanations with domain-specific insights (the sketch below includes such a penalty).
- Dynamic Computation: To improve computational efficiency, Deep ShapNets support instance-specific dynamic pruning, which skips computation for low-attribution features during inference while maintaining model performance, thereby reducing unnecessary overhead.
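As a rough illustration of how a Shapley module and explanation regularization could fit together, the sketch below computes exact Shapley values of a small learned function over a group of k features (so the 2^k enumeration stays cheap) and adds an L1 penalty on the resulting Shapley representation to the training loss. The module design, the zero baseline for absent features, and the penalty weight are assumptions for illustration, not the paper's exact architecture.

```python
from itertools import combinations
from math import comb

import torch
import torch.nn as nn


class ShapleyModule(nn.Module):
    """Maps a small group of k features to the exact Shapley values of a
    learned function g over that group (exact because 2^k stays small)."""

    def __init__(self, k: int, hidden: int = 16):
        super().__init__()
        self.k = k
        self.g = nn.Sequential(nn.Linear(k, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):                      # x: (batch, k)
        phi = torch.zeros_like(x)
        for i in range(self.k):
            others = [j for j in range(self.k) if j != i]
            for size in range(self.k):
                for S in combinations(others, size):
                    mask = torch.zeros(self.k, device=x.device)
                    mask[list(S)] = 1.0
                    mask_i = mask.clone()
                    mask_i[i] = 1.0
                    w = 1.0 / (self.k * comb(self.k - 1, size))
                    # Absent features are replaced by a zero baseline (an assumption).
                    gain = self.g(x * mask_i) - self.g(x * mask)
                    phi[:, i] += w * gain.squeeze(-1)
        return phi                              # Shapley representation, same shape as x


# Minimal training step: the Shapley representation is the latent code, and an
# L1 penalty on it acts as explanation regularization (encouraging sparsity).
module = ShapleyModule(k=4)
head = nn.Linear(4, 1)
opt = torch.optim.Adam(list(module.parameters()) + list(head.parameters()), lr=1e-3)

x = torch.randn(32, 4)
y = torch.randn(32, 1)

phi = module(x)                                 # per-feature attributions
pred = head(phi)
loss = nn.functional.mse_loss(pred, y) + 1e-2 * phi.abs().mean()
loss.backward()
opt.step()
```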
Empirical Evaluation
The paper conducts thorough experimentation to validate the performance and explainability of the proposed networks:
- Model Performance: Comparisons across synthetic and real-world datasets show that Shallow and Deep ShapNets deliver performance competitive with standard deep neural networks while offering intrinsic model explanations.
- Explanation Quality: The networks closely approximate true Shapley values on low-dimensional datasets and achieve strong explanation quality on benchmarks such as digit-flipping experiments for high-dimensional image data.
- Layer-Wise Insights: Layer-wise explanations reveal at finer granularity how different layers contribute to predictions, which in turn supports dynamic pruning during inference to improve computational efficiency without sacrificing accuracy (a minimal pruning sketch follows this list).
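One way to picture instance-specific dynamic pruning: since each layer's output is a Shapley representation, features whose attributions are near zero for a given input can be dropped before the next layer is evaluated. The keep-ratio rule and the zeroing of pruned features below are illustrative assumptions, not the paper's exact procedure.

```python
import torch


def prune_shapley_representation(phi: torch.Tensor, keep_ratio: float = 0.5):
    """Zero out the lowest-magnitude attributions per example so that
    downstream Shapley modules can skip computation for those features.

    phi: (batch, d) Shapley representation from the previous layer.
    Returns the pruned representation and a boolean mask of kept features.
    """
    k = max(1, int(keep_ratio * phi.shape[1]))
    # Keep the k features with the largest |Shapley value| for each example.
    topk = phi.abs().topk(k, dim=1).indices
    mask = torch.zeros_like(phi, dtype=torch.bool)
    mask.scatter_(1, topk, True)
    return phi * mask, mask


# Example: half of the features are skipped per example.
phi = torch.randn(8, 16)
pruned, mask = prune_shapley_representation(phi, keep_ratio=0.5)
print(mask.sum(dim=1))   # 8 kept features per row
```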
Implications and Future Directions
This paper contributes significantly to narrowing the gap between interpretation and model performance by integrating Shapley values within the learning process. The implications of this work are manifold:
- Enhanced Interpretability: The approach sidesteps the computationally intensive post-hoc analysis typically required when using Shapley values, making Shapley-based explanations accessible for larger models and datasets.
- Practical Applications: Layer-wise interpretability can inform real-world applications, such as healthcare or finance, where understanding the feature significance in model predictions is crucial.
- Theoretical Advancements: The development of Shapley transform and network architectures demonstrates a shift toward making explanations an integral part of model training and deployment, offering new directions for theoretically grounded explanation models.
Future research could explore the scalability of these networks in other domains, their combination with other interpretability methods, and their impact on adversarial robustness. Further work on explanation regularization could also refine the alignment between model explanations and domain expert knowledge.