
KernelSHAP-IQ: Weighted Least-Square Optimization for Shapley Interactions (2405.10852v2)

Published 17 May 2024 in cs.LG and cs.AI

Abstract: The Shapley value (SV) is a prevalent approach of allocating credit to ML entities to understand black box ML models. Enriching such interpretations with higher-order interactions is inevitable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well-known that the SV yields an optimal approximation of any game via a weighted least square (WLS) objective, an extension of this result to SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as a solution to a WLS problem, which constructs an optimal approximation via SII and $k$-Shapley values ($k$-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.


Summary

  • The paper introduces a novel weighted least square optimization formulation to compute Shapley interactions efficiently.
  • It adapts the KernelSHAP method to capture complex feature interactions, validated by experiments in sentiment analysis and regression tasks.
  • This approach provides faster and more accurate model interpretability, promising broader applications in real-world AI systems.

Understanding KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions

Introduction

If you've been working with machine learning models, you've probably heard of the Shapley value (SV) for interpreting model outputs. Whether it's feature attribution, feature importance, or data valuation, the SV is a versatile tool for understanding how different entities (like features) contribute to a model's prediction. But what if you need to understand how combinations of features work together? That's where the Shapley Interaction Index (SII) steps in.

The paper "KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions" introduces an extension of KernelSHAP, a popular tool for SV, to handle higher-order feature interactions through SII. Let's break down the key concepts and findings from this paper and why it's relevant for data scientists.

Shapley Values and Interactions

The Basics of Shapley Values

Shapley values distribute the payout (like a model's prediction) among different players (features) fairly, based on their contributions. The SV for a feature i is calculated as the weighted average of its marginal contributions across all possible subsets of features.
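To make the definition concrete, here is a minimal brute-force sketch that computes exact Shapley values for a small cooperative game. The game `v` is a made-up toy example (a sum plus a bonus when players 0 and 1 cooperate), not anything from the paper; the enumeration is exponential in the number of players, which is exactly the cost that KernelSHAP-style approximations avoid.

```python
from itertools import combinations
from math import factorial

def shapley_values(n, value):
    """Exact Shapley values for an n-player game.

    `value` maps a frozenset of player indices to the game's payout.
    Brute-force enumeration over all coalitions: exponential in n,
    so only practical for tiny toy games.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for k in range(len(others) + 1):
            # Shapley kernel weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                S = frozenset(S)
                phi[i] += weight * (value(S | {i}) - value(S))
    return phi

# Toy game: v(S) = sum of player indices, plus a bonus of 5
# when players 0 and 1 are both in the coalition.
def v(S):
    return sum(S) + (5 if {0, 1} <= S else 0)

print(shapley_values(3, v))
```

The bonus of 5 is split evenly between players 0 and 1 (2.5 each), on top of their additive contributions, so the values sum to the grand-coalition payout v({0, 1, 2}) = 8.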

Introducing Shapley Interaction Index (SII)

While the SV helps to understand the individual feature contributions, it may fall short for complex real-world problems where interactions between features are crucial. For instance, in sentiment analysis, words like "never" and "forget" might negate each other when they appear together, but this interaction won't be captured if we only consider their individual contributions.

The SII extends SV to account for these interactions. Essentially, the SII allows us to understand how groups of features work together to impact the model's output.
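The pairwise SII can be written down just as directly: for a pair {i, j}, it is a weighted average of the *discrete derivative* v(S ∪ {i, j}) − v(S ∪ {i}) − v(S ∪ {j}) + v(S) over all coalitions S excluding i and j. The sketch below uses the same hypothetical toy game as above; it is an illustration of the definition, not the paper's estimator.

```python
from itertools import combinations
from math import factorial

def pairwise_sii(n, value, i, j):
    """Exact pairwise Shapley Interaction Index SII({i, j}).

    Enumerates every coalition S not containing i or j and averages
    the discrete derivative with the SII weights
    |S|! (n - |S| - 2)! / (n - 1)!. Exponential in n.
    """
    others = [p for p in range(n) if p not in (i, j)]
    sii = 0.0
    for k in range(len(others) + 1):
        weight = factorial(k) * factorial(n - k - 2) / factorial(n - 1)
        for S in combinations(others, k):
            S = frozenset(S)
            delta = (value(S | {i, j}) - value(S | {i})
                     - value(S | {j}) + value(S))
            sii += weight * delta
    return sii

# Toy game: additive sum plus a bonus of 5 when 0 and 1 cooperate.
def v(S):
    return sum(S) + (5 if {0, 1} <= S else 0)

print(pairwise_sii(3, v, 0, 1))
```

For this game the discrete derivative equals the bonus (5) for every S, so the SII recovers the full interaction strength between players 0 and 1, which individual Shapley values alone would hide.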

The Core Contributions of KernelSHAP-IQ

Traditional methods for calculating Shapley interactions are computationally prohibitive, requiring exponential time in the number of features. This paper's primary aim is to reduce this complexity by linking SII with a Weighted Least Square (WLS) optimization problem, similar to how SV is computed in KernelSHAP.

Optimal Approximations via Shapley Interactions

The paper shows that we can represent higher-order interactions (SII) as a solution to a WLS problem. Here's a simplified view:

  • For pairwise interactions (order 2), the SII can be constructed iteratively, building on lower-order Shapley values.
  • The proposed method introduces KernelSHAP-IQ, which uses this iterative approach to compute exact or approximate SII values efficiently.
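The WLS connection can be sketched for the order-1 case: KernelSHAP recovers Shapley values by fitting an additive surrogate to all coalition values under the Shapley kernel, with the empty and grand coalitions enforced as constraints (approximated below via large penalty weights). This is a simplified, unsampled illustration of the objective that KernelSHAP-IQ extends to interaction terms, not the paper's actual implementation.

```python
import numpy as np
from itertools import combinations
from math import comb

def kernelshap_exact(n, value):
    """Solve the KernelSHAP weighted least-squares problem over
    *all* coalitions (no sampling). Returns the Shapley values.
    """
    rows, targets, weights = [], [], []
    big = 1e7  # large weight approximates the empty/full-set constraints
    for k in range(n + 1):
        for S in combinations(range(n), k):
            x = np.zeros(n + 1)
            x[0] = 1.0            # intercept, converges to v(empty set)
            for i in S:
                x[i + 1] = 1.0    # indicator: feature i present
            if k in (0, n):
                w = big
            else:
                # Shapley kernel: (n - 1) / (C(n,|S|) * |S| * (n - |S|))
                w = (n - 1) / (comb(n, k) * k * (n - k))
            rows.append(x)
            targets.append(value(frozenset(S)))
            weights.append(w)
    X, W, y = np.array(rows), np.diag(weights), np.array(targets)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[1:]  # coefficients on the feature indicators

# Same hypothetical toy game as before.
def v(S):
    return sum(S) + (5 if {0, 1} <= S else 0)

print(kernelshap_exact(3, v))
```

The regression coefficients coincide with the exact Shapley values, which is the classical result the paper extends: KernelSHAP-IQ poses an analogous WLS problem whose solution yields SII and k-SII values.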

Performance and Validation

KernelSHAP-IQ has been empirically demonstrated to outperform several existing methods in both approximation quality and efficiency across various datasets and machine learning models.

Practical Insights and Future Directions

Empirical Results

KernelSHAP-IQ shows state-of-the-art performance in capturing feature interactions. For example, in a sentiment analysis task, it correctly identified the critical interaction between "never" and "forget," which contributes significantly to the positive sentiment of a movie review.

Use Cases

  1. Sentiment Analysis: As shown in the paper, understanding how word combinations impact sentiment can refine feature attributions and improve interpretability.
  2. Regression Tasks: For predicting housing prices, it can reveal how combinations of geographical features (such as latitude and longitude) jointly shape the model's prediction.

Looking Forward

Given its efficiency and effectiveness, KernelSHAP-IQ can be used for a variety of tasks in feature interaction analysis. Future research could further streamline this approach or even extend it to other types of model explanations. Additionally, integrating KernelSHAP-IQ with real-time systems might open up new avenues for dynamic model interpretability.

Conclusion

KernelSHAP-IQ represents a significant step in the field of model interpretability, particularly for capturing feature interactions. By extending the well-known KernelSHAP method to handle higher-order interactions efficiently, this paper provides a powerful tool for data scientists looking to explore how their models make decisions. Whether you're working with text, images, or tabular data, understanding these interactions can lead to more robust and explainable artificial intelligence systems.
