Explainability as statistical inference (2212.03131v3)
Abstract: A wide variety of model explanation approaches have been proposed in recent years, all guided by very different rationales and heuristics. In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose a general deep probabilistic model designed to produce interpretable predictions. The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture and any type of prediction problem. Our method belongs to the family of amortized interpretability models, in which a neural network acts as a selector to allow for fast interpretation at inference time. Several popular interpretability methods are shown to be particular cases of regularised maximum likelihood for our general model. We propose new datasets with ground-truth selection that allow for the evaluation of feature importance maps. Using these datasets, we show experimentally that using multiple imputation provides more reasonable interpretations.
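To make the amortized-selector idea concrete, below is a minimal, hypothetical sketch (not the paper's exact architecture or likelihood): a selector network produces per-feature selection probabilities, a relaxed Bernoulli mask is sampled, unselected features are multiply imputed (here with a stand-in Gaussian imputation model), and selector and predictor are trained jointly by maximum likelihood with a simple sparsity penalty. The names `Selector`, `Predictor`, and `loss_with_multiple_imputation`, the Gaussian imputer, and the L1 term are illustrative assumptions.

```python
# Hypothetical sketch of amortized interpretability with multiple imputation.
# Not the paper's exact model: the imputation distribution, relaxation, and
# sparsity penalty below are placeholder choices.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Selector(nn.Module):
    """Maps an input x to per-feature selection logits (amortized selection)."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))

    def forward(self, x):
        return self.net(x)  # logits, shape (batch, d)


class Predictor(nn.Module):
    """Classifier applied to the masked-and-imputed input."""
    def __init__(self, d, n_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, x):
        return self.net(x)


def loss_with_multiple_imputation(selector, predictor, x, y,
                                  n_imputations=5, temperature=0.5, l1=1e-3):
    logits = selector(x)
    # Relaxed Bernoulli ("concrete") sample keeps the mask differentiable.
    mask = torch.distributions.RelaxedBernoulli(
        torch.tensor(temperature), logits=logits).rsample()
    nll = 0.0
    for _ in range(n_imputations):
        # Stand-in imputation model: independent standard-normal draws.
        imputed = torch.randn_like(x)
        x_tilde = mask * x + (1.0 - mask) * imputed
        nll = nll + F.cross_entropy(predictor(x_tilde), y)
    nll = nll / n_imputations                      # average over imputations
    sparsity = torch.sigmoid(logits).mean()        # encourage few selected features
    return nll + l1 * sparsity


# Toy usage: 20 features, 2 classes, random data.
d, n_classes = 20, 2
selector, predictor = Selector(d), Predictor(d, n_classes)
opt = torch.optim.Adam(list(selector.parameters()) + list(predictor.parameters()), lr=1e-3)
x = torch.randn(32, d)
y = torch.randint(0, n_classes, (32,))
for _ in range(10):
    opt.zero_grad()
    loss = loss_with_multiple_imputation(selector, predictor, x, y)
    loss.backward()
    opt.step()
```

At inference time, `torch.sigmoid(selector(x))` can be read directly as a per-feature importance map, which is what makes the approach amortized: no per-example optimization is needed to produce an explanation.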