On Minimizing the Impact of Dataset Shifts on Actionable Explanations (2306.06716v1)
Abstract: The Right to Explanation is an important regulatory principle that allows individuals to request actionable explanations for algorithmic decisions. However, several technical challenges arise when providing such actionable explanations in practice. For instance, models are periodically retrained to handle dataset shifts. This process may invalidate some of the previously prescribed explanations, thus rendering them unactionable. However, it is unclear whether and when such invalidations occur, and what factors determine explanation stability, i.e., whether an explanation remains unchanged amidst model retraining due to dataset shifts. In this paper, we address these gaps and provide one of the first theoretical and empirical characterizations of the factors influencing explanation stability. To this end, we conduct a rigorous theoretical analysis demonstrating that model curvature, the weight decay used during training, and the magnitude of the dataset shift are key factors that determine the extent of explanation (in)stability. Extensive experimentation with real-world datasets not only validates our theoretical results, but also demonstrates that these factors dramatically impact the stability of explanations produced by various state-of-the-art methods.
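To make the setup concrete, the sketch below (not the paper's code; all data, model choices, and the vanilla-gradient explainer are illustrative assumptions) trains a small PyTorch classifier, retrains it on a synthetically shifted dataset under different weight-decay strengths, and reports the cosine similarity between the gradient-based explanations before and after retraining as a rough proxy for explanation stability.

```python
# Illustrative sketch (not the paper's method): measure how much gradient-based
# explanations change when a model is retrained after a dataset shift, varying
# the weight-decay strength and the shift magnitude.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def make_data(n=2000, d=10, shift=0.0):
    # Synthetic binary-classification data; `shift` translates the feature means
    # to emulate a covariate shift of a chosen magnitude.
    X = torch.randn(n, d) + shift
    w_true = torch.randn(d)
    y = (X @ w_true + 0.5 * torch.randn(n) > 0).float()
    return X, y

def train_model(X, y, weight_decay, epochs=200):
    model = nn.Sequential(nn.Linear(X.shape[1], 32), nn.ReLU(), nn.Linear(32, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=weight_decay)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(model(X).squeeze(-1), y)
        loss.backward()
        opt.step()
    return model

def gradient_explanations(model, X):
    # Vanilla gradient (saliency) attributions for each input point.
    X = X.detach().clone().requires_grad_(True)
    model(X).sum().backward()
    return X.grad.detach()

def stability(expl_a, expl_b):
    # Mean cosine similarity between per-point attribution vectors;
    # values near 1 mean the explanations largely survived retraining.
    return F.cosine_similarity(expl_a, expl_b, dim=1).mean().item()

X0, y0 = make_data(shift=0.0)            # original training data
X_test, _ = make_data(n=200, shift=0.0)  # fixed points to explain

for weight_decay in (0.0, 1e-3, 1e-1):
    model_before = train_model(X0, y0, weight_decay)
    expl_before = gradient_explanations(model_before, X_test)
    for shift in (0.1, 0.5, 1.0):
        X1, y1 = make_data(shift=shift)  # shifted data used for retraining
        model_after = train_model(X1, y1, weight_decay)
        expl_after = gradient_explanations(model_after, X_test)
        print(f"weight_decay={weight_decay:>5}, shift={shift}: "
              f"explanation similarity={stability(expl_before, expl_after):.3f}")
```

Under the abstract's claims, one would expect the printed similarity to degrade as the shift magnitude grows and to be comparatively higher for stronger weight decay; the same harness could swap in other explainers (e.g., via Captum) in place of the vanilla-gradient function.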