Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees (2305.11997v3)
Abstract: There is an emerging interest in generating robust counterfactual explanations that would remain valid if the model is updated or changed even slightly. Towards finding robust counterfactuals, existing literature often assumes that the original model $m$ and the new model $M$ are bounded in the parameter space, i.e., $|\text{Params}(M){-}\text{Params}(m)|{<}\Delta$. However, models can often change significantly in the parameter space with little to no change in their predictions or accuracy on the given dataset. In this work, we introduce a mathematical abstraction termed $\textit{naturally-occurring}$ model change, which allows for arbitrary changes in the parameter space such that the change in predictions on points that lie on the data manifold is limited. Next, we propose a measure -- that we call $\textit{Stability}$ -- to quantify the robustness of counterfactuals to potential model changes for differentiable models, e.g., neural networks. Our main contribution is to show that counterfactuals with a sufficiently high $\textit{Stability}$ value, as defined by our measure, will remain valid after potential $\textit{naturally-occurring}$ model changes with high probability (leveraging concentration bounds for Lipschitz functions of independent Gaussians). Since our quantification depends on the local Lipschitz constant around a data point, which is not always available, we also examine practical relaxations of our proposed measure and demonstrate experimentally how they can be incorporated to find robust counterfactuals for neural networks that are close, realistic, and remain valid after potential model changes. This work also has interesting connections with model multiplicity, also known as the Rashomon effect.
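To make the idea concrete, a practical relaxation of a Stability-style score can be approximated by Monte-Carlo sampling: draw Gaussian perturbations around a candidate counterfactual, average the model's output over them, and penalize local variability of the output. This is a hedged sketch under assumptions, not the paper's exact definition; the function name `stability_estimate`, the penalty weight `k`, and the toy model are all illustrative.

```python
import numpy as np

def stability_estimate(model, x, sigma=0.1, n_samples=1000, k=1.0, rng=None):
    """Monte-Carlo sketch of a stability-style score around a point x.

    model: callable mapping a batch of inputs (n, d) to scalar scores.
    Higher values suggest the point sits in a region where the model's
    output is both high and locally flat, so limited changes in
    predictions on the data manifold are less likely to invalidate it.
    NOTE: illustrative relaxation only, not the paper's exact measure.
    """
    rng = np.random.default_rng(rng)
    # Sample Gaussian perturbations around the candidate counterfactual.
    noise = rng.normal(scale=sigma, size=(n_samples, x.shape[0]))
    outputs = model(x[None, :] + noise)
    # Reward high mean output, penalize local variability.
    return outputs.mean() - k * outputs.std()

# Toy differentiable "model": a smooth logistic score on the feature sum.
def toy_model(X):
    return 1.0 / (1.0 + np.exp(-X.sum(axis=1)))

x_flat = np.array([3.0, 3.0])   # deep inside the positive region
x_edge = np.array([0.1, 0.1])   # near the decision boundary
print(stability_estimate(toy_model, x_flat, rng=0))
print(stability_estimate(toy_model, x_edge, rng=0))
```

Under this sketch, the counterfactual far from the decision boundary should score higher than the one near it, matching the intuition that flat, high-confidence regions yield counterfactuals more robust to naturally-occurring model changes.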