Promoting Counterfactual Robustness through Diversity (2312.06564v2)
Abstract: Counterfactual explanations shed light on the decisions of black-box models by explaining how an input can be altered to obtain a favourable decision from the model (e.g., when a loan application has been rejected). However, as noted recently, counterfactual explainers may lack robustness in the sense that a minor change in the input can cause a major change in the explanation. This can cause confusion on the user side and open the door for adversarial attacks. In this paper, we study some sources of non-robustness. While there are fundamental reasons for why an explainer that returns a single counterfactual cannot be robust in all instances, we show that some interesting robustness guarantees can be given by reporting multiple rather than a single counterfactual. Unfortunately, the number of counterfactuals that need to be reported for the theoretical guarantees to hold can be prohibitively large. We therefore propose an approximation algorithm that uses a diversity criterion to select a feasible number of most relevant explanations and study its robustness empirically. Our experiments indicate that our method improves the state-of-the-art in generating robust explanations, while maintaining other desirable properties and providing competitive computational performance.
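The paper's approximation algorithm is not reproduced in the abstract, but the core idea of picking a small, diverse subset of counterfactuals can be illustrated with a simple greedy max-min selection. The function names and distance choice below are assumptions for illustration only, not the authors' actual method: from a pool of candidate counterfactuals, pick the one closest to the input first, then repeatedly add the candidate farthest from everything already chosen.

```python
# Hypothetical sketch of diversity-based counterfactual selection.
# Assumes candidates are feature tuples that already flip the model's
# decision; only the selection step is shown, not candidate generation.

def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_diverse(candidates, x, k):
    """Greedy max-min selection: start with the candidate nearest the
    input x (lowest recourse cost), then repeatedly add the candidate
    whose minimum distance to the chosen set is largest (most diverse)."""
    if not candidates or k <= 0:
        return []
    chosen = [min(candidates, key=lambda c: euclidean(c, x))]
    pool = [c for c in candidates if c is not chosen[0]]
    while pool and len(chosen) < k:
        best = max(pool, key=lambda c: min(euclidean(c, s) for s in chosen))
        chosen.append(best)
        pool.remove(best)
    return chosen
```

Greedy max-min is a standard heuristic for diversity maximization; the paper's diversity criterion and relevance weighting may differ, but the sketch shows why reporting several well-spread counterfactuals hedges against small input perturbations invalidating any single one.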
Authors: Francesco Leofante, Nico Potyka