
On the Trade-offs between Adversarial Robustness and Actionable Explanations (2309.16452v2)

Published 28 Sep 2023 in cs.LG

Abstract: As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders. However, it is unclear if these two notions can be simultaneously achieved or if there exist trade-offs between them. In this work, we make one of the first attempts at studying the impact of adversarially robust models on actionable explanations, which provide end users with a means for recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of recourses output by state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. More specifically, we derive theoretical bounds on the differences between the cost and the validity of the recourses generated by state-of-the-art algorithms for adversarially robust vs. non-robust linear and non-linear models. Our empirical results with multiple real-world datasets validate our theoretical results and show the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thus shedding light on the inherent trade-offs between adversarial robustness and actionable explanations.
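To make the cost trade-off concrete, below is a minimal sketch in the spirit of the paper's linear-model analysis, not the authors' implementation: it trains a standard and an L-infinity adversarially trained logistic regression (linear scores admit a closed-form worst-case perturbation), then compares the minimum-L2 recourse cost for inputs both models reject. The synthetic data, epsilon of 0.5, learning rates, and the 0.9 target probability are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): compare recourse cost
# on a standard vs. an adversarially trained linear model.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic stand-in for a real-world tabular dataset.
n, d = 2000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, eps=0.0, lr=0.1, steps=3000):
    """Logistic regression via gradient descent. For eps > 0, each input is
    replaced by its worst-case L-infinity perturbation, which for a linear
    score w.x + b has the closed form x - eps * y_signed * sign(w)."""
    w, b = np.zeros(X.shape[1]), 0.0
    y_signed = 2 * y - 1  # labels in {-1, +1}
    for _ in range(steps):
        X_adv = X - eps * y_signed[:, None] * np.sign(w)[None, :]
        p = sigmoid(X_adv @ w + b)
        w -= lr * X_adv.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def min_recourse(x, w, b, target_p=0.9):
    """Minimum-L2 recourse for a linear model: the smallest move along w
    that lifts the predicted probability to target_p, i.e. shifts x until
    w.x + b equals the target logit (so the recourse is always valid)."""
    target_logit = np.log(target_p / (1 - target_p))
    alpha = (x @ w + b - target_logit) / (w @ w)
    return x - alpha * w

w_std, b_std = train_logreg(X, y, eps=0.0)   # standard model
w_rob, b_rob = train_logreg(X, y, eps=0.5)   # adversarially trained model

# Average recourse cost over points that both models classify negatively.
neg = X[(X @ w_std + b_std < 0) & (X @ w_rob + b_rob < 0)]
cost_std = np.mean([np.linalg.norm(min_recourse(x, w_std, b_std) - x) for x in neg])
cost_rob = np.mean([np.linalg.norm(min_recourse(x, w_rob, b_rob) - x) for x in neg])
print(f"mean L2 recourse cost  standard: {cost_std:.3f}  robust: {cost_rob:.3f}")
```

On runs like this, adversarial training acts like a penalty on the weight norm, so lifting the predicted probability to the same target generally requires a larger input change for the robust model; that is the linear-model version of the cost increase the abstract describes, with validity becoming the harder issue for the non-linear models the paper also analyzes.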

