
Manipulation Risks in Explainable AI: The Implications of the Disagreement Problem (2306.13885v2)

Published 24 Jun 2023 in cs.AI

Abstract: AI systems are increasingly used in high-stakes domains of our lives, increasing the need to explain their decisions and to ensure that these decisions are aligned with how we want them to be made. The field of Explainable AI (XAI) has emerged in response. However, it faces a significant challenge known as the disagreement problem, where multiple explanations are possible for the same AI decision or prediction. While the existence of the disagreement problem is acknowledged, its potential implications have not yet been widely studied. First, we provide an overview of the different strategies explanation providers could deploy to adapt the returned explanation to their benefit. We make a distinction between strategies that attack the machine learning model or the underlying data to influence the explanations, and strategies that leverage the explanation phase directly. Next, we analyse several objectives and concrete scenarios that could motivate providers to engage in this behavior, and the potentially dangerous consequences this manipulative behavior could have on society. We emphasize that it is crucial to investigate this issue now, before these methods are widely implemented, and we propose some mitigation strategies.
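
The disagreement problem the abstract describes can be illustrated with a small, self-contained sketch. The snippet below assumes scikit-learn and NumPy are available; the two hand-rolled explainers are simplified stand-ins for a LIME-style local surrogate and an occlusion-style attribution, not the paper's own methods. It trains one model and produces two feature-attribution explanations for the same prediction; their top-ranked features typically only partially overlap.

```python
# Minimal sketch of the disagreement problem: two post-hoc explanation methods
# can rank features differently for the same prediction of the same model.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)
x = X[0]  # the single instance whose prediction we explain

# Explainer A: LIME-style local surrogate -- fit a linear model on the model's
# predicted probabilities for Gaussian perturbations around the instance.
Z = x + rng.normal(scale=X.std(axis=0) * 0.1, size=(500, X.shape[1]))
attr_a = Ridge().fit(Z, model.predict_proba(Z)[:, 1]).coef_

# Explainer B: occlusion-style attribution -- replace one feature at a time
# with its dataset mean and record the drop in predicted probability.
base = model.predict_proba(x.reshape(1, -1))[0, 1]
attr_b = np.empty(X.shape[1])
for j in range(X.shape[1]):
    x_occ = x.copy()
    x_occ[j] = X[:, j].mean()
    attr_b[j] = base - model.predict_proba(x_occ.reshape(1, -1))[0, 1]

# Both explain the same prediction of the same model, yet their top-5
# feature rankings usually only partially overlap: the disagreement problem.
top_a = set(np.argsort(-np.abs(attr_a))[:5])
top_b = set(np.argsort(-np.abs(attr_b))[:5])
print("Top-5 feature overlap:", len(top_a & top_b), "out of 5")
```

In the manipulation setting the paper studies, a provider who can choose among such disagreeing explanations could return whichever one best serves their own interests.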
