Why You Should Not Trust Interpretations in Machine Learning: Adversarial Attacks on Partial Dependence Plots (2404.18702v2)
Abstract: The adoption of AI across industries has led to the widespread use of complex black-box models and interpretation tools for decision-making. This paper proposes an adversarial framework to uncover the vulnerability of permutation-based interpretation methods for machine learning tasks, with a particular focus on partial dependence (PD) plots. The framework modifies the original black-box model to manipulate its predictions for instances in the extrapolation domain. As a result, it produces deceptive PD plots that can conceal discriminatory behaviors while preserving most of the original model's predictions. It can also produce multiple fooled PD plots via a single model. Using real-world datasets, including an auto insurance claims dataset and the COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) dataset, we show that it is possible to intentionally hide the discriminatory behavior of a predictor and make the black-box model appear neutral through interpretation tools like PD plots while retaining almost all the predictions of the original black-box model. Managerial insights for regulators and practitioners are provided based on the findings.
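To make the mechanism in the abstract concrete, here is a minimal Python sketch on synthetic data. It is a toy illustration, not the paper's implementation. It computes a PD curve by forcing one feature to each grid value and averaging the predictions, then shows how a model that changes its output only on unrealistic (extrapolated) feature combinations can flatten that curve while leaving predictions on the observed data essentially unchanged. All names (`partial_dependence`, `original_model`, `fooled_model`, the tolerance `tol`, the neutral value 0.0) and the data-generating setup are assumptions made for this example; the hand-written "on-manifold" rule merely stands in for whatever extrapolation detection an attacker might use.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """PD curve: average prediction with column `feature` forced to each grid value."""
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v  # creates feature combinations that may never occur in the data
        pd_values.append(predict(X_mod).mean())
    return np.array(pd_values)

# Synthetic data (assumption): feature 1 is almost a copy of feature 0,
# so forcing feature 0 to an arbitrary value usually lands off the data manifold.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + 0.1 * rng.normal(size=n)
X = np.column_stack([x0, x1])

def original_model(X):
    # Toy "discriminatory" predictor: the output depends strongly on feature 0.
    return 2.0 * X[:, 0]

def fooled_model(X, tol=0.5):
    # Hypothetical attack: keep the original prediction where the two features
    # agree (realistic points), and return a neutral value elsewhere.
    on_manifold = np.abs(X[:, 1] - X[:, 0]) <= tol
    return np.where(on_manifold, original_model(X), 0.0)

grid = np.linspace(-2.0, 2.0, 9)
pd_original = partial_dependence(original_model, X, 0, grid)
pd_fooled = partial_dependence(fooled_model, X, 0, grid)

# Share of observed instances on which both models give the same prediction.
agreement = np.mean(np.isclose(original_model(X), fooled_model(X)))

print("PD range, original model:", pd_original.max() - pd_original.min())
print("PD range, fooled model:  ", pd_fooled.max() - pd_fooled.min())
print("Agreement on observed data:", agreement)
```

In this toy setup the fooled model's PD curve for feature 0 is far flatter than the original one, even though the two models agree on nearly every observed instance. The effect relies on the strong correlation between the two features: the PD computation evaluates the model mostly at points that the data never contains, and those are exactly the points the modified model is free to change.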