SoK: Taming the Triangle -- On the Interplays between Fairness, Interpretability and Privacy in Machine Learning (2312.16191v1)
Abstract: Machine learning techniques are increasingly used for high-stakes decision-making, such as college admissions, loan approval, or recidivism prediction. It is therefore crucial to ensure that the learned models can be audited or understood by human users, do not create or reproduce discrimination or bias, and do not leak sensitive information about their training data. Indeed, interpretability, fairness and privacy are key requirements for the development of responsible machine learning, and all three have been studied extensively over the last decade. However, they have mainly been considered in isolation, whereas in practice they interplay with each other, either positively or negatively. In this Systematization of Knowledge (SoK) paper, we survey the literature on the interactions between these three desiderata. More precisely, for each pairwise interaction, we summarize the identified synergies and tensions. These findings highlight several fundamental theoretical and empirical conflicts, while also showing that jointly addressing these requirements is challenging when one aims to preserve a high level of utility. To address this issue, we also discuss possible conciliation mechanisms, showing that careful design can successfully handle these different concerns in practice.
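As a hedged illustration of the kind of fairness/privacy tension the paper surveys (a minimal sketch of our own, not code from the paper): even *auditing* a fairness metric such as demographic parity becomes unreliable under differential privacy, because the per-group counts it needs must themselves be privatized. The sketch below uses the standard Laplace mechanism for counting queries (sensitivity 1); the toy data, the `dp_count` helper, and the budget split are assumptions for illustration only.

```python
# Illustrative sketch: a demographic-parity audit on differentially private
# counts. For small groups or small epsilon, the Laplace noise required for
# privacy can dominate the measured fairness gap.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: binary predictions y_hat and a binary sensitive attribute s,
# with a deliberately biased predictor (positive rate 0.6 vs 0.4).
n = 200
s = rng.integers(0, 2, size=n)
y_hat = (rng.random(n) < np.where(s == 1, 0.6, 0.4)).astype(int)

def dp_count(true_count, epsilon):
    """epsilon-DP count via the Laplace mechanism (sensitivity 1)."""
    return true_count + rng.laplace(scale=1.0 / epsilon)

def demographic_parity_gap(epsilon=None):
    """|P(y_hat=1 | s=1) - P(y_hat=1 | s=0)|, optionally on DP-noised counts."""
    rates = []
    for g in (0, 1):
        pos = int(np.sum((s == g) & (y_hat == 1)))
        tot = int(np.sum(s == g))
        if epsilon is not None:
            # Two counts per group, so split the group's budget in half;
            # groups are disjoint, so they compose in parallel.
            pos, tot = dp_count(pos, epsilon / 2), dp_count(tot, epsilon / 2)
        rates.append(pos / max(tot, 1.0))
    return abs(rates[1] - rates[0])

print(f"exact gap:       {demographic_parity_gap():.3f}")
for eps in (10.0, 1.0, 0.1):
    print(f"gap at eps={eps:>4}: {demographic_parity_gap(eps):.3f}")
```

Running this, the gap measured at a strict budget (eps=0.1) can differ substantially from the exact one, which is one concrete instance of the conflicts the paper systematizes.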