REFRESH: Responsible and Efficient Feature Reselection Guided by SHAP Values (2403.08880v1)

Published 13 Mar 2024 in cs.LG

Abstract: Feature selection is a crucial step in building machine learning models. The process is typically carried out with accuracy as the objective, and it can be cumbersome and computationally expensive for large-scale datasets. Additional model performance characteristics, such as fairness and robustness, also matter during model development, and as regulations drive the need for more trustworthy models, deployed models need to be corrected for characteristics associated with responsible artificial intelligence. When feature selection has been done with respect to one performance characteristic (e.g., accuracy), selecting features for secondary characteristics (e.g., fairness and robustness) would ordinarily require repeating the computationally expensive selection process from scratch. In this paper, we introduce the problem of feature reselection: efficiently selecting features with respect to secondary model performance characteristics after a feature selection process has already been completed for a primary objective. To address this problem, we propose REFRESH, a method that reselects features so that additional constraints desirable for model performance can be met without training several new models. REFRESH's underlying algorithm is a novel technique that uses SHAP values and correlation analysis to approximate the predictions of candidate models without having to train them. Empirical evaluations on three datasets, including a large-scale loan-defaulting dataset, show that REFRESH can efficiently find alternate models with better model characteristics. We also discuss the need for reselection, and for REFRESH, in light of regulatory desiderata.
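
The central mechanism in the abstract, approximating what a model's predictions would look like under a different feature set without retraining, can be sketched briefly. The snippet below is a minimal illustration under assumed names, a synthetic dataset, and a toy group-disparity metric; it is not the authors' implementation, and it omits the correlation analysis REFRESH uses to group related features before reselection. The idea: because SHAP values additively decompose each prediction, subtracting a feature's SHAP contribution approximates the output of a model trained without that feature, so candidate reselections can be ranked against a secondary objective cheaply.

```python
# Hypothetical sketch of SHAP-based prediction approximation for feature
# reselection. Names, data, and the disparity metric are illustrative
# assumptions, not the REFRESH authors' code.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for a tabular dataset; column 0 plays the role of a
# hypothetical sensitive attribute used only to score group disparity.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
sensitive = (X[:, 0] > 0).astype(int)

# Model trained on the primary (accuracy-driven) feature set.
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
phi = explainer.shap_values(X)        # (n_samples, n_features), log-odds space
margin = model.decision_function(X)   # model output f(x) in log-odds space

# SHAP additivity: f(x) = expected_value + sum_j phi_j(x)
base = float(np.ravel(explainer.expected_value)[0])
assert np.allclose(base + phi.sum(axis=1), margin, atol=1e-3)

def approx_margin_without(features):
    """Approximate the model's output if `features` were dropped, by
    subtracting their SHAP contributions instead of retraining."""
    return margin - phi[:, features].sum(axis=1)

def disparity(scores):
    """Toy secondary objective: gap in positive-prediction rates between
    the two (hypothetical) sensitive groups."""
    pred = scores > 0
    return abs(pred[sensitive == 1].mean() - pred[sensitive == 0].mean())

# Rank single-feature removals by approximated fairness improvement,
# without training a single new model.
baseline = disparity(margin)
gains = {j: baseline - disparity(approx_margin_without([j]))
         for j in range(X.shape[1])}
best = max(gains, key=gains.get)
print(f"baseline disparity {baseline:.3f}; dropping feature {best} "
      f"is predicted to improve it by {gains[best]:.3f}")
```

In practice one would retrain only the top-ranked candidate models to verify the approximation, which is where the efficiency claim comes from: the expensive train-and-evaluate loop is replaced by arithmetic on precomputed SHAP values.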
