Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory (2405.05369v2)

Published 8 May 2024 in cs.LG, cs.CR, stat.ML, cs.CY, cs.IT, and math.IT

Abstract: Counterfactual explanations provide ways of achieving a favorable model outcome with minimum input perturbation. However, counterfactual explanations can also be leveraged to reconstruct the model by strategically training a surrogate model to give predictions similar to those of the original (target) model. In this work, we analyze how model reconstruction using counterfactuals can be improved by further leveraging the fact that counterfactuals also lie quite close to the decision boundary. Our main contribution is to derive novel theoretical relationships, via polytope theory, between the error in model reconstruction and the number of counterfactual queries required. Our theoretical analysis leads us to propose a model reconstruction strategy that we call the Counterfactual Clamping Attack (CCA), which trains a surrogate model using a unique loss function that treats counterfactuals differently from ordinary instances. Our approach also alleviates the related problem of decision-boundary shift that arises in existing model reconstruction approaches when counterfactuals are treated as ordinary instances. Experimental results demonstrate that our strategy improves fidelity between the target and surrogate model predictions on several datasets.
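The clamping idea described in the abstract can be made concrete with a small sketch. The code below is a minimal, hypothetical illustration, not the authors' released implementation: it assumes a binary classifier where label 1 is the favorable outcome, and that counterfactuals returned by the target model sit just inside the favorable region, so the surrogate should be penalized on a counterfactual only while it assigns that point a probability below 0.5. The function name `cca_style_loss` and the tensor layout are invented for this example. For intuition on the query-complexity side, classical convex-geometry results say a smooth convex body in d dimensions can be approximated by an n-facet polytope to Hausdorff error on the order of n^{-2/(d-1)}; each near-boundary counterfactual loosely plays the role of a supporting facet, though the paper's precise bounds may differ.

```python
import torch
import torch.nn.functional as F

def cca_style_loss(pred_prob: torch.Tensor,
                   target_label: torch.Tensor,
                   is_counterfactual: torch.Tensor) -> torch.Tensor:
    """One-sided ("clamped") surrogate-training loss sketch.

    Ordinary queries contribute standard binary cross-entropy against
    the label observed from the target model. Counterfactual queries
    (labeled 1, the favorable class) contribute loss only while the
    surrogate still places them below 0.5, i.e. on the wrong side of
    its boundary; once they cross to the favorable side, their loss is
    clamped to zero so the boundary is not pushed past them.
    """
    bce = F.binary_cross_entropy(pred_prob, target_label, reduction="none")
    # Zero out the loss for counterfactuals already on the favorable side.
    satisfied_cf = is_counterfactual & (pred_prob >= 0.5)
    return torch.where(satisfied_cf, torch.zeros_like(bce), bce).mean()

# Toy usage: two ordinary queries and two counterfactuals.
probs = torch.tensor([0.90, 0.20, 0.60, 0.40])    # surrogate outputs
labels = torch.tensor([1.0, 0.0, 1.0, 1.0])       # labels from target model
is_cf = torch.tensor([False, False, True, True])  # query type
loss = cca_style_loss(probs, labels, is_cf)       # 0.60 CF is clamped to zero loss
```

The one-sided treatment is the point of the sketch: a symmetric loss would keep pulling the surrogate's boundary toward (and past) the counterfactuals, producing exactly the decision-boundary shift the abstract says CCA is designed to avoid.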
