Trained Random Forests Completely Reveal your Dataset (2402.19232v2)
Abstract: We introduce an optimization-based reconstruction attack capable of completely or near-completely reconstructing a dataset utilized for training a random forest. Notably, our approach relies solely on information readily available in commonly used libraries such as scikit-learn. To achieve this, we formulate the reconstruction problem as a combinatorial problem under a maximum likelihood objective. We demonstrate that this problem is NP-hard, though solvable at scale using constraint programming -- an approach rooted in constraint propagation and solution-domain reduction. Through an extensive computational investigation, we demonstrate that random forests trained without bootstrap aggregation but with feature randomization are susceptible to a complete reconstruction. This holds true even with a small number of trees. Even with bootstrap aggregation, the majority of the data can also be reconstructed. These findings underscore a critical vulnerability inherent in widely adopted ensemble methods, warranting attention and mitigation. Although the potential for such reconstruction attacks has been discussed in privacy research, our study provides clear empirical evidence of their practicability.
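To make the combinatorial view in the abstract concrete, here is a minimal toy sketch. It is not the authors' method: where the paper uses constraint programming under a maximum likelihood objective, this sketch swaps in plain brute-force enumeration, which only works at toy scale. It assumes binary features and labels, no bootstrap aggregation (every tree sees the full dataset), and that each tree exposes per-leaf, per-class training counts. The names `leaf_counts`, `consistent_datasets`, `tree_a`, `tree_b`, and `secret` are all hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement, product


def leaf_counts(tree, dataset):
    """Per-leaf, per-class example counts, as a trained tree might store them."""
    return Counter((tree(x), y) for x, y in dataset)


def consistent_datasets(trees, observed, n_examples, n_features):
    """Enumerate every multiset of examples that reproduces all observed counts."""
    universe = [(x, y) for x in product((0, 1), repeat=n_features) for y in (0, 1)]
    return [
        candidate
        for candidate in combinations_with_replacement(universe, n_examples)
        if all(leaf_counts(t, candidate) == obs for t, obs in zip(trees, observed))
    ]


# Hypothetical depth-1 trees: each maps a feature vector to a leaf id.
def tree_a(x):
    return x[0]  # splits on feature 0


def tree_b(x):
    return x[1]  # splits on feature 1


# Toy "private" training set of (features, label) pairs; assumed for illustration.
secret = [((0, 0), 0), ((0, 1), 1), ((1, 1), 1)]
trees = [tree_a, tree_b]

# What the attacker sees: only the counts stored in each tree, not the data.
observed = [leaf_counts(t, secret) for t in trees]

one_tree = consistent_datasets(trees[:1], observed[:1], n_examples=3, n_features=2)
two_trees = consistent_datasets(trees, observed, n_examples=3, n_features=2)

print(len(one_tree), len(two_trees))  # 8 1
```

With one tree, eight candidate datasets remain consistent with the observed counts; adding a second tree that splits on a different feature leaves only the true dataset. This mirrors, at toy scale, the abstract's observation that feature randomization without bootstrap aggregation can make complete reconstruction possible even with few trees.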