ATTAXONOMY: Unpacking Differential Privacy Guarantees Against Practical Adversaries (2405.01716v1)
Abstract: Differential Privacy (DP) is a mathematical framework that is increasingly deployed to mitigate privacy risks associated with machine learning and statistical analyses. Despite the growing adoption of DP, its technical privacy parameters do not lend themselves to an intelligible description of the real-world privacy risks associated with a given deployment: the guarantee that most naturally follows from the DP definition is protection against membership inference by an adversary who knows all but one data record and has unlimited auxiliary knowledge. In many settings, this adversary is far too strong to inform how to set real-world privacy parameters. One approach to contextualizing privacy parameters is to define and measure the success of technical attacks, but doing so requires a systematic categorization of the relevant attack space. In this work, we offer a detailed taxonomy of attacks, showing the various dimensions of attacks and highlighting that many real-world settings have been understudied. Our taxonomy provides a roadmap for analyzing real-world deployments and developing theoretical bounds for more informative privacy attacks. We operationalize our taxonomy by using it to analyze a real-world case study, the Israeli Ministry of Health's recent release of a birth dataset using DP, showing how the taxonomy enables fine-grained threat modeling and provides insight towards making informed privacy parameter choices. Finally, we leverage the taxonomy to define a more realistic attack than previously considered in the literature, namely a distributional reconstruction attack: we generalize Balle et al.'s notion of reconstruction robustness to a less-informed adversary with distributional uncertainty, and extend the worst-case guarantees of DP to this average-case setting.
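As a reference point for the guarantees discussed in the abstract, the following is a minimal LaTeX sketch of the standard (ε, δ)-DP definition together with an informal paraphrase of Balle et al.'s reconstruction-robustness notion. The notation (M, D, D', S, π, ℓ, R, η, γ) is standard shorthand assumed here for illustration, not the paper's exact formalization, and the paper's distributional generalization is not reproduced.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% (epsilon, delta)-differential privacy: a randomized mechanism M satisfies
% (epsilon, delta)-DP if, for all neighboring datasets D, D' (differing in a
% single record) and all measurable output sets S,
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta .
\]

% Informal paraphrase of (eta, gamma)-reconstruction robustness (Balle et al.):
% with the target record Z drawn from a prior pi, no reconstruction attack R
% applied to the mechanism's output recovers Z to within error eta (under a
% reconstruction loss ell) with probability greater than gamma, for any fixed
% remaining dataset D_-:
\[
  \Pr_{Z \sim \pi,\, M}\!\left[\, \ell\bigl(R(M(D_{-} \cup \{Z\})),\, Z\bigr) \le \eta \,\right] \;\le\; \gamma .
\]

\end{document}
```

Per the abstract, the distributional reconstruction attack relaxes the assumption that the adversary knows the remaining dataset D_- exactly, replacing it with distributional uncertainty over those records.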
- J. M. Abowd. The U.S. Census Bureau adopts differential privacy. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ‘18, page 2867, 2018.
- The 2010 Census confidentiality protections failed, here’s how and why. Technical report, National Bureau of Economic Research, 2023.
- B. Balle, G. Cherubin, and J. Hayes. Reconstructing training data with informed adversaries. In Proceedings of the 43rd IEEE Symposium on Security and Privacy, S&P ‘22, pages 1138–1156, 2022.
- On statistical disclosure control technologies for protecting personal data in tabular data sets. Cahiers 2020-17, WODC (Research and Data Centre), Dutch Ministry of Justice and Security, 2020. URL http://hdl.handle.net/20.500.12832/255.
- Safely expanding research access to administrative tax data: creating a synthetic public use file and a validation server. Technical report, Internal Revenue Service (IRS), 2019.
- Privacy harms. Boston University Law Review, 102:793–863, 2022.
- A. Cohen and K. Nissim. Towards formalizing the GDPR’s notion of singling out. Proceedings of the National Academy of Sciences, 117(15):8344–8352, 2020.
- D. Desfontaines. A list of real-world uses of differential privacy. Ted is writing things (blog). https://desfontain.es/privacy/real-world-differential-privacy.html, 2021.
- D. Desfontaines and B. Pejó. SoK: Differential privacies. Proceedings of Privacy Enhancing Technologies, 2020(2):288–313, 2020.
- Confidence-ranked reconstruction of census microdata from published statistics. Proceedings of the National Academy of Sciences, 120(8):e2218605120, 2023.
- I. Dinur and K. Nissim. Revealing information while preserving privacy. In F. Neven, C. Beeri, and T. Milo, editors, Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS ‘03, pages 202–210, 2003.
- Statistical Confidentiality: Principles and Practice. Statistics for Social and Behavioral Sciences. Springer New York, NY, 2011.
- Calibrating noise to sensitivity in private data analysis. In Proceedings of the Theory of Cryptography Conference, TCC ‘06, pages 265–284, 2006.
- Exposed! A survey of attacks on private data. Annual Review of Statistics and Its Application, 4:61–84, 2017.
- J. Fitzpatrick and K. DeSalvo. Helping public health officials combat COVID-19, April 2020. URL https://blog.google/technology/health/covid-19-community-mobility-reports/.
- Bounding training data reconstruction in private (deep) learning. In Proceedings of the International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 8056–8071, 2022.
- Bounding training data reconstruction in DP-SGD. In Advances in Neural Information Processing Systems, NeurIPS ‘23, 2023.
- S. Hod and R. Canetti. Differentially private release of Israel's national registry of live births, 2024. arXiv preprint arXiv:2405.00267.
- Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics, 4(8):e1000167, 2008.
- Joint Task Force Transformation Initiative. Guide for Conducting Risk Assessments. Technical Report 800-30 Rev. 1, National Institute of Standards and Technology (NIST), September 2012. URL https://doi.org/10.6028/NIST.SP.800-30r1.
- Bounding data reconstruction attacks with the hypothesis testing interpretation of differential privacy, 2023. arXiv preprint arXiv:2307.03928.
- G. Miklau. Negotiating privacy/utility trade-offs under differential privacy. USENIX Conference on Privacy Engineering Practice and Respect, 2022.
- Ministry of Health, Government of Israel. Israel's national registry of live births, February 2024. Official Hebrew repository: https://data.gov.il/dataset/birth-data. Unofficial English version: https://birth.dataset.pub.
- A unified analysis of label inference attacks, 2023. Presented at the NeurIPS 2023 Workshop on Regulatable ML.
- A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. In Proceedings of the 2008 IEEE Symposium on Security and Privacy, S&P ‘08, pages 111–125, 2008.
- Adversary instantiation: Lower bounds for differentially private machine learning. In Proceedings of the 42nd IEEE Symposium on Security and Privacy, S&P ‘21, pages 866–882, 2021.
- M. Rigaki and S. García. A survey of privacy attacks in machine learning. ACM Computing Surveys, 56(4):101:1–101:34, 2024.
- SoK: Let the privacy games begin! A unified treatment of data inference privacy in machine learning. In Proceedings of the 44th IEEE Symposium on Security and Privacy, S&P ‘23, pages 327–345, 2023.
- Privacy auditing with one (1) training run. In Advances in Neural Information Processing Systems, NeurIPS ‘23, pages 49268–49280, 2023.
- L. Sweeney. Weaving technology and policy together to maintain confidentiality. The Journal of Law, Medicine & Ethics, 25(2-3):98–110, 1997.
- L. Willenborg and T. de Waal. Statistical Disclosure Control in Practice. Lecture Notes in Statistics. Springer New York, NY, 1996.
- Differentially private SQL with bounded user contribution. Proceedings of Privacy Enhancing Technologies, 2020(2):230–250, 2020.
- PrivBayes: Private data release via Bayesian networks. In Proceedings of the International Conference on Management of Data, SIGMOD ‘14, pages 1423–1434, 2014.
- Rachel Cummings
- Shlomi Hod
- Jayshree Sarathy
- Marika Swanberg