"What do you want from theory alone?" Experimenting with Tight Auditing of Differentially Private Synthetic Data Generation (2405.10994v1)

Published 16 May 2024 in cs.CR

Abstract: Differentially private synthetic data generation (DP-SDG) algorithms are used to release datasets that are structurally and statistically similar to sensitive data while providing formal bounds on the information they leak. However, bugs in algorithms and implementations may cause the actual information leakage to be higher. This prompts the need to verify whether the theoretical guarantees of state-of-the-art DP-SDG implementations also hold in practice. We do so via a rigorous auditing process: we compute the information leakage via an adversary playing a distinguishing game and running membership inference attacks (MIAs). If the leakage observed empirically is higher than the theoretical bounds, we identify a DP violation; if it is non-negligibly lower, the audit is loose. We audit six DP-SDG implementations using different datasets and threat models and find that black-box MIAs commonly used against DP-SDGs are severely limited in power, yielding remarkably loose empirical privacy estimates. We then consider MIAs in stronger threat models, i.e., passive and active white-box, using both existing and newly proposed attacks. Overall, we find that, currently, tightly estimating the privacy leakage of DP-SDGs requires not only white-box MIAs but also worst-case datasets. Finally, we show that our automated auditing procedure finds both known DP violations (in 4 of the 6 implementations) and a new one in the DPWGAN implementation that was successfully submitted to the NIST DP Synthetic Data Challenge. The source code needed to reproduce our experiments is available at https://github.com/spalabucr/synth-audit.
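The core of such an audit is converting the adversary's success in the distinguishing game into an empirical estimate of epsilon that can be compared against the theoretical bound. Below is a minimal Python sketch of this standard conversion, using Clopper-Pearson confidence intervals on the attack's false-positive and false-negative rates together with the (eps, delta)-DP hypothesis-testing inequality; the function names, trial counts, and error counts here are illustrative assumptions, not taken from the paper's released code.

```python
# Minimal sketch (illustrative, not the paper's released code) of turning
# membership-inference outcomes from the distinguishing game into an
# empirical lower bound on epsilon.
import math
from scipy.stats import beta

def clopper_pearson_upper(errors: int, trials: int, alpha: float = 0.05) -> float:
    """One-sided Clopper-Pearson upper confidence bound on an error rate."""
    if errors >= trials:
        return 1.0
    return beta.ppf(1 - alpha, errors + 1, trials - errors)

def empirical_epsilon(fp: int, fn: int, n_out: int, n_in: int,
                      delta: float = 0.0) -> float:
    """Lower-bound epsilon from the attack's false-positive and
    false-negative counts. Any (eps, delta)-DP mechanism forces every
    attack to satisfy FPR + e^eps * FNR >= 1 - delta (and symmetrically
    with FPR and FNR swapped), so the observed error rates imply
    eps >= log((1 - delta - FPR) / FNR). Using upper confidence bounds
    on FPR and FNR keeps the estimate conservative."""
    fpr = clopper_pearson_upper(fp, n_out)
    fnr = clopper_pearson_upper(fn, n_in)
    candidates = [0.0]
    if fnr > 0 and 1 - delta - fpr > 0:
        candidates.append(math.log((1 - delta - fpr) / fnr))
    if fpr > 0 and 1 - delta - fnr > 0:
        candidates.append(math.log((1 - delta - fnr) / fpr))
    return max(candidates)

# Hypothetical audit: 1000 games with the target record and 1000 without;
# the MIA makes 60 false positives and 130 false negatives.
print(f"empirical eps lower bound: "
      f"{empirical_epsilon(60, 130, 1000, 1000, delta=1e-5):.2f}")
```

Working from upper confidence bounds on both error rates makes the estimate conservative: a reported DP violation then holds with high probability rather than being a statistical fluke, while a large gap between this empirical lower bound and the theoretical epsilon is what the paper calls a loose audit.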
