High Epsilon Synthetic Data Vulnerabilities in MST and PrivBayes (2402.06699v1)

Published 9 Feb 2024 in cs.CR

Abstract: Synthetic data generation (SDG) has become increasingly popular as a privacy-enhancing technology. It aims to maintain important statistical properties of its underlying training data while excluding any personally identifiable information. A host of SDG algorithms have been developed in recent years to improve and balance both of these aims, and many provide robust differential privacy guarantees. However, we show here that if the differential privacy parameter $\varepsilon$ is set too high, unambiguous privacy leakage can result. We demonstrate this by conducting a novel membership inference attack (MIA) on two state-of-the-art differentially private SDG algorithms: MST and PrivBayes. Our work reveals previously unseen vulnerabilities in these generators and suggests that future work to strengthen their privacy is advisable. We present the heuristic for our MIA here. It assumes knowledge of auxiliary "population" data and knowledge of which SDG algorithm was used, and it uses this information to adapt the recent DOMIAS MIA specifically to MST and PrivBayes. Our approach went on to win the SNAKE challenge in November 2023.

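For reference, a randomized mechanism $M$ is $\varepsilon$-differentially private (Dwork and Roth [3]) if, for all neighbouring datasets $D$ and $D'$ differing in a single record and all measurable output sets $S$,

$$\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S].$$

Large values of $\varepsilon$ loosen this bound exponentially, which is why a generator can formally satisfy differential privacy yet still leak membership information in practice.

The abstract does not spell out the adapted heuristic, but the underlying DOMIAS idea [11] is a density-ratio test: a target record is scored by how much more likely it is under a density model fitted to the synthetic data than under one fitted to the auxiliary population data. Below is a minimal sketch, assuming Gaussian KDE as the density estimator; DOMIAS itself uses a BNAF density model [2], and the winning attack further specializes the score to MST and PrivBayes, so this is illustrative rather than the paper's method.

```python
import numpy as np
from scipy.stats import gaussian_kde

def domias_scores(synthetic, reference, targets):
    """DOMIAS-style membership scores.

    synthetic: (n, d) records output by the SDG algorithm (e.g. MST, PrivBayes).
    reference: (m, d) auxiliary "population" records, assumed disjoint
               from the generator's training data.
    targets:   (k, d) records whose training-set membership is attacked.
    Returns one score per target; higher means more likely a member.
    """
    p_synth = gaussian_kde(synthetic.T)  # density fitted to synthetic data
    p_ref = gaussian_kde(reference.T)    # density fitted to population data
    return p_synth(targets.T) / (p_ref(targets.T) + 1e-12)

# Illustrative usage: predict membership wherever the synthetic model
# assigns more mass to a record than the population model does.
rng = np.random.default_rng(0)
synthetic = rng.normal(size=(1000, 3))
reference = rng.normal(size=(1000, 3))
targets = rng.normal(size=(10, 3))
members = domias_scores(synthetic, reference, targets) > 1.0
```

Sweeping the decision threshold instead of fixing it at 1.0 traces out the attack's ROC curve, the usual way such MIAs are evaluated [10, 12].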
References (13)
  1. SNAKE Challenge: Sanitization Algorithms under Attack. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 5010–5014.
  2. Block neural autoregressive flow. In Uncertainty in Artificial Intelligence, 1263–1273. PMLR.
  3. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4): 211–407.
  4. LOGAN: Membership inference attacks against generative models. Proceedings on Privacy Enhancing Technologies.
  5. Monte Carlo and Reconstruction Membership Inference Attacks against Generative Models. Proceedings on Privacy Enhancing Technologies, 2019(4): 232–249.
  6. TAPAS: A toolbox for adversarial privacy auditing of synthetic data. arXiv preprint arXiv:2211.06550.
  7. Synthetic Data – what, why and how? arXiv preprint arXiv:2205.03257.
  8. Winning the NIST Contest: A scalable and general approach to differentially private synthetic data. arXiv preprint arXiv:2108.04978.
  9. Scott, D. W. 2015. Multivariate density estimation: theory, practice, and visualization. John Wiley & Sons.
  10. Membership Inference Attacks Against Machine Learning Models. In 2017 IEEE Symposium on Security and Privacy (SP), 3–18.
  11. Membership inference attacks against synthetic data through overfitting detection. In International Conference on Artificial Intelligence and Statistics (AISTATS).
  12. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), 268–282. IEEE.
  13. PrivBayes: Private data release via Bayesian networks. ACM Transactions on Database Systems (TODS), 42(4): 1–41.
Authors (4)
  1. Steven Golob (2 papers)
  2. Sikha Pentyala (11 papers)
  3. Anuar Maratkhan (2 papers)
  4. Martine De Cock (30 papers)