
Differentially Private Synthetic Data with Private Density Estimation (2405.04554v1)

Published 6 May 2024 in cs.CR, cs.IT, cs.LG, math.IT, math.ST, stat.ML, and stat.TH

Abstract: The need to analyze sensitive data, such as medical records or financial data, has created a critical research challenge in recent years. In this paper, we adopt the framework of differential privacy and explore mechanisms for generating an entire dataset that accurately captures characteristics of the original data. We build upon the work of Boedihardjo et al., which laid the foundations for a new optimization-based algorithm for generating private synthetic data. Importantly, we adapt their algorithm by replacing a uniform sampling step with a private distribution estimator; this allows us to obtain better computational guarantees for discrete distributions and to develop a novel algorithm suitable for continuous distributions. We also explore applications of our work to several statistical tasks.
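
To make the idea of a "private distribution estimator" concrete, the sketch below shows a standard Laplace-noised histogram over a finite domain, followed by sampling synthetic records from the estimate. This is a textbook baseline and not the algorithm of the paper or of Boedihardjo et al.; the function names (`private_histogram`, `sample_synthetic`), the replace-one-record neighbouring relation, and the choice to clip negative counts are all assumptions made for illustration.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(data, domain, epsilon, rng):
    """Epsilon-DP estimate of a discrete distribution over `domain`.

    Assumption: neighbouring datasets differ by replacing one record,
    so the count vector has L1 sensitivity at most 2.
    Records outside `domain` are ignored.
    """
    counts = Counter(data)
    scale = 2.0 / epsilon
    # Clipping and normalising are post-processing, so they do not
    # weaken the privacy guarantee of the noisy counts.
    noisy = [max(counts[v] + laplace_noise(scale, rng), 0.0) for v in domain]
    total = sum(noisy)
    if total == 0.0:
        # Every noisy count was clipped to zero: fall back to uniform.
        return [1.0 / len(domain)] * len(domain)
    return [c / total for c in noisy]


def sample_synthetic(domain, probs, m, rng):
    # Draw m synthetic records from the privately estimated distribution.
    return rng.choices(domain, weights=probs, k=m)


if __name__ == "__main__":
    rng = random.Random(0)
    domain = ["a", "b", "c"]
    data = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
    probs = private_histogram(data, domain, epsilon=1.0, rng=rng)
    print(probs)
    print(sample_synthetic(domain, probs, 10, rng))
```

Because the synthetic records depend on the sensitive data only through the noisy histogram, sampling any number of them costs no additional privacy budget; the accuracy of the synthetic dataset is then governed by the noise scale `2 / epsilon` relative to the true counts.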

References (46)
  1. M. Boedihardjo, T. Strohmer, and R. Vershynin, “Privacy of synthetic data: A statistical framework,” IEEE Transactions on Information Theory, vol. 69, no. 1, pp. 520–527, 2022.
  2. C. Dwork, “Differential privacy,” in International Colloquium on Automata, Languages, and Programming. Springer, 2006, pp. 1–12.
  3. C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Theory of Cryptography: Third Theory of Cryptography Conference. Springer, 2006, pp. 265–284.
  4. A. Nikolov, K. Talwar, and L. Zhang, “The geometry of differential privacy: the sparse and approximate cases,” in Proceedings of the 45th Annual ACM Symposium on Theory of Computing, 2013, pp. 351–360.
  5. K. Nissim, S. Raskhodnikova, and A. Smith, “Smooth sensitivity and sampling in private data analysis,” in Proceedings of the 39th Annual ACM Symposium on Theory of Computing, 2007, pp. 75–84.
  6. C. Li, M. Hay, V. Rastogi, G. Miklau, and A. McGregor, “Optimizing linear counting queries under differential privacy,” in ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2010, pp. 123–134.
  7. N. Mohammed, R. Chen, B. C. Fung, and P. S. Yu, “Differentially private data release for data mining,” in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011, pp. 493–501.
  8. C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
  9. A. Blum, K. Ligett, and A. Roth, “A learning theory approach to noninteractive database privacy,” Journal of the ACM, vol. 60, no. 2, pp. 1–25, 2013.
  10. B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar, “Privacy, accuracy, and consistency too: A holistic solution to contingency table release,” in ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2007, pp. 273–282.
  11. M. Hardt and G. N. Rothblum, “A multiplicative weights mechanism for privacy-preserving data analysis,” in 2010 IEEE 51st Annual Symposium on Foundations of Computer Science. IEEE, 2010, pp. 61–70.
  12. J. Jordon, L. Szpruch, F. Houssiau, M. Bottarelli, G. Cherubin, C. Maple, S. N. Cohen, and A. Weller, “Synthetic data–what, why and how?” arXiv preprint arXiv:2205.03257, 2022.
  13. L. Wasserman and S. Zhou, “A statistical framework for differential privacy,” Journal of the American Statistical Association, vol. 105, no. 489, pp. 375–389, 2010.
  14. F. Alda and B. Rubinstein, “The Bernstein mechanism: Function release under differential privacy,” in AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017.
  15. M. Boedihardjo, T. Strohmer, and R. Vershynin, “Covariance’s loss is privacy’s gain: Computationally efficient, private and accurate synthetic data,” Foundations of Computational Mathematics, pp. 1–48, 2022.
  16. M. Boedihardjo, T. Strohmer, and R. Vershynin, “Private sampling: A noiseless approach for generating differentially private synthetic data,” SIAM Journal on Mathematics of Data Science, vol. 4, no. 3, pp. 1082–1115, 2022.
  17. M. Balog, I. Tolstikhin, and B. Schölkopf, “Differentially private database release via kernel mean embeddings,” in International Conference on Machine Learning. PMLR, 2018, pp. 414–422.
  18. M. Hardt, K. Ligett, and F. McSherry, “A simple and practical algorithm for differentially private data release,” Advances in Neural Information Processing Systems, vol. 25, 2012.
  19. J. Ullman and S. Vadhan, “PCPs and the hardness of generating private synthetic data,” in Theory of Cryptography Conference. Springer, 2011, pp. 400–416.
  20. P.-H. Lu and C.-M. Yu, “Poster: A unified framework of differentially private synthetic data release with generative adversarial network,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 2547–2549.
  21. J. Jordon, J. Yoon, and M. Van Der Schaar, “PATE-GAN: Generating synthetic data with differential privacy guarantees,” in International Conference on Learning Representations, 2018.
  22. C. Xu, J. Ren, D. Zhang, Y. Zhang, Z. Qin, and K. Ren, “GANobfuscator: Mitigating information leakage under GAN via differential privacy,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 9, pp. 2358–2371, 2019.
  23. B. K. Beaulieu-Jones, Z. S. Wu, C. Williams, R. Lee, S. P. Bhavnani, J. B. Byrd, and C. S. Greene, “Privacy-preserving generative deep neural networks support clinical data sharing,” Circulation: Cardiovascular Quality and Outcomes, vol. 12, no. 7, p. e005122, 2019.
  24. T. Dockhorn, T. Cao, A. Vahdat, and K. Kreis, “Differentially private diffusion models,” Transactions on Machine Learning Research, 2023.
  25. R. McKenna, G. Miklau, and D. Sheldon, “Winning the NIST contest: A scalable and general approach to differentially private synthetic data,” Journal of Privacy and Confidentiality, vol. 11, no. 3, 2021.
  26. Z. Lin, S. Gopi, J. Kulkarni, H. Nori, and S. Yekhanin, “Differentially private synthetic data via foundation model APIs 1: Images,” in The Twelfth International Conference on Learning Representations, 2023.
  27. J. Thaler, J. Ullman, and S. Vadhan, “Faster algorithms for privately releasing marginals,” in International Colloquium on Automata, Languages, and Programming. Springer, 2012, pp. 810–821.
  28. G. Barthe, R. Chadha, P. Krogmeier, A. P. Sistla, and M. Viswanathan, “Deciding accuracy of differential privacy schemes,” Proceedings of the ACM on Programming Languages, vol. 5, no. POPL, pp. 1–30, 2021.
  29. C. Dwork, K. Talwar, A. Thakurta, and L. Zhang, “Analyze Gauss: Optimal bounds for privacy-preserving principal component analysis,” in Proceedings of the 46th Annual ACM Symposium on Theory of Computing, 2014, pp. 11–20.
  30. J. Blocki, A. Blum, A. Datta, and O. Sheffet, “The Johnson-Lindenstrauss transform itself preserves differential privacy,” in 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science. IEEE, 2012, pp. 410–419.
  31. V. Karwa, S. Raskhodnikova, A. Smith, and G. Yaroslavtsev, “Private analysis of graph structure,” Proceedings of the VLDB Endowment, vol. 4, no. 11, pp. 1146–1157, 2011.
  32. P. Loh and M. J. Wainwright, “Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses,” The Annals of Statistics, vol. 41, no. 6, p. 3022, 2013.
  33. U. von Luxburg and O. Bousquet, “Distance-based classification with Lipschitz functions,” Journal of Machine Learning Research, vol. 5, no. Jun, pp. 669–695, 2004.
  34. S. Bubeck and M. Sellke, “A universal law of robustness via isoperimetry,” Advances in Neural Information Processing Systems, vol. 34, pp. 28811–28822, 2021.
  35. L. V. Kovalev, “Lipschitz clustering in metric spaces,” The Journal of Geometric Analysis, vol. 32, no. 7, p. 188, 2022.
  36. M. Hein and M. Andriushchenko, “Formal guarantees on the robustness of a classifier against adversarial manipulation,” Advances in Neural Information Processing Systems, vol. 30, 2017.
  37. J. Cohen, E. Rosenfeld, and Z. Kolter, “Certified adversarial robustness via randomized smoothing,” in International Conference on Machine Learning. PMLR, 2019, pp. 1310–1320.
  38. J. Ullman and S. Vadhan, “PCPs and the hardness of generating private synthetic data,” in Theory of Cryptography Conference. Springer, 2011, pp. 400–416.
  39. A. R. Asadi and P. Loh, “On the Gibbs exponential mechanism and private synthetic data generation,” in 2023 IEEE International Symposium on Information Theory (ISIT). IEEE, 2023, pp. 2213–2218.
  40. N. C. Abay, Y. Zhou, M. Kantarcioglu, B. Thuraisingham, and L. Sweeney, “Privacy preserving synthetic data release using deep learning,” in Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018, Dublin, Ireland, September 10–14, 2018, Proceedings, Part I 18. Springer, 2019, pp. 510–526.
  41. R. Dwivedi and L. Mackey, “Kernel thinning,” in Conference on Learning Theory. PMLR, 2021, pp. 1753–1753.
  42. I. Mironov, “Rényi differential privacy,” in 2017 IEEE 30th Computer Security Foundations Symposium (CSF). IEEE, 2017, pp. 263–275.
  43. I. Sason and S. Verdú, “Upper bounds on the relative entropy and Rényi divergence as a function of total variation distance for finite alphabets,” in 2015 IEEE Information Theory Workshop. IEEE, 2015, pp. 214–218.
  44. D. Berend and A. Kontorovich, “On the convergence of the empirical distribution,” arXiv preprint arXiv:1205.6711, 2012.
  45. C. Micchelli, “The saturation class and iterates of the Bernstein polynomials,” Journal of Approximation Theory, vol. 8, no. 1, pp. 1–18, 1973.
  46. H. Jiang, “Uniform convergence rates for kernel density estimation,” in ICML. PMLR, 2017, pp. 1694–1703.
