Joint Selection: Adaptively Incorporating Public Information for Private Synthetic Data (2403.07797v1)

Published 12 Mar 2024 in cs.LG and cs.AI

Abstract: Mechanisms for generating differentially private synthetic data based on marginals and graphical models have been successful in a wide range of settings. However, one limitation of these methods is their inability to incorporate public data. Initializing a data-generating model by pre-training on public data has been shown to improve the quality of synthetic data, but this technique is not applicable when the model structure is not determined a priori. We develop the mechanism jam-pgm, which expands the adaptive measurements framework to jointly select between measuring public data and private data. This technique allows public data to be included in a graphical-model-based mechanism. We show that jam-pgm is able to outperform both publicly assisted and non-publicly-assisted synthetic data generation mechanisms even when the public data distribution is biased.
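The core idea of "jointly selecting" can be illustrated with a minimal sketch: in each round of an adaptive-measurements mechanism, every candidate marginal can be answered either privately (noisy, spending budget) or from the public data (exact, free, but possibly biased), and the mechanism picks one (marginal, source) pair via the exponential mechanism. The function below is an illustrative toy, not the authors' jam-pgm implementation; the error estimates, noise penalty, and utility definition are simplified assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def joint_select(errors_private, errors_public, sigma, epsilon):
    """Toy joint selection step (illustrative, not the paper's jam-pgm).

    errors_private : current model error on each candidate marginal,
                     estimated against the private data.
    errors_public  : the same errors estimated against the public data.
    sigma          : noise scale a private measurement would incur.
    epsilon        : privacy budget for this selection step.

    Returns (marginal index, chosen data source).
    """
    utilities = []
    for e_priv, e_pub in zip(errors_private, errors_public):
        utilities.append(("private", e_priv - sigma))  # pays a noise penalty
        utilities.append(("public", e_pub))            # exact, but maybe biased
    u = np.array([score for _, score in utilities])
    # Exponential mechanism: sample with probability proportional to
    # exp(epsilon * utility / 2); subtracting the max is for stability.
    p = np.exp(epsilon * (u - u.max()) / 2.0)
    p /= p.sum()
    idx = rng.choice(len(utilities), p=p)
    return idx // 2, utilities[idx][0]
```

With a large selection budget the choice concentrates on the best option, e.g. `joint_select([5.0, 1.0], [4.0, 0.5], sigma=2.0, epsilon=100.0)` almost surely picks marginal 0 measured from the public data, since the private option's noise penalty outweighs its slightly larger error estimate.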

Authors (5)
  1. Miguel Fuentes
  2. Brett Mullins
  3. Ryan McKenna
  4. Gerome Miklau
  5. Daniel Sheldon