
Private Regression via Data-Dependent Sufficient Statistic Perturbation (2405.15002v1)

Published 23 May 2024 in cs.LG and stat.ML

Abstract: Sufficient statistic perturbation (SSP) is a widely used method for differentially private linear regression. SSP adopts a data-independent approach where privacy noise from a simple distribution is added to sufficient statistics. However, sufficient statistics can often be expressed as linear queries and better approximated by data-dependent mechanisms. In this paper we introduce data-dependent SSP for linear regression based on post-processing privately released marginals, and find that it outperforms state-of-the-art data-independent SSP. We extend this result to logistic regression by developing an approximate objective that can be expressed in terms of sufficient statistics, resulting in a novel and highly competitive SSP approach for logistic regression. We also make a connection to synthetic data for machine learning: for models with sufficient statistics, training on synthetic data corresponds to data-dependent SSP, with the overall utility determined by how well the mechanism answers these linear queries.
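To make the data-independent baseline described in the abstract concrete, here is a minimal sketch of SSP for linear regression: Gaussian noise is added to the sufficient statistics X^T X and X^T y, and the regression is solved from the noisy statistics. It is an illustration under stated assumptions, not the authors' implementation. The function name `ssp_linear_regression` and the parameters `sigma` and `ridge` are hypothetical. Calibrating `sigma` to a privacy budget requires bounded rows and a sensitivity analysis, and neither is shown here.

```python
import numpy as np

def ssp_linear_regression(X, y, sigma, ridge=1e-3, rng=None):
    """Illustrative data-independent SSP (hypothetical helper).

    Assumes the caller has already bounded each row of X and each y
    and has chosen sigma to match the desired privacy guarantee;
    that calibration is NOT done here.
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]

    # Sufficient statistics for least squares.
    XtX = X.T @ X
    Xty = X.T @ y

    # Symmetric Gaussian noise for X^T X: draw the upper triangle
    # and mirror it so the perturbed matrix stays symmetric.
    E = rng.normal(scale=sigma, size=(d, d))
    E = np.triu(E) + np.triu(E, 1).T
    noisy_XtX = XtX + E

    # Independent Gaussian noise for X^T y.
    noisy_Xty = Xty + rng.normal(scale=sigma, size=d)

    # A small ridge term keeps the noisy system better conditioned.
    return np.linalg.solve(noisy_XtX + ridge * np.eye(d), noisy_Xty)

# Usage on synthetic data (values chosen only for illustration).
demo_rng = np.random.default_rng(0)
X_demo = demo_rng.uniform(-1, 1, size=(500, 3)) / np.sqrt(3)
theta_true = np.array([0.5, -0.25, 0.1])
y_demo = X_demo @ theta_true
theta_private = ssp_linear_regression(X_demo, y_demo, sigma=0.5, rng=1)
```

The data-dependent variant in the abstract would replace the two noisy statistics with estimates computed from privately released marginals; the final solve from sufficient statistics is the same.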

