Private Regression via Data-Dependent Sufficient Statistic Perturbation (2405.15002v1)
Abstract: Sufficient statistic perturbation (SSP) is a widely used method for differentially private linear regression. SSP adopts a data-independent approach where privacy noise from a simple distribution is added to sufficient statistics. However, sufficient statistics can often be expressed as linear queries and better approximated by data-dependent mechanisms. In this paper we introduce data-dependent SSP for linear regression based on post-processing privately released marginals, and find that it outperforms state-of-the-art data-independent SSP. We extend this result to logistic regression by developing an approximate objective that can be expressed in terms of sufficient statistics, resulting in a novel and highly competitive SSP approach for logistic regression. We also make a connection to synthetic data for machine learning: for models with sufficient statistics, training on synthetic data corresponds to data-dependent SSP, with the overall utility determined by how well the mechanism answers these linear queries.
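As a rough illustration of the data-independent baseline the abstract describes, the following sketch adds Gaussian noise to the linear-regression sufficient statistics X^T X and X^T y and then solves the resulting normal equations. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `ssp_linear_regression`, the `sigma` and `ridge` parameters, and the symmetric-noise construction are illustrative choices, and `sigma` is left as a free parameter rather than calibrated to a privacy budget (which would require bounding the norm of each row and applying a sensitivity analysis).

```python
import numpy as np


def ssp_linear_regression(X, y, sigma, ridge=1e-3, rng=None):
    """Sketch of data-independent SSP for linear regression.

    Assumption: `sigma` is a placeholder noise scale. A real mechanism would
    derive it from the privacy parameters and a bound on the data.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]

    # Sufficient statistics of the least-squares objective.
    xtx = X.T @ X
    xty = X.T @ y

    # Symmetric Gaussian noise for X^T X: draw the upper triangle, mirror it.
    raw = rng.normal(0.0, sigma, size=(d, d))
    sym_noise = np.triu(raw) + np.triu(raw, 1).T

    noisy_xtx = xtx + sym_noise
    noisy_xty = xty + rng.normal(0.0, sigma, size=d)

    # A small ridge term (hypothetical default) keeps the noisy system solvable.
    return np.linalg.solve(noisy_xtx + ridge * np.eye(d), noisy_xty)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(500, 3))
    true_theta = np.array([0.5, -0.2, 0.1])
    y = X @ true_theta
    print(ssp_linear_regression(X, y, sigma=1.0, rng=rng))
```

The data-dependent variant described in the abstract would replace the noisy `xtx` and `xty` with estimates obtained by post-processing privately released marginals; that step depends on the marginal-release mechanism and is not shown here.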