
On the Query Complexity of Training Data Reconstruction in Private Learning (2303.16372v6)

Published 29 Mar 2023 in cs.LG, cs.CR, and stat.ML

Abstract: We analyze the number of queries that a whitebox adversary needs to make to a private learner in order to reconstruct its training data. For $(\epsilon, \delta)$ DP learners with training data drawn from any arbitrary compact metric space, we provide the first known lower bounds on the adversary's query complexity as a function of the learner's privacy parameters. Our results are minimax optimal for every $\epsilon \geq 0, \delta \in [0, 1]$, covering both $\epsilon$-DP and $(0, \delta)$ DP as corollaries. Beyond this, we obtain query complexity lower bounds for $(\alpha, \epsilon)$ Rényi DP learners that are valid for any $\alpha > 1, \epsilon \geq 0$. Finally, we analyze data reconstruction attacks on locally compact metric spaces via the framework of Metric DP, a generalization of DP that accounts for the underlying metric structure of the data. In this setting, we provide the first known analysis of data reconstruction in unbounded, high dimensional spaces and obtain query complexity lower bounds that are nearly tight modulo logarithmic factors.
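To see why the adversary's query complexity is the right quantity to bound, consider the simplest version of the threat model: an adversary repeatedly queries an $\epsilon$-DP mechanism and averages the noisy responses to drive down the noise. The sketch below is an illustrative toy, not the paper's construction; it uses the standard Laplace mechanism (noise scale $\Delta/\epsilon$, sampled as a difference of two exponentials), and the specific values `secret`, `epsilon`, and the query count are arbitrary choices for the demonstration.

```python
import random
import statistics

def laplace_mechanism(true_value, epsilon, sensitivity=1.0):
    """Standard epsilon-DP Laplace mechanism for a query with the given sensitivity."""
    scale = sensitivity / epsilon
    # A Laplace(scale) sample is the difference of two iid Exponential(rate=1/scale) samples.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_value + noise

random.seed(0)
secret = 3.7      # a quantity the mechanism is protecting (hypothetical)
epsilon = 0.5     # per-query privacy budget

# The adversary issues many queries and averages: the noise (scale 2.0 here)
# shrinks like 1/sqrt(n), so enough queries reconstruct the secret accurately.
estimates = [laplace_mechanism(secret, epsilon) for _ in range(20000)]
reconstruction = statistics.mean(estimates)
```

Composition means the total privacy loss grows with the number of queries, so a lower bound on the queries needed for reconstruction translates directly into a guarantee on how much interaction a DP learner can safely tolerate.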

