
On the Statistical Complexity of Estimation and Testing under Privacy Constraints (2210.02215v3)

Published 5 Oct 2022 in cs.LG and cs.CR

Abstract: The challenge of producing accurate statistics while respecting the privacy of the individuals in a sample is an important area of research. We study minimax lower bounds for classes of differentially private estimators. In particular, we show how to characterize the power of a statistical test under differential privacy in a plug-and-play fashion by solving an appropriate transport problem. With specific coupling constructions, this observation allows us to derive Le Cam-type and Fano-type inequalities not only for regular definitions of differential privacy but also for those based on Rényi divergence. We then illustrate our results on three simple, fully worked-out examples. In particular, we show that the problem class strongly affects the provable degradation of utility due to privacy. In certain scenarios, we show that maintaining privacy results in a noticeable reduction in performance only when the level of privacy protection is very high. Conversely, for other problems, even a modest level of privacy protection can lead to a significant decrease in performance. Finally, we demonstrate that the DP-SGLD algorithm, a private convex solver, can be employed for maximum likelihood estimation with a high degree of confidence, as it provides near-optimal results with respect to both the size of the sample and the level of privacy protection. This algorithm is applicable to a broad range of parametric estimation procedures, including exponential families.
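The claim that privacy can be nearly free until the protection level becomes very high can be illustrated with a textbook mechanism. The sketch below is not taken from the paper: it assumes data in [0, 1], the replace-one notion of neighboring samples, and the standard Laplace mechanism, and all function names are hypothetical. Under those assumptions the mean has sensitivity 1/n, so the added noise has scale 1/(n·ε), while the sampling error of the mean is at most of order 1/√n. The privacy noise therefore dominates only once ε falls below roughly 1/√n.

```python
import math
import random


def clip(x, lo=0.0, hi=1.0):
    """Force a value into [lo, hi] so the sensitivity bound below holds."""
    return max(lo, min(hi, x))


def laplace_scale(n, epsilon):
    """Laplace noise scale for the mean of n values in [0, 1].

    Assumption: replace-one neighbors, so the mean changes by at most 1/n.
    """
    return 1.0 / (n * epsilon)


def private_mean(data, epsilon, rng):
    """Epsilon-DP estimate of the mean via the Laplace mechanism (illustrative)."""
    n = len(data)
    clipped_mean = sum(clip(x) for x in data) / n
    scale = laplace_scale(n, epsilon)
    # The difference of two i.i.d. exponentials with rate 1/scale
    # is a Laplace variable with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clipped_mean + noise


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    data = [rng.random() for _ in range(n)]
    # Upper bound on the standard deviation of the mean of [0, 1]-valued data.
    sampling_error = 0.5 / math.sqrt(n)
    for epsilon in (10.0, 1.0, 0.1, 0.01, 0.001):
        print(
            f"epsilon={epsilon:<6} "
            f"sampling error <= {sampling_error:.4f}  "
            f"privacy noise scale = {laplace_scale(n, epsilon):.4f}  "
            f"estimate = {private_mean(data, epsilon, rng):.4f}"
        )
```

This only shows the upper-bound side for one easy problem. The paper's contribution is the matching lower bounds, which this sketch does not attempt to reproduce.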
