
Controlling Privacy Loss in Sampling Schemes: an Analysis of Stratified and Cluster Sampling (2007.12674v2)

Published 24 Jul 2020 in stat.ME, cs.CR, and cs.LG

Abstract: Sampling schemes are fundamental tools in statistics, survey design, and algorithm design. A fundamental result in differential privacy is that a differentially private mechanism run on a simple random sample of a population provides stronger privacy guarantees than the same algorithm run on the entire population. However, in practice, sampling designs are often more complex than the simple, data-independent sampling schemes that are addressed in prior work. In this work, we extend the study of privacy amplification results to more complex, data-dependent sampling schemes. We find that not only do these sampling schemes often fail to amplify privacy, they can actually result in privacy degradation. We analyze the privacy implications of the pervasive cluster sampling and stratified sampling paradigms, as well as provide some insight into the study of more general sampling designs.
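The amplification result the abstract builds on can be made concrete with a small sketch. This is the standard subsampling bound from prior work (not a result of this paper): if each record enters the sample independently with probability q, an ε-DP mechanism run on the sample satisfies ε' = ln(1 + q(e^ε − 1))-DP with respect to the full population. The function name and parameter values below are illustrative only.

```python
import math

def amplified_epsilon(eps: float, q: float) -> float:
    """Privacy amplification by subsampling (standard bound from prior
    work, e.g. for Poisson sampling): an eps-DP mechanism applied to a
    sample that includes each record with probability q satisfies
    eps' = ln(1 + q * (exp(eps) - 1))-DP on the full population."""
    return math.log(1 + q * (math.exp(eps) - 1))

# Sampling 10% of the population shrinks epsilon roughly by the factor q
# when epsilon is small -- the "stronger guarantee" the abstract refers to.
eps_full = 1.0
eps_sampled = amplified_epsilon(eps_full, q=0.1)
print(f"{eps_sampled:.4f}")  # noticeably smaller than eps_full
```

The paper's point is that this clean guarantee is specific to simple, data-independent sampling; data-dependent designs such as cluster or stratified sampling need not satisfy any analogous bound and can make ε' larger than ε.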

