
Generalization in the Face of Adaptivity: A Bayesian Perspective (2106.10761v3)

Published 20 Jun 2021 in cs.LG and stat.ML

Abstract: Repeated use of a data sample via adaptively chosen queries can rapidly lead to overfitting, wherein the empirical evaluation of queries on the sample significantly deviates from their mean with respect to the underlying data distribution. It turns out that simple noise addition algorithms suffice to prevent this issue, and differential privacy-based analysis of these algorithms shows that they can handle an asymptotically optimal number of queries. However, differential privacy's worst-case nature entails scaling such noise to the range of the queries even for highly concentrated queries, or introducing more complex algorithms. In this paper, we prove that straightforward noise-addition algorithms already provide variance-dependent guarantees that also extend to unbounded queries. This improvement stems from a novel characterization that illuminates the core problem of adaptive data analysis. We show that the harm of adaptivity results from the covariance between the new query and a Bayes factor-based measure of how much information about the data sample was encoded in the responses given to past queries. We then leverage this characterization to introduce a new data-dependent stability notion that can bound this covariance.
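The "simple noise addition" setting the abstract refers to can be sketched as follows: an analyst issues a sequence of statistical queries, each possibly chosen as a function of earlier answers, and the mechanism returns the empirical mean plus Gaussian noise. This is a minimal illustrative simulation, not the paper's analysis; the noise scale `sigma` below is a free parameter that a worst-case (range-based) calibration would set much larger than a variance-aware one for concentrated queries.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_query(sample, query, sigma):
    """Return the empirical mean of `query` over `sample` plus Gaussian noise.

    `query` maps data points to reals; `sigma` is the noise scale, which
    a range-based (worst-case) calibration ties to the query's range,
    while a variance-dependent guarantee permits a smaller scale for
    concentrated queries.
    """
    empirical_mean = np.mean(query(sample))
    return empirical_mean + rng.normal(0.0, sigma)

# A toy adaptive interaction: each query's threshold depends on the
# previous noisy answer, so later queries encode information about the
# sample -- exactly the feedback loop that causes overfitting without noise.
n = 10_000
sample = rng.normal(loc=0.0, scale=1.0, size=n)

answers = []
prev = 0.0
for t in range(5):
    q = lambda x, c=prev: (x > c).astype(float)  # adaptively chosen query
    prev = noisy_query(sample, q, sigma=1.0 / np.sqrt(n))
    answers.append(prev)
```

Each query here is bounded in [0, 1], so its empirical mean stays near that interval; the point of the paper's variance-dependent guarantees is that the same simple mechanism remains sound even for unbounded queries, where range-based noise calibration is impossible.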

