Sample-Optimal Locally Private Hypothesis Selection and the Provable Benefits of Interactivity (2312.05645v1)

Published 9 Dec 2023 in stat.ML, cs.CR, cs.IT, cs.LG, and math.IT

Abstract: We study the problem of hypothesis selection under the constraint of local differential privacy. Given a class $\mathcal{F}$ of $k$ distributions and a set of i.i.d. samples from an unknown distribution $h$, the goal of hypothesis selection is to pick a distribution $\hat{f}$ whose total variation distance to $h$ is comparable with the best distribution in $\mathcal{F}$ (with high probability). We devise an $\varepsilon$-locally-differentially-private ($\varepsilon$-LDP) algorithm that uses $\Theta\left(\frac{k}{\alpha^2\min\{\varepsilon^2,1\}}\right)$ samples to guarantee that $d_{TV}(h,\hat{f})\leq \alpha + 9 \min_{f\in \mathcal{F}}d_{TV}(h,f)$ with high probability. This sample complexity is optimal for $\varepsilon<1$, matching the lower bound of Gopi et al. (2020). All previously known algorithms for this problem required $\Omega\left(\frac{k\log k}{\alpha^2\min\{\varepsilon^2,1\}}\right)$ samples to work. Moreover, our result demonstrates the power of interaction for $\varepsilon$-LDP hypothesis selection. Namely, it breaks the known lower bound of $\Omega\left(\frac{k\log k}{\alpha^2\min\{\varepsilon^2,1\}}\right)$ for the sample complexity of non-interactive hypothesis selection. Our algorithm breaks this barrier using only $\Theta(\log \log k)$ rounds of interaction. To prove our results, we define the notion of *critical queries* for a Statistical Query Algorithm (SQA), which may be of independent interest. Informally, an SQA is said to use a small number of critical queries if its success relies on the accuracy of only a small number of queries it asks. We then design an LDP algorithm that uses a smaller number of critical queries.
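
For context, the classical route to hypothesis selection is a Scheffé-style tournament: every pair of candidates is compared on its Scheffé set, and an estimate of $h$'s mass on that set decides the winner. The sketch below is a minimal, hypothetical illustration of the simplest locally private version of that baseline over a finite domain, with each pairwise query answered by binary randomized response on a disjoint batch of users. It is not the paper's interactive $\Theta(\log\log k)$-round algorithm; the function names, the disjoint-batch splitting, and the most-wins rule are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: a naive non-interactive LDP Scheffe tournament.
# Each user answers exactly one binary query via randomized response.

def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (e^eps + 1)."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return bit if rng.random() < p else 1 - bit

def ldp_set_probability(batch, indicator, epsilon, rng):
    """Unbiased LDP estimate of Pr_h[x in A] from one report per user."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    reports = np.array([randomized_response(int(indicator(x)), epsilon, rng)
                        for x in batch])
    # Debias: E[report] = q*(2p - 1) + (1 - p), solve for q.
    return (reports.mean() - (1.0 - p)) / (2.0 * p - 1.0)

def scheffe_tournament_ldp(samples, pmfs, epsilon, rng=None):
    """Pick the hypothesis with the most Scheffe wins.

    pmfs: list of k >= 2 NumPy probability vectors over {0, ..., d-1}.
    samples: integer samples from the unknown h, with enough samples
    that every pairwise batch is non-empty.
    """
    rng = rng or np.random.default_rng(0)
    k = len(pmfs)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    batches = np.array_split(np.asarray(samples), len(pairs))
    wins = np.zeros(k, dtype=int)
    for (i, j), batch in zip(pairs, batches):
        A = pmfs[i] > pmfs[j]                      # Scheffe set of the pair
        q_hat = ldp_set_probability(batch, lambda x: A[x], epsilon, rng)
        # The hypothesis whose mass on A is closer to the estimate wins.
        if abs(pmfs[i][A].sum() - q_hat) <= abs(pmfs[j][A].sum() - q_hat):
            wins[i] += 1
        else:
            wins[j] += 1
    return int(np.argmax(wins))
```

Because this naive variant spends a fresh batch of users on each of the $\Theta(k^2)$ pairwise queries and must make every one of them accurate, it is far more sample-hungry than the bounds discussed in the abstract; the paper's point is that an interactive scheme whose success hinges on only a few critical queries can avoid paying for accuracy on all comparisons at once.
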

References (96)
  1. Ishaq Aden-Ali, Hassan Ashtiani and Gautam Kamath “On the sample complexity of privately learning unbounded high-dimensional gaussians” In Algorithmic Learning Theory, 2021, pp. 185–216 PMLR
  2. Ishaq Aden-Ali, Hassan Ashtiani and Christopher Liaw “Privately learning mixtures of axis-aligned gaussians” In Advances in Neural Information Processing Systems 34, 2021, pp. 3925–3938
  3. Mohammad Afzali, Hassan Ashtiani and Christopher Liaw “Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples” In arXiv preprint arXiv:2309.03847, 2023
  4. Jamil Arbas, Hassan Ashtiani and Christopher Liaw “Polynomial time and private learning of unbounded Gaussian Mixture Models” In International Conference on Machine Learning, 2023 PMLR
  5. “Near-optimal sample complexity bounds for robust learning of gaussian mixtures via compression schemes” In Journal of the ACM (JACM) 67.6 ACM New York, NY, USA, 2020, pp. 1–42
  6. “Inference under information constraints III: Local privacy constraints” In IEEE Journal on Selected Areas in Information Theory 2.1 IEEE, 2021, pp. 253–267
  7. “Test without trust: Optimal locally private distribution testing” In The 22nd International Conference on Artificial Intelligence and Statistics, 2019, pp. 2067–2076 PMLR
  8. “Unified lower bounds for interactive high-dimensional estimation under information constraints” In arXiv preprint arXiv:2010.06562, 2020
  9. “The role of interactivity in structured estimation” In Conference on Learning Theory, 2022, pp. 1328–1355 PMLR
  10. “Fast and near-optimal algorithms for approximating distributions by histograms” In Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2015, pp. 249–263
  11. “Sample-optimal density estimation in nearly-linear time” In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 2017, pp. 1278–1289 SIAM
  12. “Maximum selection and sorting with adversarial comparators” In The Journal of Machine Learning Research 19.1 JMLR. org, 2018, pp. 2427–2457
  13. “Fast Optimal Locally Private Mean Estimation via Random Projections” In arXiv preprint arXiv:2306.04444, 2023
  14. Hilal Asi, Vitaly Feldman and Kunal Talwar “Optimal algorithms for mean estimation under local differential privacy” In International Conference on Machine Learning, 2022, pp. 1046–1056 PMLR
  15. “Sorting with adversarial comparators and application to density estimation” In 2014 IEEE International Symposium on Information Theory, 2014, pp. 1682–1686 IEEE
  16. “Privately Estimating a Gaussian: Efficient, Robust, and Optimal” In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, 2023, pp. 483–496
  17. “Private and polynomial time algorithms for learning Gaussians and beyond” In Conference on Learning Theory, 2022, pp. 1075–1076 PMLR
  18. “Some techniques in density estimation” In arXiv preprint arXiv:1801.04003, 2018
  19. “Contraction of Locally Differentially Private Mechanisms” In arXiv preprint arXiv:2210.13386, 2022
  20. “Private Distribution Learning with Public Data: The View from Sample Compression” In arXiv preprint arXiv:2308.06239, 2023
  21. “Statistically near-optimal hypothesis selection” In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), 2022, pp. 909–919 IEEE
  22. “Coinpress: Practical private mean and covariance estimation” In Advances in Neural Information Processing Systems 33, 2020, pp. 14475–14485
  23. “Testing that distributions are close” In Proceedings 41st Annual Symposium on Foundations of Computer Science, 2000, pp. 259–269 IEEE
  24. Lucien Birgé “The Grenander estimator: A nonasymptotic approach” In The Annals of Statistics JSTOR, 1989, pp. 1532–1549
  25. Olivier Bousquet, Daniel Kane and Shay Moran “The optimal approximation factor in density estimation” In Conference on Learning Theory, 2019, pp. 318–341 PMLR
  26. Alex Bie, Gautam Kamath and Vikrant Singhal “Private estimation with public data” In Advances in Neural Information Processing Systems 35, 2022, pp. 18653–18666
  27. “Private hypothesis selection” In Advances in Neural Information Processing Systems 32, 2019
  28. Mark Bun, Jelani Nelson and Uri Stemmer “Heavy hitters and the structure of local privacy” In ACM Transactions on Algorithms (TALG) 15.4 ACM New York, NY, USA, 2019, pp. 1–40
  29. Clément L. Canonne “A Survey on Distribution Testing: Your Data is Big. But is it Blue?”, Graduate Surveys 9 Theory of Computing Library, 2020, pp. 1–100 DOI: 10.4086/toc.gs.2020.009
  30. “Efficient density estimation via piecewise polynomial approximation” In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, 2014, pp. 604–613
  31. “The structure of optimal private tests for simple hypotheses” In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, 2019, pp. 310–321
  32. “Private identity testing for high-dimensional distributions” In Advances in Neural Information Processing Systems 33, 2020, pp. 10099–10111
  33. “Differentially-private clustering of easy instances” In International Conference on Machine Learning, 2021, pp. 2049–2059 PMLR
  34. Wei-Ning Chen, Peter Kairouz and Ayfer Ozgur “Breaking the communication-privacy-accuracy trilemma” In Advances in Neural Information Processing Systems 33, 2020, pp. 3312–3324
  35. Constantinos Daskalakis, Ilias Diakonikolas and Rocco A Servedio “Learning poisson binomial distributions” In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, 2012, pp. 709–728
  36. “Locally private learning without interaction requires separation” In Advances in neural information processing systems 32, 2019
  37. Ilias Diakonikolas “Learning Structured Distributions.” In Handbook of Big Data 267, 2016, pp. 10–1201
  38. Apple Differential Privacy Team “Learning with Privacy at Scale”, https://machinelearning.apple.com/research/learning-with-privacy-at-scale, December 2017
  39. John C Duchi, Michael I Jordan and Martin J Wainwright “Local privacy and statistical minimax rates” In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, 2013, pp. 429–438 IEEE
  40. John C Duchi, Michael I Jordan and Martin J Wainwright “Minimax optimal procedures for locally private estimation” In Journal of the American Statistical Association 113.521 Taylor & Francis, 2018, pp. 182–201
  41. “Faster and sample near-optimal algorithms for proper learning mixtures of gaussians” In Conference on Learning Theory, 2014, pp. 1183–1213 PMLR
  42. Ilias Diakonikolas and Daniel M Kane “A new approach for testing properties of discrete distributions” In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), 2016, pp. 685–694 IEEE
  43. “A bounded-noise mechanism for differential privacy” In Conference on Learning Theory, 2022, pp. 625–661 PMLR
  44. Ilias Diakonikolas and Daniel M Kane “Algorithmic high-dimensional robust statistics” Cambridge University Press, 2023
  45. “Robust estimators in high-dimensions without the computational intractability” In SIAM Journal on Computing 48.2 SIAM, 2019, pp. 742–864
  46. “Our data, ourselves: Privacy via distributed noise generation” In Advances in Cryptology-EUROCRYPT 2006: 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28-June 1, 2006. Proceedings 25, 2006, pp. 486–503 Springer
  47. Ilias Diakonikolas, Daniel M Kane and Alistair Stewart “Statistical query lower bounds for robust estimation of high-dimensional gaussians and gaussian mixtures” In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), 2017, pp. 73–84 IEEE
  48. Bolin Ding, Janardhan Kulkarni and Sergey Yekhanin “Collecting telemetry data privately” In Advances in Neural Information Processing Systems 30, 2017
  49. “Combinatorial methods in density estimation” Springer Science & Business Media, 2001
  50. “A universally acceptable smoothing factor for kernel density estimates” In The Annals of Statistics JSTOR, 1996, pp. 2499–2512
  51. “Nonasymptotic universal smoothing factors, kernel complexity and Yatracos classes” In The Annals of Statistics JSTOR, 1997, pp. 2626–2637
  52. “Calibrating noise to sensitivity in private data analysis” In Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3, 2006, pp. 265–284 Springer
  53. “The Algorithmic Foundations of Differential Privacy”, 2014 URL: http://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
  54. “Lower bounds for locally private estimation via communication complexity” In Conference on Learning Theory, 2019, pp. 1161–1191 PMLR
  55. “Learning new words”, US Patent 9,645,998, May 9 2017
  56. Alexandre Evfimievski, Johannes Gehrke and Ramakrishnan Srikant “Limiting privacy breaches in privacy preserving data mining” In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 2003, pp. 211–222
  57. Alexander Edmonds, Aleksandar Nikolov and Jonathan Ullman “The power of factorization mechanisms in local and central differential privacy” In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, 2020, pp. 425–438
  58. Úlfar Erlingsson, Vasyl Pihur and Aleksandra Korolova “Rappor: Randomized aggregatable privacy-preserving ordinal response” In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, 2014, pp. 1054–1067
  59. Vitaly Feldman “A general characterization of the statistical query complexity” In Conference on Learning Theory, 2017, pp. 785–830 PMLR
  60. “Locally private hypothesis selection” In Conference on Learning Theory, 2020, pp. 1785–1816 PMLR
  61. Badih Ghazi, Ravi Kumar and Pasin Manurangsi “On avoiding the union bound when answering multiple differentially private queries” In Conference on Learning Theory, 2021, pp. 2133–2146 PMLR
  62. “On testing expansion in bounded-degree graphs” In Studies in Complexity and Cryptography. Miscellanea on the Interplay between Randomness and Computation: In Collaboration with Lidor Avigad, Mihir Bellare, Zvika Brakerski, Shafi Goldwasser, Shai Halevi, Tali Kaufman, Leonid Levin, Noam Nisan, Dana Ron, Madhu Sudan, Luca Trevisan, Salil Vadhan, Avi Wigderson, David Zuckerman Springer, 2011, pp. 68–75
  63. “On density estimation in the view of Kolmogorov’s ideas in approximation theory” In The Annals of Statistics JSTOR, 1990, pp. 999–1010
  64. “Robustness implies privacy in statistical estimation” In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, 2023, pp. 497–506
  65. Yanjun Han, Ayfer Özgür and Tsachy Weissman “Geometric lower bounds for distributed parameter estimation under communication constraints” In Conference On Learning Theory, 2018, pp. 3163–3188 PMLR
  66. “Locally private gaussian estimation” In Advances in Neural Information Processing Systems 32, 2019
  67. “The role of interactivity in local differential privacy” In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), 2019, pp. 94–105 IEEE
  68. Matthew Joseph, Jieming Mao and Aaron Roth “Exponential separations in local differential privacy” In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2020, pp. 515–527 SIAM
  69. Matthew Joseph, Jieming Mao and Aaron Roth “Exponential Separations in Local Privacy” In ACM Trans. Algorithms 18.4 New York, NY, USA: Association for Computing Machinery, 2022
  70. Michael Kearns “Efficient noise-tolerant learning from statistical queries” In Journal of the ACM (JACM) 45.6 ACM New York, NY, USA, 1998, pp. 983–1006
  71. “What can we learn privately?” In SIAM Journal on Computing 40.3 SIAM, 2011, pp. 793–826
  72. “Privately learning high-dimensional distributions” In Conference on Learning Theory, 2019, pp. 1853–1902 PMLR
  73. “Advances and open problems in federated learning” In Foundations and Trends® in Machine Learning 14.1–2 Now Publishers, Inc., 2021, pp. 1–210
  74. “A private and computationally-efficient estimator for unbounded gaussians” In Conference on Learning Theory, 2022, pp. 544–572 PMLR
  75. Pravesh Kothari, Pasin Manurangsi and Ameya Velingker “Private robust estimation by stabilizing convex relaxations” In Conference on Learning Theory, 2022, pp. 723–777 PMLR
  76. “Differentially private algorithms for learning mixtures of separated gaussians” In Advances in Neural Information Processing Systems 32, 2019
  77. Gautam Kamath, Vikrant Singhal and Jonathan Ullman “Private mean estimation of heavy-tailed distributions” In Conference on Learning Theory, 2020, pp. 2204–2235 PMLR
  78. “Finite Sample Differentially Private Confidence Intervals” In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), 2018 Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik
  79. “Density estimation in linear time” In arXiv preprint arXiv:0712.2869, 2007
  80. “Settling the polynomial learnability of mixtures of gaussians” In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, 2010, pp. 93–102 IEEE
  81. Shyam Narayanan “Private high-dimensional hypothesis testing” In Conference on Learning Theory, 2022, pp. 3979–4027 PMLR
  82. Jerzy Neyman and Egon Sharpe Pearson “IX. On the problem of the most efficient tests of statistical hypotheses” In Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 231.694-706 The Royal Society London, 1933, pp. 289–337
  83. “Simple Binary Hypothesis Testing under Local Differential Privacy and Communication Constraints” In arXiv preprint arXiv:2301.03566, 2023
  84. Liam Paninski “A coincidence-based test for uniformity given very sparsely sampled discrete data” In IEEE Transactions on Information Theory 54.10 IEEE, 2008, pp. 4750–4755
  85. Ankit Pensia, Po-Ling Loh and Varun Jog “Simple Binary Hypothesis Testing under Communication Constraints” In 2022 IEEE International Symposium on Information Theory (ISIT), 2022, pp. 3297–3302 IEEE
  86. Henry Scheffé “A useful convergence theorem for probability distributions” In The Annals of Mathematical Statistics 18.3 JSTOR, 1947, pp. 434–438
  87. Vikrant Singhal “A Polynomial Time, Pure Differentially Private Estimator for Binary Product Distributions” In arXiv preprint arXiv:2304.06787, 2023
  88. “Near-optimal-sample estimators for spherical gaussian mixtures” In Advances in Neural Information Processing Systems 27, 2014
  89. “Between Pure and Approximate Differential Privacy” In Journal of Privacy and Confidentiality 7.2, 2016
  90. Balázs Szörényi “Characterizing statistical query learning: simplified notions and proofs” In International Conference on Algorithmic Learning Theory, 2009, pp. 186–200 Springer
  91. “Friendlycore: Practical differentially private aggregation” In International Conference on Machine Learning, 2022, pp. 21828–21863 PMLR
  92. Jonathan Ullman “Answering $n^{2+o(1)}$ counting queries with differential privacy is hard” In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, 2013, pp. 361–370
  93. Jonathan Ullman “Tight lower bounds for locally differentially private selection” In arXiv preprint arXiv:1802.02638, 2018
  94. Stanley L Warner “Randomized response: A survey technique for eliminating evasive answer bias” In Journal of the American Statistical Association 60.309 Taylor & Francis, 1965, pp. 63–69
  95. Yannis G Yatracos “Rates of convergence of minimum distance estimators and Kolmogorov’s entropy” In The Annals of Statistics 13.2 Institute of Mathematical Statistics, 1985, pp. 768–774
  96. “Information-theoretic determination of minimax rates of convergence” In Annals of Statistics JSTOR, 1999, pp. 1564–1599
