Sample-Optimal Locally Private Hypothesis Selection and the Provable Benefits of Interactivity (2312.05645v1)
Abstract: We study the problem of hypothesis selection under the constraint of local differential privacy. Given a class $\mathcal{F}$ of $k$ distributions and a set of i.i.d. samples from an unknown distribution $h$, the goal of hypothesis selection is to pick a distribution $\hat{f}$ whose total variation distance to $h$ is comparable with that of the best distribution in $\mathcal{F}$ (with high probability). We devise an $\varepsilon$-locally-differentially-private ($\varepsilon$-LDP) algorithm that uses $\Theta\left(\frac{k}{\alpha^2\min\{\varepsilon^2,1\}}\right)$ samples to guarantee that $d_{TV}(h,\hat{f})\leq \alpha + 9 \min_{f\in \mathcal{F}}d_{TV}(h,f)$ with high probability. This sample complexity is optimal for $\varepsilon<1$, matching the lower bound of Gopi et al. (2020). All previously known algorithms for this problem required $\Omega\left(\frac{k\log k}{\alpha^2\min\{\varepsilon^2,1\}}\right)$ samples to work. Moreover, our result demonstrates the power of interaction for $\varepsilon$-LDP hypothesis selection. Namely, it breaks the known lower bound of $\Omega\left(\frac{k\log k}{\alpha^2\min\{\varepsilon^2,1\}}\right)$ for the sample complexity of non-interactive hypothesis selection. Our algorithm breaks this barrier using only $\Theta(\log \log k)$ rounds of interaction. To prove our results, we define the notion of \emph{critical queries} for a Statistical Query Algorithm (SQA), which may be of independent interest. Informally, an SQA is said to use a small number of critical queries if its success relies on the accuracy of only a small number of the queries it asks. We then design an LDP algorithm that uses a smaller number of critical queries.
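To make the setting concrete, the following is a minimal sketch of the classical *non-interactive* baseline the paper improves upon: a Scheffé tournament in which each user releases a single $\varepsilon$-LDP bit via randomized response. This is illustrative only — it reflects the $\Omega(k\log k)$-style baseline, not the paper's interactive algorithm — and all function names here are hypothetical.

```python
import itertools
import math
import random

def randomized_response(bit, eps, rng):
    # Keep the bit with probability e^eps/(e^eps + 1); this single-bit
    # release satisfies eps-LDP.
    p_keep = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if rng.random() < p_keep else 1 - bit

def debias(noisy_mean, eps):
    # Invert the randomized-response bias to get an unbiased estimate
    # of the true mean of the bits.
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    return (noisy_mean - (1.0 - p)) / (2.0 * p - 1.0)

def ldp_hypothesis_selection(hypotheses, samples, eps, rng=None):
    """Non-interactive Scheffe tournament under eps-LDP (sketch).

    hypotheses: list of dicts mapping domain element -> probability.
    samples:    i.i.d. draws from the unknown distribution h.
    Each sample is assigned to exactly one pairwise test, so each
    user contributes a single eps-LDP bit.
    Returns the index of the hypothesis with the most pairwise wins.
    """
    rng = rng or random.Random(0)
    k = len(hypotheses)
    pairs = list(itertools.combinations(range(k), 2))
    wins = [0] * k
    chunk = len(samples) // len(pairs)  # samples per pairwise test
    for t, (i, j) in enumerate(pairs):
        fi, fj = hypotheses[i], hypotheses[j]
        # Scheffe set A = {x : f_i(x) > f_j(x)}.
        A = {x for x in set(fi) | set(fj) if fi.get(x, 0) > fj.get(x, 0)}
        block = samples[t * chunk:(t + 1) * chunk]
        noisy = [randomized_response(int(x in A), eps, rng) for x in block]
        h_A = debias(sum(noisy) / len(noisy), eps)  # estimate of h(A)
        fi_A = sum(fi.get(x, 0) for x in A)
        fj_A = sum(fj.get(x, 0) for x in A)
        # The hypothesis whose mass on A is closer to the estimate wins.
        if abs(fi_A - h_A) <= abs(fj_A - h_A):
            wins[i] += 1
        else:
            wins[j] += 1
    return max(range(k), key=lambda i: wins[i])
```

For example, with two hypotheses `{0: 0.9, 1: 0.1}` and `{0: 0.1, 1: 0.9}` and a few thousand samples drawn from the first, the tournament selects index 0 with high probability. Because every pair is tested, this approach spends samples on all $\binom{k}{2}$ comparisons; the paper's contribution is an interactive scheme (with $\Theta(\log\log k)$ rounds) whose success hinges on only a few critical queries, removing the $\log k$ factor.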
- Ishaq Aden-Ali, Hassan Ashtiani and Gautam Kamath “On the sample complexity of privately learning unbounded high-dimensional gaussians” In Algorithmic Learning Theory, 2021, pp. 185–216 PMLR
- Ishaq Aden-Ali, Hassan Ashtiani and Christopher Liaw “Privately learning mixtures of axis-aligned gaussians” In Advances in Neural Information Processing Systems 34, 2021, pp. 3925–3938
- Mohammad Afzali, Hassan Ashtiani and Christopher Liaw “Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples” In arXiv preprint arXiv:2309.03847, 2023
- Jamil Arbas, Hassan Ashtiani and Christopher Liaw “Polynomial time and private learning of unbounded Gaussian Mixture Models” In International Conference on Machine Learning, 2023 PMLR
- “Near-optimal sample complexity bounds for robust learning of gaussian mixtures via compression schemes” In Journal of the ACM (JACM) 67.6 ACM New York, NY, USA, 2020, pp. 1–42
- “Inference under information constraints III: Local privacy constraints” In IEEE Journal on Selected Areas in Information Theory 2.1 IEEE, 2021, pp. 253–267
- “Test without trust: Optimal locally private distribution testing” In The 22nd International Conference on Artificial Intelligence and Statistics, 2019, pp. 2067–2076 PMLR
- “Unified lower bounds for interactive high-dimensional estimation under information constraints” In arXiv preprint arXiv:2010.06562, 2020
- “The role of interactivity in structured estimation” In Conference on Learning Theory, 2022, pp. 1328–1355 PMLR
- “Fast and near-optimal algorithms for approximating distributions by histograms” In Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2015, pp. 249–263
- “Sample-optimal density estimation in nearly-linear time” In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 2017, pp. 1278–1289 SIAM
- “Maximum selection and sorting with adversarial comparators” In The Journal of Machine Learning Research 19.1 JMLR. org, 2018, pp. 2427–2457
- “Fast Optimal Locally Private Mean Estimation via Random Projections” In arXiv preprint arXiv:2306.04444, 2023
- Hilal Asi, Vitaly Feldman and Kunal Talwar “Optimal algorithms for mean estimation under local differential privacy” In International Conference on Machine Learning, 2022, pp. 1046–1056 PMLR
- “Sorting with adversarial comparators and application to density estimation” In 2014 IEEE International Symposium on Information Theory, 2014, pp. 1682–1686 IEEE
- “Privately Estimating a Gaussian: Efficient, Robust, and Optimal” In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, 2023, pp. 483–496
- “Private and polynomial time algorithms for learning Gaussians and beyond” In Conference on Learning Theory, 2022, pp. 1075–1076 PMLR
- “Some techniques in density estimation” In arXiv preprint arXiv:1801.04003, 2018
- “Contraction of Locally Differentially Private Mechanisms” In arXiv preprint arXiv:2210.13386, 2022
- “Private Distribution Learning with Public Data: The View from Sample Compression” In arXiv preprint arXiv:2308.06239, 2023
- “Statistically near-optimal hypothesis selection” In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), 2022, pp. 909–919 IEEE
- “Coinpress: Practical private mean and covariance estimation” In Advances in Neural Information Processing Systems 33, 2020, pp. 14475–14485
- “Testing that distributions are close” In Proceedings 41st Annual Symposium on Foundations of Computer Science, 2000, pp. 259–269 IEEE
- Lucien Birgé “The Grenander estimator: A nonasymptotic approach” In The Annals of Statistics JSTOR, 1989, pp. 1532–1549
- Olivier Bousquet, Daniel Kane and Shay Moran “The optimal approximation factor in density estimation” In Conference on Learning Theory, 2019, pp. 318–341 PMLR
- Alex Bie, Gautam Kamath and Vikrant Singhal “Private estimation with public data” In Advances in Neural Information Processing Systems 35, 2022, pp. 18653–18666
- “Private hypothesis selection” In Advances in Neural Information Processing Systems 32, 2019
- Mark Bun, Jelani Nelson and Uri Stemmer “Heavy hitters and the structure of local privacy” In ACM Transactions on Algorithms (TALG) 15.4 ACM New York, NY, USA, 2019, pp. 1–40
- Clément L. Canonne “A Survey on Distribution Testing: Your Data is Big. But is it Blue?”, Graduate Surveys 9 Theory of Computing Library, 2020, pp. 1–100 DOI: 10.4086/toc.gs.2020.009
- “Efficient density estimation via piecewise polynomial approximation” In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, 2014, pp. 604–613
- “The structure of optimal private tests for simple hypotheses” In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, 2019, pp. 310–321
- “Private identity testing for high-dimensional distributions” In Advances in Neural Information Processing Systems 33, 2020, pp. 10099–10111
- “Differentially-private clustering of easy instances” In International Conference on Machine Learning, 2021, pp. 2049–2059 PMLR
- Wei-Ning Chen, Peter Kairouz and Ayfer Ozgur “Breaking the communication-privacy-accuracy trilemma” In Advances in Neural Information Processing Systems 33, 2020, pp. 3312–3324
- Constantinos Daskalakis, Ilias Diakonikolas and Rocco A Servedio “Learning poisson binomial distributions” In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, 2012, pp. 709–728
- “Locally private learning without interaction requires separation” In Advances in neural information processing systems 32, 2019
- Ilias Diakonikolas “Learning Structured Distributions.” In Handbook of Big Data 267, 2016, pp. 10–1201
- Apple Differential Privacy Team “Learning with Privacy at Scale”, https://machinelearning.apple.com/research/learning-with-privacy-at-scale, December 2017
- John C Duchi, Michael I Jordan and Martin J Wainwright “Local privacy and statistical minimax rates” In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, 2013, pp. 429–438 IEEE
- John C Duchi, Michael I Jordan and Martin J Wainwright “Minimax optimal procedures for locally private estimation” In Journal of the American Statistical Association 113.521 Taylor & Francis, 2018, pp. 182–201
- “Faster and sample near-optimal algorithms for proper learning mixtures of gaussians” In Conference on Learning Theory, 2014, pp. 1183–1213 PMLR
- Ilias Diakonikolas and Daniel M Kane “A new approach for testing properties of discrete distributions” In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), 2016, pp. 685–694 IEEE
- “A bounded-noise mechanism for differential privacy” In Conference on Learning Theory, 2022, pp. 625–661 PMLR
- Ilias Diakonikolas and Daniel M Kane “Algorithmic high-dimensional robust statistics” Cambridge University Press, 2023
- “Robust estimators in high-dimensions without the computational intractability” In SIAM Journal on Computing 48.2 SIAM, 2019, pp. 742–864
- “Our data, ourselves: Privacy via distributed noise generation” In Advances in Cryptology-EUROCRYPT 2006: 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28-June 1, 2006. Proceedings 25, 2006, pp. 486–503 Springer
- Ilias Diakonikolas, Daniel M Kane and Alistair Stewart “Statistical query lower bounds for robust estimation of high-dimensional gaussians and gaussian mixtures” In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), 2017, pp. 73–84 IEEE
- Bolin Ding, Janardhan Kulkarni and Sergey Yekhanin “Collecting telemetry data privately” In Advances in Neural Information Processing Systems 30, 2017
- “Combinatorial methods in density estimation” Springer Science & Business Media, 2001
- “A universally acceptable smoothing factor for kernel density estimates” In The Annals of Statistics JSTOR, 1996, pp. 2499–2512
- “Nonasymptotic universal smoothing factors, kernel complexity and Yatracos classes” In The Annals of Statistics JSTOR, 1997, pp. 2626–2637
- “Calibrating noise to sensitivity in private data analysis” In Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3, 2006, pp. 265–284 Springer
- “The Algorithmic Foundations of Differential Privacy”, 2014 URL: http://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
- “Lower bounds for locally private estimation via communication complexity” In Conference on Learning Theory, 2019, pp. 1161–1191 PMLR
- “Learning new words”, US Patent 9,645,998, May 9 2017
- Alexandre Evfimievski, Johannes Gehrke and Ramakrishnan Srikant “Limiting privacy breaches in privacy preserving data mining” In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 2003, pp. 211–222
- Alexander Edmonds, Aleksandar Nikolov and Jonathan Ullman “The power of factorization mechanisms in local and central differential privacy” In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, 2020, pp. 425–438
- Úlfar Erlingsson, Vasyl Pihur and Aleksandra Korolova “Rappor: Randomized aggregatable privacy-preserving ordinal response” In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, 2014, pp. 1054–1067
- Vitaly Feldman “A general characterization of the statistical query complexity” In Conference on Learning Theory, 2017, pp. 785–830 PMLR
- “Locally private hypothesis selection” In Conference on Learning Theory, 2020, pp. 1785–1816 PMLR
- Badih Ghazi, Ravi Kumar and Pasin Manurangsi “On avoiding the union bound when answering multiple differentially private queries” In Conference on Learning Theory, 2021, pp. 2133–2146 PMLR
- “On testing expansion in bounded-degree graphs” In Studies in Complexity and Cryptography. Miscellanea on the Interplay between Randomness and Computation: In Collaboration with Lidor Avigad, Mihir Bellare, Zvika Brakerski, Shafi Goldwasser, Shai Halevi, Tali Kaufman, Leonid Levin, Noam Nisan, Dana Ron, Madhu Sudan, Luca Trevisan, Salil Vadhan, Avi Wigderson, David Zuckerman Springer, 2011, pp. 68–75
- “On density estimation in the view of Kolmogorov’s ideas in approximation theory” In The Annals of Statistics JSTOR, 1990, pp. 999–1010
- “Robustness implies privacy in statistical estimation” In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, 2023, pp. 497–506
- Yanjun Han, Ayfer Özgür and Tsachy Weissman “Geometric lower bounds for distributed parameter estimation under communication constraints” In Conference On Learning Theory, 2018, pp. 3163–3188 PMLR
- “Locally private gaussian estimation” In Advances in Neural Information Processing Systems 32, 2019
- “The role of interactivity in local differential privacy” In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), 2019, pp. 94–105 IEEE
- Matthew Joseph, Jieming Mao and Aaron Roth “Exponential separations in local differential privacy” In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2020, pp. 515–527 SIAM
- Matthew Joseph, Jieming Mao and Aaron Roth “Exponential Separations in Local Privacy” In ACM Trans. Algorithms 18.4 New York, NY, USA: Association for Computing Machinery, 2022
- Michael Kearns “Efficient noise-tolerant learning from statistical queries” In Journal of the ACM (JACM) 45.6 ACM New York, NY, USA, 1998, pp. 983–1006
- “What can we learn privately?” In SIAM Journal on Computing 40.3 SIAM, 2011, pp. 793–826
- “Privately learning high-dimensional distributions” In Conference on Learning Theory, 2019, pp. 1853–1902 PMLR
- “Advances and open problems in federated learning” In Foundations and Trends® in Machine Learning 14.1–2 Now Publishers, Inc., 2021, pp. 1–210
- “A private and computationally-efficient estimator for unbounded gaussians” In Conference on Learning Theory, 2022, pp. 544–572 PMLR
- Pravesh Kothari, Pasin Manurangsi and Ameya Velingker “Private robust estimation by stabilizing convex relaxations” In Conference on Learning Theory, 2022, pp. 723–777 PMLR
- “Differentially private algorithms for learning mixtures of separated gaussians” In Advances in Neural Information Processing Systems 32, 2019
- Gautam Kamath, Vikrant Singhal and Jonathan Ullman “Private mean estimation of heavy-tailed distributions” In Conference on Learning Theory, 2020, pp. 2204–2235 PMLR
- “Finite Sample Differentially Private Confidence Intervals” In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), 2018 Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik
- “Density estimation in linear time” In arXiv preprint arXiv:0712.2869, 2007
- “Settling the polynomial learnability of mixtures of gaussians” In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, 2010, pp. 93–102 IEEE
- Shyam Narayanan “Private high-dimensional hypothesis testing” In Conference on Learning Theory, 2022, pp. 3979–4027 PMLR
- Jerzy Neyman and Egon Sharpe Pearson “IX. On the problem of the most efficient tests of statistical hypotheses” In Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 231.694-706 The Royal Society London, 1933, pp. 289–337
- “Simple Binary Hypothesis Testing under Local Differential Privacy and Communication Constraints” In arXiv preprint arXiv:2301.03566, 2023
- Liam Paninski “A coincidence-based test for uniformity given very sparsely sampled discrete data” In IEEE Transactions on Information Theory 54.10 IEEE, 2008, pp. 4750–4755
- Ankit Pensia, Po-Ling Loh and Varun Jog “Simple Binary Hypothesis Testing under Communication Constraints” In 2022 IEEE International Symposium on Information Theory (ISIT), 2022, pp. 3297–3302 IEEE
- Henry Scheffé “A useful convergence theorem for probability distributions” In The Annals of Mathematical Statistics 18.3 JSTOR, 1947, pp. 434–438
- Vikrant Singhal “A Polynomial Time, Pure Differentially Private Estimator for Binary Product Distributions” In arXiv preprint arXiv:2304.06787, 2023
- “Near-optimal-sample estimators for spherical gaussian mixtures” In Advances in Neural Information Processing Systems 27, 2014
- “Between Pure and Approximate Differential Privacy” In Journal of Privacy and Confidentiality 7.2, 2016
- Balázs Szörényi “Characterizing statistical query learning: simplified notions and proofs” In International Conference on Algorithmic Learning Theory, 2009, pp. 186–200 Springer
- “Friendlycore: Practical differentially private aggregation” In International Conference on Machine Learning, 2022, pp. 21828–21863 PMLR
- Jonathan Ullman “Answering $n^{2+o(1)}$ counting queries with differential privacy is hard” In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, 2013, pp. 361–370
- Jonathan Ullman “Tight lower bounds for locally differentially private selection” In arXiv preprint arXiv:1802.02638, 2018
- Stanley L Warner “Randomized response: A survey technique for eliminating evasive answer bias” In Journal of the American Statistical Association 60.309 Taylor & Francis, 1965, pp. 63–69
- Yannis G Yatracos “Rates of convergence of minimum distance estimators and Kolmogorov’s entropy” In The Annals of Statistics 13.2 Institute of Mathematical Statistics, 1985, pp. 768–774
- “Information-theoretic determination of minimax rates of convergence” In Annals of Statistics JSTOR, 1999, pp. 1564–1599