Private Statistical Estimation of Many Quantiles (2302.06943v3)

Published 14 Feb 2023 in stat.ML and cs.LG

Abstract: This work studies the estimation of many statistical quantiles under differential privacy. More precisely, given a distribution and access to i.i.d. samples from it, we study the estimation of the inverse of its cumulative distribution function (the quantile function) at specific points. For instance, this task is of key importance in private data generation. We present two different approaches. The first consists of privately estimating the empirical quantiles of the samples and using the result as an estimator of the quantiles of the distribution. In particular, we study the statistical properties of the recently published algorithm of Kaplan et al. (2022), which privately estimates the quantiles recursively. The second approach uses density estimation techniques to estimate the quantile function uniformly on an interval. We show that there is a tradeoff between the two methods: when many quantiles are to be estimated, it is better to estimate the density than to estimate the quantile function at specific points.
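
The two approaches in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm or analysis. The first function is a standard exponential-mechanism estimator for a single empirical quantile, the kind of building block that recursive schemes apply repeatedly. The second is a plain histogram density estimator with Laplace noise, inverted to read off many quantiles at once. The function names, the assumption that the data lie in [0, 1], the bin count, and the replace-one-sample neighboring relation used to calibrate the noise are all choices made for this illustration.

```python
import numpy as np


def private_quantile_exp_mech(samples, p, epsilon, lower=0.0, upper=1.0, rng=None):
    """Single private quantile via the exponential mechanism (illustrative sketch).

    Assumes data in [lower, upper] and a rank utility of sensitivity 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.sort(np.clip(np.asarray(samples, dtype=float), lower, upper))
    n = len(x)
    # n + 2 breakpoints define n + 1 intervals; interval i has exactly i samples below it.
    pts = np.concatenate(([lower], x, [upper]))
    widths = np.diff(pts)
    utility = -np.abs(np.arange(n + 1) - p * n)
    with np.errstate(divide="ignore"):
        # Zero-width intervals get log-weight -inf, hence probability 0.
        log_w = np.log(widths) + (epsilon / 2.0) * utility
    log_w -= log_w.max()
    weights = np.exp(log_w)
    weights /= weights.sum()
    i = rng.choice(n + 1, p=weights)
    # Output a uniform point inside the selected interval.
    return rng.uniform(pts[i], pts[i + 1])


def private_histogram_quantiles(samples, probs, epsilon, n_bins=64, rng=None):
    """Many quantiles from one noisy histogram on [0, 1] (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.clip(np.asarray(samples, dtype=float), 0.0, 1.0)
    probs = np.asarray(probs, dtype=float)
    counts, edges = np.histogram(samples, bins=n_bins, range=(0.0, 1.0))
    # Replacing one sample changes at most two counts by 1 each: L1 sensitivity 2.
    noisy = counts + rng.laplace(scale=2.0 / epsilon, size=n_bins)
    noisy = np.maximum(noisy, 0.0)  # post-processing, does not affect privacy
    if noisy.sum() == 0.0:
        noisy = np.ones(n_bins)  # degenerate case: fall back to a uniform density
    cdf = np.concatenate(([0.0], np.cumsum(noisy) / noisy.sum()))
    # Invert the piecewise-linear CDF: locate each probability's bin, then interpolate.
    idx = np.clip(np.searchsorted(cdf, probs, side="right") - 1, 0, n_bins - 1)
    mass = cdf[idx + 1] - cdf[idx]
    frac = np.clip((probs - cdf[idx]) / np.maximum(mass, 1e-12), 0.0, 1.0)
    return edges[idx] + frac * (edges[idx + 1] - edges[idx])
```

In this sketch the histogram spends the privacy budget once, whatever the number of quantiles requested, whereas calling the single-quantile routine m times would require splitting the budget across calls. That is one informal way to see why a density-based estimator can become preferable when many quantiles are needed.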

References (59)
  1. Deep learning with differential privacy. In Weippl, E. R., Katzenbeisser, S., Kruegel, C., Myers, A. C., and Halevi, S. (eds.), Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016, pp. 308–318. ACM, 2016. doi: 10.1145/2976749.2978318. URL https://doi.org/10.1145/2976749.2978318.
  2. Abowd, J. M. The US Census Bureau adopts differential privacy. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2867–2867, 2018.
  3. Differentially private testing of identity and closeness of discrete distributions. In Bengio, S., Wallach, H. M., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pp. 6879–6891, 2018. URL https://proceedings.neurips.cc/paper/2018/hash/7de32147a4f1055bed9e4faf3485a84d-Abstract.html.
  4. Optimal rates for nonparametric density estimation under communication constraints. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, volume 34, pp. 26754–26766. Curran Associates, Inc., 2021a. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/e1021d43911ca2c1845910d84f40aeae-Paper.pdf.
  5. Inference under information constraints III: Local privacy constraints. IEEE Journal on Selected Areas in Information Theory, 2(1):253–267, 2021b. doi: 10.1109/JSAIT.2021.3053569. URL https://doi.org/10.1109/JSAIT.2021.3053569.
  6. Information-constrained optimization: can adaptive processing of gradients help? CoRR, abs/2104.00979, 2021c. URL https://arxiv.org/abs/2104.00979.
  7. Unified lower bounds for interactive high-dimensional estimation under information constraints. CoRR, abs/2010.06562, 2021d. URL https://arxiv.org/abs/2010.06562.
  8. Differentially private Assouad, Fano, and Le Cam. In Feldman, V., Ligett, K., and Sabato, S. (eds.), Algorithmic Learning Theory, 16-19 March 2021, Virtual Conference, Worldwide, volume 132 of Proceedings of Machine Learning Research, pp. 48–78. PMLR, 2021e. URL http://proceedings.mlr.press/v132/acharya21a.html.
  9. Bounded space differentially private quantiles. CoRR, abs/2201.03380, 2022. URL https://arxiv.org/abs/2201.03380.
  10. Allen, J. et al. Smartnoise core differential privacy library. https://github.com/opendp/smartnoise-core.
  11. Near instance-optimality in differential privacy. CoRR, abs/2005.10630, 2020. URL https://arxiv.org/abs/2005.10630.
  12. Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In Williamson, C. L., Zurko, M. E., Patel-Schneider, P. F., and Shenoy, P. J. (eds.), Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pp. 181–190. ACM, 2007. doi: 10.1145/1242572.1242598. URL https://doi.org/10.1145/1242572.1242598.
  13. Fisher information for distributed estimation under a blackboard communication protocol. In 2019 IEEE International Symposium on Information Theory (ISIT), pp. 2704–2708, 2019. doi: 10.1109/ISIT.2019.8849821.
  14. Fisher information under local differential privacy. IEEE Journal on Selected Areas in Information Theory, 1(3):645–659, 2020a. doi: 10.1109/JSAIT.2020.3039461. URL https://doi.org/10.1109/JSAIT.2020.3039461.
  15. Lower bounds for learning distributions under communication constraints via Fisher information. Journal of Machine Learning Research, 21:Paper No. 236, 30, 2020b. ISSN 1532-4435. URL https://jmlr.csail.mit.edu/papers/volume21/19-737/19-737.pdf.
  16. Classification under local differential privacy, 2019. URL https://arxiv.org/abs/1912.04629.
  17. The Johnson-Lindenstrauss transform itself preserves differential privacy. In 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2012, New Brunswick, NJ, USA, October 20-23, 2012, pp. 410–419. IEEE Computer Society, 2012. doi: 10.1109/FOCS.2012.67. URL https://doi.org/10.1109/FOCS.2012.67.
  18. Differentially private release and learning of threshold functions. In Guruswami, V. (ed.), IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pp. 634–649. IEEE Computer Society, 2015. doi: 10.1109/FOCS.2015.45. URL https://doi.org/10.1109/FOCS.2015.45.
  19. Local differential privacy: Elbow effect in optimal density estimation and adaptation over Besov ellipsoids. CoRR, abs/1903.01927, 2019. URL http://arxiv.org/abs/1903.01927.
  20. Improved differential privacy for SGD via optimal private linear operators on adaptive streams. In NeurIPS, 2022. URL http://papers.nips.cc/paper_files/paper/2022/hash/271ec4d1a9ff5e6b81a6e21d38b1ba96-Abstract-Conference.html.
  21. Devroye, L. Non-Uniform Random Variate Generation. Springer, 1986. ISBN 978-1-4613-8645-2. doi: 10.1007/978-1-4613-8643-8. URL https://doi.org/10.1007/978-1-4613-8643-8.
  22. Collecting telemetry data privately. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pp. 3571–3580, 2017. URL https://proceedings.neurips.cc/paper/2017/hash/253614bbac999b38b5b60cae531c4969-Abstract.html.
  23. Revealing information while preserving privacy. In Neven, F., Beeri, C., and Milo, T. (eds.), Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9-12, 2003, San Diego, CA, USA, pp. 202–210. ACM, 2003. doi: 10.1145/773153.773173. URL https://doi.org/10.1145/773153.773173.
  24. Gaussian differential privacy. CoRR, abs/1905.02383, 2019. URL http://arxiv.org/abs/1905.02383.
  25. Optimal differential privacy composition for exponential mechanisms. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pp. 2597–2606. PMLR, 2020. URL http://proceedings.mlr.press/v119/dong20a.html.
  26. Nonparametric Differentially Private Confidence Intervals for the Median. Journal of Survey Statistics and Methodology, 10(3):804–829, June 2022. ISSN 2325-0984. doi: 10.1093/jssam/smac021. URL https://doi.org/10.1093/jssam/smac021.
  27. Local privacy and statistical minimax rates. In 51st Annual Allerton Conference on Communication, Control, and Computing, Allerton 2013, Allerton Park & Retreat Center, Monticello, IL, USA, October 2-4, 2013, pp. 1592. IEEE, 2013. doi: 10.1109/Allerton.2013.6736718. URL https://doi.org/10.1109/Allerton.2013.6736718.
  28. Local privacy, data processing inequalities, and statistical minimax rates, 2014. URL https://arxiv.org/abs/1302.3203.
  29. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211–407, 2014. doi: 10.1561/0400000042. URL https://doi.org/10.1561/0400000042.
  30. Our data, ourselves: Privacy via distributed noise generation. In Vaudenay, S. (ed.), Advances in Cryptology - EUROCRYPT 2006, 25th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28 - June 1, 2006, Proceedings, volume 4004 of Lecture Notes in Computer Science, pp. 486–503. Springer, 2006a. doi: 10.1007/11761679_29. URL https://doi.org/10.1007/11761679_29.
  31. Calibrating noise to sensitivity in private data analysis. In Halevi, S. and Rabin, T. (eds.), Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, volume 3876 of Lecture Notes in Computer Science, pp. 265–284. Springer, 2006b. doi: 10.1007/11681878_14. URL https://doi.org/10.1007/11681878_14.
  32. RAPPOR: randomized aggregatable privacy-preserving ordinal response. In Ahn, G., Yung, M., and Li, N. (eds.), Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, November 3-7, 2014, pp. 1054–1067. ACM, 2014. doi: 10.1145/2660267.2660348. URL https://doi.org/10.1145/2660267.2660348.
  33. Model inversion attacks that exploit confidence information and basic countermeasures. In Ray, I., Li, N., and Kruegel, C. (eds.), Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, October 12-16, 2015, pp. 1322–1333. ACM, 2015. doi: 10.1145/2810103.2813677. URL https://doi.org/10.1145/2810103.2813677.
  34. Differentially private quantiles. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pp. 3713–3722. PMLR, 2021. URL http://proceedings.mlr.press/v139/gillenwater21a.html.
  35. A Distribution-Free Theory of Nonparametric Regression. Springer series in statistics. Springer, 2002. ISBN 978-0-387-95441-7. doi: 10.1007/b97848. URL https://doi.org/10.1007/b97848.
  36. Constant matters: Fine-grained complexity of differentially private continual observation using completely bounded norms. CoRR, abs/2202.11205, 2022. URL https://arxiv.org/abs/2202.11205.
  37. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet, 4(8):e1000167, 2008.
  38. IBM. Smartnoise core differential privacy library. https://github.com/IBM/differential-privacy-library.
  39. The composition theorem for differential privacy. In Bach, F. R. and Blei, D. M. (eds.), Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, volume 37 of JMLR Workshop and Conference Proceedings, pp. 1376–1385. JMLR.org, 2015. URL http://proceedings.mlr.press/v37/kairouz15.html.
  40. Improved rates for differentially private stochastic convex optimization with heavy-tailed data. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., and Sabato, S. (eds.), International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pp. 10633–10660. PMLR, 2022. URL https://proceedings.mlr.press/v162/kamath22a.html.
  41. Privately learning thresholds: Closing the exponential gap. In Abernethy, J. D. and Agarwal, S. (eds.), Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria], volume 125 of Proceedings of Machine Learning Research, pp. 2263–2285. PMLR, 2020. URL http://proceedings.mlr.press/v125/kaplan20a.html.
  42. Differentially private approximate quantiles. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., and Sabato, S. (eds.), International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pp. 10751–10761. PMLR, 2022. URL https://proceedings.mlr.press/v162/kaplan22a.html.
  43. Kroll, M. On density estimation at a fixed point under local differential privacy. Electronic Journal of Statistics, 15(1):1783–1813, 2021. doi: 10.1214/21-EJS1830. URL https://doi.org/10.1214/21-EJS1830.
  44. Private quantiles estimation in the presence of atoms. CoRR, abs/2202.08969, 2022. URL https://arxiv.org/abs/2202.08969.
  45. On the Statistical Complexity of Estimation and Testing under Privacy Constraints. Transactions on Machine Learning Research Journal, April 2023. URL https://hal.science/hal-03794374.
  46. The disclosure of diagnosis codes can breach research participants’ privacy. J. Am. Medical Informatics Assoc., 17(3):322–327, 2010. doi: 10.1136/jamia.2009.002725. URL https://doi.org/10.1136/jamia.2009.002725.
  47. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), October 20-23, 2007, Providence, RI, USA, Proceedings, pp. 94–103. IEEE Computer Society, 2007. doi: 10.1109/FOCS.2007.41. URL https://doi.org/10.1109/FOCS.2007.41.
  48. How to break anonymity of the Netflix Prize dataset. CoRR, abs/cs/0610105, 2006. URL http://arxiv.org/abs/cs/0610105.
  49. Robust de-anonymization of large sparse datasets. In 2008 IEEE Symposium on Security and Privacy (S&P 2008), 18-21 May 2008, Oakland, California, USA, pp. 111–125. IEEE Computer Society, 2008. doi: 10.1109/SP.2008.33. URL https://doi.org/10.1109/SP.2008.33.
  50. Smooth sensitivity and sampling in private data analysis. In Johnson, D. S. and Feige, U. (eds.), Proceedings of the 39th Annual ACM Symposium on Theory of Computing, San Diego, California, USA, June 11-13, 2007, pp. 75–84. ACM, 2007. doi: 10.1145/1250790.1250803. URL https://doi.org/10.1145/1250790.1250803.
  51. Smith, A. D. Privacy-preserving statistical estimation with optimal convergence rates. In Fortnow, L. and Vadhan, S. P. (eds.), Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, San Jose, CA, USA, 6-8 June 2011, pp. 813–822. ACM, 2011. doi: 10.1145/1993636.1993743. URL https://doi.org/10.1145/1993636.1993743.
  52. Steinberger, L. Efficiency in local differential privacy, 2023. URL https://arxiv.org/abs/2301.10600.
  53. Sweeney, L. Simple demographics often identify people uniquely. Health (San Francisco), 671(2000):1–34, 2000.
  54. Sweeney, L. k-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst., 10(5):557–570, 2002. doi: 10.1142/S0218488502001648. URL https://doi.org/10.1142/S0218488502001648.
  55. Learning new words. Granted US Patents, 9594741, 2017.
  56. Tsybakov, A. B. Introduction to Nonparametric Estimation. Springer series in statistics. Springer, 2009. ISBN 978-0-387-79051-0. doi: 10.1007/b13794. URL https://doi.org/10.1007/b13794.
  57. Van der Vaart, A. W. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998. doi: 10.1017/CBO9780511802256.
  58. Technical privacy metrics: A systematic survey. ACM Comput. Surv., 51(3):57:1–57:38, 2018. doi: 10.1145/3168389. URL https://doi.org/10.1145/3168389.
  59. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010. doi: 10.1198/jasa.2009.tm08651. URL https://doi.org/10.1198/jasa.2009.tm08651.