Conditional Density Estimations from Privacy-Protected Data (2310.12781v3)

Published 19 Oct 2023 in stat.ML, cs.LG, and stat.CO

Abstract: Many modern statistical analysis and machine learning applications require training models on sensitive user data. Differential privacy provides a formal guarantee that individual-level information about users does not leak. In this framework, randomized algorithms inject calibrated noise into the confidential data, resulting in privacy-protected datasets or queries. However, restricting access to only privatized data during statistical analysis makes it computationally challenging to make valid inferences on the parameters underlying the confidential data. In this work, we propose simulation-based inference methods from privacy-protected datasets. In addition to sequential Monte Carlo approximate Bayesian computation, we use neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.
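To make the idea of simulation-based inference from a privatized query concrete, here is a minimal sketch under stated assumptions. It is not the paper's method: it uses plain rejection ABC rather than sequential Monte Carlo ABC or a neural conditional density estimator, and the model (Bernoulli data released as a Laplace-noised count) is a toy chosen for illustration. The function names `privatize_count` and `abc_rejection`, and all parameter values, are hypothetical. The key point it illustrates is that the simulator reproduces the privacy mechanism as well as the data model, so the accepted draws approximate the posterior given the private query result rather than the confidential data.

```python
import numpy as np


def privatize_count(x, epsilon, rng):
    """Laplace mechanism for a count query (sensitivity 1, scale 1/epsilon)."""
    return x.sum() + rng.laplace(loc=0.0, scale=1.0 / epsilon)


def abc_rejection(s_obs, n, epsilon, num_draws, tol, rng):
    """Plain rejection ABC (an assumption; not SMC-ABC).

    Each draw simulates both the confidential count and the privacy noise,
    then is kept if the simulated private query is within `tol` of s_obs.
    """
    theta = rng.uniform(0.0, 1.0, size=num_draws)        # Uniform(0, 1) prior
    counts = rng.binomial(n, theta)                       # confidential count
    s_sim = counts + rng.laplace(0.0, 1.0 / epsilon, size=num_draws)
    keep = np.abs(s_sim - s_obs) <= tol
    return theta[keep]


rng = np.random.default_rng(0)
n, theta_true, epsilon = 200, 0.3, 1.0                    # illustrative values

x = rng.binomial(1, theta_true, size=n)                   # confidential data
s_obs = privatize_count(x, epsilon, rng)                  # released query

posterior = abc_rejection(s_obs, n, epsilon,
                          num_draws=200_000, tol=2.0, rng=rng)
print(len(posterior), posterior.mean(), posterior.std())
```

Lowering `epsilon` in this sketch increases the noise scale and widens the approximate posterior, which is one simple way to see the privacy-utility trade-off the abstract refers to.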

Authors (3)
  1. Yifei Xiong
  2. Nianqiao P. Ju
  3. Sanguo Zhang