Causal Discovery Under Local Privacy (2311.04037v3)

Published 7 Nov 2023 in cs.CR, cs.AI, cs.LG, and stat.ME

Abstract: Differential privacy is a widely adopted framework designed to safeguard the sensitive information of data providers within a data set. It is based on applying controlled noise at the interface between the server that stores and processes the data and the data consumers. Local differential privacy is a variant that allows data providers to apply the privatization mechanism to their own data individually, so it also provides protection in settings where the server, or even the data collector, cannot be trusted. The introduction of noise, however, inevitably affects the utility of the data, in particular by distorting the correlations between individual data components. This distortion can be detrimental to tasks such as causal discovery. In this paper, we consider various well-known locally differentially private mechanisms and compare the trade-off between the privacy they provide and the accuracy of the causal structure produced by causal learning algorithms applied to data obfuscated by these mechanisms. Our analysis yields valuable insights for selecting appropriate locally differentially private protocols for causal discovery tasks. We expect that our findings will aid researchers and practitioners in conducting locally private causal discovery.
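To make the correlation-distortion point concrete, here is a minimal sketch, not the paper's experimental setup. It assumes k-ary (generalized) randomized response as the local mechanism, a hypothetical binary causal pair X -> Y in which Y copies X with probability 0.9, and that each data provider privatizes each attribute independently with budget epsilon (so each provider spends 2 * epsilon in total under sequential composition). The names `rr_keep_prob`, `randomized_response` and `simulate` are illustrative only. For binary randomized response with flip probability q = 1 - p, the privatized covariance shrinks by a factor (1 - 2q)^2 = (2p - 1)^2, which the sketch also uses to debias the covariance estimate.

```python
import math
import random


def rr_keep_prob(epsilon: float, k: int = 2) -> float:
    """Probability that k-ary randomized response reports the true value."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def randomized_response(value: int, epsilon: float, k: int, rng: random.Random) -> int:
    """Privatize one value in {0, ..., k-1} with epsilon-LDP k-ary randomized response."""
    if rng.random() < rr_keep_prob(epsilon, k):
        return value
    # Otherwise report one of the other k-1 values uniformly at random.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def covariance(xs, ys):
    """Population covariance of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def simulate(n: int = 100_000, epsilon: float = 1.0, seed: int = 0) -> dict:
    rng = random.Random(seed)
    # Hypothetical causal pair X -> Y: Y copies X with probability 0.9.
    xs = [int(rng.random() < 0.5) for _ in range(n)]
    ys = [x if rng.random() < 0.9 else 1 - x for x in xs]
    # Each provider privatizes each attribute locally and independently.
    xs_priv = [randomized_response(x, epsilon, 2, rng) for x in xs]
    ys_priv = [randomized_response(y, epsilon, 2, rng) for y in ys]
    # Binary RR with keep probability p: Cov(X', Y') = (2p - 1)^2 * Cov(X, Y),
    # so dividing by (2p - 1)^2 gives an unbiased covariance estimate.
    p = rr_keep_prob(epsilon, 2)
    return {
        "corr_true": correlation(xs, ys),
        "corr_private": correlation(xs_priv, ys_priv),
        "cov_true": covariance(xs, ys),
        "cov_debiased": covariance(xs_priv, ys_priv) / (2 * p - 1) ** 2,
    }


if __name__ == "__main__":
    print(simulate())
```

With epsilon = 1 the true correlation of about 0.8 drops to roughly 0.8 * (2p - 1)^2, about 0.17, in the privatized data. A dependence test run by a constraint-based causal discovery algorithm therefore sees a much weaker signal, even though the debiased covariance still recovers the original value on average at the cost of higher variance.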

