LDPRecover: Recovering Frequencies from Poisoning Attacks against Local Differential Privacy (2403.09351v2)
Abstract: Local differential privacy (LDP), which enables an untrusted server to collect aggregated statistics from distributed users while protecting those users' privacy, has been widely deployed in practice. However, LDP protocols for frequency estimation are vulnerable to poisoning attacks, in which an attacker can poison the aggregated frequencies by manipulating the data sent from malicious users. Recovering accurate aggregated frequencies from poisoned ones therefore remains an open challenge. In this work, we propose LDPRecover, a method that recovers accurate aggregated frequencies from poisoning attacks even when the server does not know the details of the attacks. In LDPRecover, we establish a genuine frequency estimator that theoretically guides the server to recover the frequencies aggregated from genuine users' data by eliminating the impact of malicious users' data from the poisoned frequencies. Since the server has no knowledge of the attacks, we propose an adaptive attack that unifies existing attacks, and we learn the statistics of the malicious data within this adaptive attack by exploiting the properties of LDP protocols. Taking the estimator and the learned statistics as constraints, we formulate the recovery of aggregated frequencies that approximate the genuine ones as a constraint inference (CI) problem. Consequently, the server obtains accurate aggregated frequencies by solving this problem optimally. Moreover, LDPRecover can serve as a frequency recovery paradigm that yields more accurate aggregated frequencies when attack details are integrated as additional constraints in the CI problem. Our evaluation on two real-world datasets, three LDP protocols, and both untargeted and targeted poisoning attacks shows that LDPRecover is accurate and widely applicable against various poisoning attacks.
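To make the recovery idea concrete, here is a minimal illustrative sketch in Python. It assumes a simple mixture model in which the poisoned frequency vector satisfies `poisoned = (1 - beta) * genuine + beta * fake`, where `beta` is the fraction of malicious users and `fake_est` is an estimate of the frequencies contributed by malicious data; in the paper these quantities are learned via the adaptive attack model rather than given directly, and all function names below are hypothetical. The constraint-inference step is approximated here by a Euclidean projection onto the probability simplex (frequencies must be non-negative and sum to one), which is only one possible instantiation of the CI problem:

```python
import numpy as np

def project_onto_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (non-negative entries summing to 1)."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)      # shift so active entries sum to 1
    return np.maximum(v + theta, 0.0)

def recover_genuine_frequencies(poisoned, beta, fake_est):
    """Invert the assumed mixture poisoned = (1-beta)*genuine + beta*fake,
    then enforce the validity constraints on the result."""
    poisoned = np.asarray(poisoned, dtype=float)
    fake_est = np.asarray(fake_est, dtype=float)
    raw = (poisoned - beta * fake_est) / (1.0 - beta)
    # Constraint inference step: nearest valid frequency vector.
    return project_onto_simplex(raw)
```

For example, if the genuine frequencies are `[0.5, 0.3, 0.2]`, 10% of users are malicious, and all malicious reports target the first item, the poisoned vector is `[0.55, 0.27, 0.18]` and the sketch recovers the genuine vector exactly; with imperfect estimates of `beta` or `fake_est`, the projection still guarantees a valid (non-negative, normalized) output.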