Bayesian Frequency Estimation Under Local Differential Privacy With an Adaptive Randomized Response Mechanism (2405.07020v2)
Abstract: Frequency estimation plays a critical role in many applications involving personal and private categorical data. Such data are often collected sequentially over time, making it valuable to estimate their distribution online while preserving privacy. We propose AdOBEst-LDP, a new algorithm for adaptive, online Bayesian estimation of categorical distributions under local differential privacy (LDP). The key idea behind AdOBEst-LDP is to enhance the utility of future privatized categorical data by leveraging inference from previously collected privatized data. To achieve this, AdOBEst-LDP uses a new adaptive LDP mechanism to collect privatized data. This LDP mechanism constrains its output to a subset of categories that 'predicts' the next user's data. By adapting the subset selection process to the past privatized data via Bayesian estimation, the algorithm improves the utility of future privatized data. To quantify utility, we explore various well-known information metrics, including (but not limited to) the Fisher information matrix, total variation distance, and information entropy. For Bayesian estimation, we utilize posterior sampling through stochastic gradient Langevin dynamics, a computationally efficient approximate Markov chain Monte Carlo (MCMC) method. We provide a theoretical analysis showing that (i) the posterior distribution of the category probabilities targeted with Bayesian estimation converges to the true probabilities even for approximate posterior sampling, and (ii) AdOBEst-LDP eventually selects the optimal subset for its LDP mechanism with high probability if posterior sampling is performed exactly. We also present numerical results to validate the estimation accuracy of AdOBEst-LDP. Our comparisons show its superior performance against non-adaptive and semi-adaptive competitors across different privacy levels and distributional parameters.
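The abstract does not spell out the adaptive mechanism itself, so the following is only a minimal sketch of standard (non-adaptive) k-ary randomized response, the classical building block for frequency estimation under LDP. It is not AdOBEst-LDP: there is no subset selection, no Bayesian update, and no Langevin sampling. The function names `krr_privatize` and `krr_estimate` and all parameter values are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def krr_privatize(x, k, eps, rng):
    """Standard k-ary randomized response (illustrative, not the paper's mechanism).

    Reports the true category x in {0, ..., k-1} with probability
    e^eps / (e^eps + k - 1); otherwise reports one of the other k - 1
    categories uniformly at random. The ratio of any two output
    probabilities is at most e^eps, which is the eps-LDP condition.
    """
    p_keep = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p_keep:
        return x
    other = rng.randrange(k - 1)          # index among the k - 1 other categories
    return other if other < x else other + 1  # skip over x itself


def krr_estimate(reports, k, eps):
    """Unbiased frequency estimate obtained by inverting the response channel.

    Since P(Y = j) = q + theta_j * (p - q), we solve for theta_j using
    the empirical frequency of each reported category.
    """
    n = len(reports)
    p = math.exp(eps) / (math.exp(eps) + k - 1)  # P(report = truth)
    q = 1.0 / (math.exp(eps) + k - 1)            # P(report = a given other category)
    counts = Counter(reports)
    return [(counts.get(j, 0) / n - q) / (p - q) for j in range(k)]
```

A typical use would draw each user's category from the unknown distribution, pass it through `krr_privatize` on the user's side, and call `krr_estimate` on the collected reports. The estimates always sum to one but can be slightly negative for rare categories; an adaptive scheme such as the one described in the abstract aims to make each privatized report more informative than this fixed baseline.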