
Sketches-based join size estimation under local differential privacy (2405.11419v1)

Published 19 May 2024 in cs.DB and cs.CR

Abstract: Join size estimation on sensitive data poses a risk of privacy leakage. Local differential privacy (LDP) is a solution to preserve privacy while collecting sensitive data, but it introduces significant noise when dealing with sensitive join attributes that have large domains. Employing probabilistic structures such as sketches is a way to handle large domains, but it leads to hash-collision errors. To achieve accurate estimations, it is necessary to reduce both the noise error and hash-collision error. To tackle the noise error caused by protecting sensitive join values with large domains, we introduce a novel algorithm called LDPJoinSketch for sketch-based join size estimation under LDP. Additionally, to address the inherent hash-collision errors in sketches under LDP, we propose an enhanced method called LDPJoinSketch+. It utilizes a frequency-aware perturbation mechanism that effectively separates high-frequency and low-frequency items without compromising privacy. The proposed methods satisfy LDP, and the estimation error is bounded. Experimental results show that our method outperforms existing methods, effectively enhancing the accuracy of join size estimation under LDP.
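The non-private building block behind the abstract can be illustrated with a small sketch. The following is a minimal Fast-AGMS-style example, not the paper's LDPJoinSketch or LDPJoinSketch+: it applies no LDP perturbation, and the names `FastAGMSSketch`, `_hash64`, and `exact_join_size` are hypothetical, chosen only for illustration. It shows how two sketches that share hash functions yield a join size estimate, and why hash collisions between distinct join values introduce error.

```python
import hashlib
from collections import Counter
from statistics import median


def _hash64(salt, key):
    """Deterministic 64-bit hash of (salt, key) built on BLAKE2b."""
    digest = hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class FastAGMSSketch:
    """Illustrative Fast-AGMS sketch (no privacy mechanism applied)."""

    def __init__(self, rows=5, cols=1024, seed=0):
        self.rows, self.cols, self.seed = rows, cols, seed
        self.table = [[0] * cols for _ in range(rows)]

    def add(self, key, count=1):
        # Each row maps the key to one bucket and adds a +1/-1 signed count.
        for r in range(self.rows):
            bucket = _hash64(f"b{self.seed}-{r}", key) % self.cols
            sign = 1 if _hash64(f"s{self.seed}-{r}", key) & 1 else -1
            self.table[r][bucket] += sign * count

    def join_size(self, other):
        # Both sketches must use the same hash functions to be comparable.
        if (self.rows, self.cols, self.seed) != (other.rows, other.cols, other.seed):
            raise ValueError("sketches must share shape and hash seed")
        # Row-wise inner product estimates the join size; colliding keys
        # contribute error, and the median over rows dampens it.
        per_row = [
            sum(a * b for a, b in zip(row_a, row_b))
            for row_a, row_b in zip(self.table, other.table)
        ]
        return median(per_row)


def exact_join_size(left, right):
    """Exact equi-join size: sum over join values of the frequency products."""
    left_counts, right_counts = Counter(left), Counter(right)
    return sum(n * right_counts[k] for k, n in left_counts.items())


# Skewed toy data: a few high-frequency join values, many low-frequency ones.
left = [k for k in range(50) for _ in range(100 if k < 5 else 1)]
right = [k for k in range(60) for _ in range(80 if k < 5 else 1)]

sk_left, sk_right = FastAGMSSketch(), FastAGMSSketch()
for k in left:
    sk_left.add(k)
for k in right:
    sk_right.add(k)

print("exact:", exact_join_size(left, right))
print("sketch estimate:", sk_left.join_size(sk_right))
```

In this toy setting the only error source is hash collisions. Under LDP, each user's contribution would additionally be perturbed before reaching the collector, which is the noise error the abstract refers to.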

