Fast Private Kernel Density Estimation via Locality Sensitive Quantization (2307.01877v1)
Abstract: We study efficient mechanisms for differentially private kernel density estimation (DP-KDE). Prior work for the Gaussian kernel described algorithms that run in time exponential in the number of dimensions $d$. This paper breaks the exponential barrier and shows how the KDE can be privately approximated in time linear in $d$, making it feasible for high-dimensional data. We also present improved bounds for low-dimensional data. Our results are obtained through a general framework, which we term Locality Sensitive Quantization (LSQ), for constructing private KDE mechanisms to which existing KDE approximation techniques can be applied. It lets us leverage several efficient non-private KDE methods, such as Random Fourier Features, the Fast Gauss Transform, and Locality Sensitive Hashing, and "privatize" them in a black-box manner. Our experiments demonstrate that the resulting DP-KDE mechanisms are fast and accurate on large datasets in both high and low dimensions.
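As a rough illustration of what "privatizing" a non-private KDE method can look like, the following is a minimal sketch, assuming the Gaussian kernel $k(x,y)=\exp(-\|x-y\|^2/(2\sigma^2))$, Random Fourier Features, and replace-one neighbouring datasets. It is not the paper's LSQ mechanism: it simply applies the standard Laplace mechanism to the mean RFF feature vector, and all names (`rff_map`, `private_rff_kde`, `exact_kde`) and parameters (`sigma`, `D`, `epsilon`, `seed`) are hypothetical choices made for this example.

```python
import numpy as np


def rff_map(X, W, b):
    """Random Fourier Features z(x) = sqrt(2/D) * cos(W x + b), so that E[z(x) . z(y)] = k(x, y)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)


def private_rff_kde(X, Y, sigma=1.0, D=500, epsilon=1.0, seed=0):
    """Hypothetical sketch: estimate (1/n) sum_i exp(-||x_i - y||^2 / (2 sigma^2)) at each row of Y.

    Only the data-dependent summary (the mean feature vector) is privatized,
    using the Laplace mechanism under replace-one neighbouring datasets.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Data-independent randomness: Gaussian frequencies (for the Gaussian kernel) and uniform phases.
    W = rng.normal(scale=1.0 / sigma, size=(D, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    mean_feature = rff_map(X, W, b).mean(axis=0)
    # Each feature vector has L1 norm at most D * sqrt(2/D) = sqrt(2D), so replacing
    # one point moves the mean by at most 2 * sqrt(2D) / n in L1 norm.
    l1_sensitivity = 2.0 * np.sqrt(2.0 * D) / n
    noisy_mean = mean_feature + rng.laplace(scale=l1_sensitivity / epsilon, size=D)
    # Queries touch only the released noisy vector (post-processing keeps privacy).
    return rff_map(Y, W, b) @ noisy_mean


def exact_kde(X, Y, sigma=1.0):
    """Non-private exact Gaussian KDE, used only for comparison."""
    sq_dists = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2)).mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(10000, 50))
    Y = X[:5]
    err = np.abs(private_rff_kde(X, Y, sigma=5.0) - exact_kde(X, Y, sigma=5.0)).max()
    print(f"max absolute error: {err:.4f}")
```

In this sketch nothing is exponential in $d$: building the summary costs $O(nDd)$ and each query costs $O(Dd)$. The noise added to each query shrinks as $1/(n\epsilon)$, while the RFF approximation error shrinks roughly as $1/\sqrt{D}$.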