Differential Privacy with Random Projections and Sign Random Projections (2306.01751v2)
Abstract: In this paper, we develop a series of differential privacy (DP) algorithms from a family of random projections (RP) for general applications in machine learning, data mining, and information retrieval. Among the presented algorithms, iDP-SignRP is remarkably effective under the setting of "individual differential privacy" (iDP), based on sign random projections (SignRP). Also, DP-SignOPORP considerably improves existing algorithms in the literature under the standard DP setting, using "one permutation + one random projection" (OPORP), where OPORP is a variant of the celebrated count-sketch method with fixed-length binning and normalization. Without taking signs, among the DP-RP family, DP-OPORP achieves the best performance. Our key idea for improving DP-RP is to take only the signs, i.e., $\mathrm{sign}(x_j) = \mathrm{sign}\left(\sum_{i=1}^{p} u_i w_{ij}\right)$, of the projected data. The intuition is that the signs often remain unchanged when the original data ($u$) exhibit small changes (according to the "neighbor" definition in DP). In other words, the aggregation and quantization operations themselves provide good privacy protections. We develop a technique called "smooth flipping probability" that incorporates this intuitive privacy benefit of SignRPs and improves the standard DP bit flipping strategy. Based on this technique, we propose DP-SignOPORP, which satisfies strict DP and outperforms other DP variants based on SignRP (and RP), especially when $\epsilon$ is not very large (e.g., $\epsilon = 5\sim10$). Moreover, if an application scenario accepts individual DP, then we immediately obtain an algorithm named iDP-SignRP, which achieves excellent utility even at small $\epsilon$ (e.g., $\epsilon<0.5$).
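The sign-taking step and the standard bit flipping baseline mentioned in the abstract can be illustrated with a minimal Python sketch. It makes several assumptions that are not taken from the paper: Gaussian projection weights $w_{ij}$, plain randomized response with keep probability $e^{\epsilon'}/(1+e^{\epsilon'})$ per bit, and an even split $\epsilon' = \epsilon/k$ of the budget across the $k$ signs by basic composition. The function names `sign_random_projection` and `randomized_response` are hypothetical. This is the baseline flipping strategy only, not the "smooth flipping probability" technique or DP-SignOPORP.

```python
import math
import random


def sign_random_projection(u, k, seed=0):
    """Return k signs in {-1, +1} of Gaussian random projections of u.

    Assumption: weights w_ij are i.i.d. N(0, 1), regenerated from `seed`
    so that two vectors of the same length share the same projection matrix.
    """
    rng = random.Random(seed)
    signs = []
    for _ in range(k):
        # x_j = sum_i u_i * w_ij, then keep only its sign
        x_j = sum(u_i * rng.gauss(0.0, 1.0) for u_i in u)
        signs.append(1 if x_j >= 0 else -1)
    return signs


def randomized_response(signs, eps_per_bit, rng):
    """Keep each sign with probability e^eps / (1 + e^eps), else flip it."""
    keep = math.exp(eps_per_bit) / (1.0 + math.exp(eps_per_bit))
    return [s if rng.random() < keep else -s for s in signs]


if __name__ == "__main__":
    p, k = 200, 64
    data_rng = random.Random(1)
    u = [data_rng.gauss(0.0, 1.0) for _ in range(p)]
    u_neighbor = list(u)
    u_neighbor[0] += 0.05  # small change in one coordinate (illustrative)

    s = sign_random_projection(u, k)
    s_nb = sign_random_projection(u_neighbor, k)
    unchanged = sum(a == b for a, b in zip(s, s_nb))
    print(f"{unchanged}/{k} signs unchanged after the small change")

    total_eps = 5.0  # illustrative budget, split evenly over the k bits
    private = randomized_response(s, total_eps / k, random.Random(2))
    print(private[:8])
```

Running the script compares the signs of a vector and a slightly perturbed copy, which is the intuition the abstract describes, and then releases a flipped version of the signs. The even budget split is the simplest conservative choice. The abstract's point is that a flipping probability aware of how stable the signs are can do better.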