Almost Instance-optimal Clipping for Summation Problems in the Shuffle Model of Differential Privacy (2403.10116v2)
Abstract: Differentially private mechanisms achieving worst-case optimal error bounds (e.g., the classical Laplace mechanism) are well-studied in the literature. However, when typical data are far from the worst case, \emph{instance-specific} error bounds, which depend on the largest value in the dataset, are more meaningful. For example, consider the sum estimation problem, where each user has an integer $x_i$ from the domain $\{0,1,\dots,U\}$ and we wish to estimate $\sum_i x_i$. This problem has a worst-case optimal error of $O(U/\varepsilon)$, while recent work has shown that the clipping mechanism can achieve an instance-optimal error of $O(\max_i x_i \cdot \log\log U /\varepsilon)$. Under the shuffle model, known instance-optimal protocols are less communication-efficient. The clipping mechanism also works in the shuffle model, but it requires two rounds: round one finds the clipping threshold, and round two clips the data and computes the noisy sum of the clipped values. In this paper, we show how these two seemingly sequential steps can be done simultaneously in one round using just $1+o(1)$ messages per user, while maintaining the instance-optimal error bound. We also extend our technique to the high-dimensional sum estimation problem and to sparse vector aggregation (a.k.a. frequency estimation under user-level differential privacy).
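To make the contrast between the worst-case and instance-specific error bounds concrete, the clipping step can be sketched in the central model as below. This is a minimal illustration under stated assumptions, not the paper's one-round shuffle protocol: the threshold `tau` is simply passed in, and in the usage block `max(xs)` is a non-private stand-in for a threshold that would in practice be chosen privately (round one). The helper names `laplace_noise`, `clip_sum`, and `clipped_noisy_sum` are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clip_sum(xs: list[int], tau: int) -> int:
    """Sum of the data after clipping every value to at most tau."""
    return sum(min(x, tau) for x in xs)


def clipped_noisy_sum(xs: list[int], tau: int, epsilon: float,
                      rng: random.Random) -> float:
    """epsilon-DP estimate of sum(xs), ASSUMING tau does not depend on xs.

    Each clipped value lies in [0, tau], so one user changes clip_sum by at
    most tau; Laplace noise with scale tau / epsilon therefore suffices.
    """
    return clip_sum(xs, tau) + laplace_noise(tau / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    U = 2 ** 20                                       # domain {0, 1, ..., U}
    xs = [rng.randint(0, 100) for _ in range(1000)]   # typical data: max x_i << U
    eps = 1.0
    true_sum = sum(xs)
    # Worst-case baseline: clip at U, so the noise scale is U / eps.
    worst = clipped_noisy_sum(xs, U, eps, rng)
    # Assumption: max(xs) stands in for a privately chosen threshold (not private!).
    inst = clipped_noisy_sum(xs, max(max(xs), 1), eps, rng)
    print(true_sum, abs(worst - true_sum), abs(inst - true_sum))
```

With `tau = U` the noise scale is $U/\varepsilon$, matching the worst-case $O(U/\varepsilon)$ bound; with `tau` near $\max_i x_i$ it shrinks to roughly $\max_i x_i/\varepsilon$. The extra $\log\log U$ factor in the instance-optimal bound reflects the cost of choosing the threshold privately, which this sketch does not implement.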