Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

Published 16 Apr 2024 in cs.DS, cs.CR, cs.IT, and cs.LG | (2404.10201v2)

Abstract: We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v^{(i)} \in \mathbb{R}^d$. We propose a new multi-message protocol that achieves the optimal error using $\tilde{\mathcal{O}}\left(\min(n\varepsilon^2, d)\right)$ messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error requires each user to send $\Omega(\min(n\varepsilon^2, d)/\log(n))$ messages, demonstrating the optimality of our message complexity up to logarithmic factors. Additionally, we study the single-message setting and design a protocol that achieves mean squared error $\mathcal{O}(dn^{d/(d+2)}\varepsilon^{-4/(d+2)})$. Moreover, we show that any single-message protocol must incur mean squared error $\Omega(dn^{d/(d+2)})$, showing that our protocol is optimal in the standard setting where $\varepsilon = \Theta(1)$. Finally, we study robustness to malicious users and show that malicious users can incur large additive error with a single shuffler.


Summary

  • The paper demonstrates that achieving optimal error rates in vector mean estimation within the shuffle model requires high per-user message complexity.
  • It introduces a protocol that reaches an error rate of d/ε² using approximately min(nε², d) messages per user, closely matching central model performance.
  • It establishes theoretical lower bounds and discusses practical challenges, including the protocol's robustness against malicious users.

Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

Introduction and Background

The paper "Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages" (2404.10201) addresses the problem of differentially private vector mean estimation within the shuffle model of privacy. This model is particularly pertinent to federated learning scenarios, in which large-scale data originating from multiple users must be aggregated while ensuring the privacy of each individual's data. The paper's contributions are significant in the context of differential privacy (DP), especially in contrast to the local DP (LDP) model, where the heavy noise each user must add typically diminishes accuracy.

Differential privacy provides a mathematical framework to ensure data privacy, with the shuffle model, in which a trusted intermediary shuffles user messages before they reach the analyzer, offering a compromise between the highly accurate central model and the highly private but less accurate local model. The paper investigates the trade-offs between message complexity and the achievable level of privacy, focusing on whether optimal error rates can be realized with minimal communication overhead by leveraging the shuffle model.
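The shuffle-model pipeline described above can be sketched in a few lines. The local randomizer below (plain Gaussian noise addition) is a hypothetical stand-in for illustration only; the paper's actual protocols are more involved and are not reproduced here.

```python
import random

def local_randomizer(v, sigma=0.1):
    """Hypothetical local randomizer: add independent noise to each
    coordinate. Illustrative only -- not the paper's mechanism."""
    return [x + random.gauss(0.0, sigma) for x in v]

def shuffle_and_estimate(vectors, sigma=0.1):
    """Shuffle-model pipeline: randomize locally, shuffle the messages
    (severing the link between message and sender), then average
    at the analyzer."""
    messages = [local_randomizer(v, sigma) for v in vectors]
    random.shuffle(messages)  # the trusted shuffler
    n, d = len(messages), len(messages[0])
    return [sum(m[j] for m in messages) / n for j in range(d)]

random.seed(0)
# Two users with orthogonal unit vectors; the true mean is [0.5, 0.5].
est = shuffle_and_estimate([[1.0, 0.0], [0.0, 1.0]])
```

The point of the sketch is the information flow (randomize, shuffle, aggregate), not the noise calibration, which the paper analyzes in detail.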

Main Contributions

This paper's primary contribution lies in demonstrating that achieving optimal error rates in vector mean estimation within the shuffle model requires a significant communication load, quantified in terms of message complexity. Specifically, the authors establish a protocol that achieves the optimal error of d/ε² using Õ(min(nε², d)) messages per user, matching the performance of the central DP model up to logarithmic factors. This result underscores the inherent communication trade-offs in obtaining accurate estimations while guaranteeing privacy.
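The min(nε², d) message count has two regimes, which a small numeric check makes concrete. The helper below just evaluates the order-of-magnitude bound (the polylog factor hidden in the Õ notation is ignored); the parameter values are made up for illustration.

```python
import math

def messages_per_user(n, d, eps):
    """Order-of-magnitude message count min(n * eps^2, d) from the
    paper's upper bound, with the polylog factor suppressed."""
    return min(math.ceil(n * eps * eps), d)

# Regime 1: few users, so the n * eps^2 term binds.
small = messages_per_user(n=100, d=1000, eps=1.0)      # n * eps^2 = 100 < d

# Regime 2: many users, so the dimension d binds, i.e. roughly one
# message per coordinate already suffices for optimal error.
large = messages_per_user(n=10**6, d=1000, eps=1.0)    # d = 1000 binds
```

The second regime matches the intuition that a user never needs to say substantially more than one noisy value per coordinate.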

Further, the authors show that attaining optimal error with any unbiased protocol necessitates a per-user message complexity of Ω(min(nε², d)/log n). The established lower bound highlights that optimal rates under the shuffle model's constraints inherently demand significant communication overhead.

Additionally, the paper explores the single-message setting, presenting a protocol achieving mean squared error of O(d n^(d/(d+2)) ε^(−4/(d+2))). This protocol is shown to be optimal under standard settings where ε = Θ(1). The robustness of protocols to malicious users is also addressed, elucidating the potential vulnerabilities when a single shuffler is involved.
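To see how far the single-message bound sits from the multi-message one, it helps to evaluate the exponent numerically. The snippet below plugs in illustrative values (n = 10⁶ users, d = 100, ε = 1; constants and the exact normalization are suppressed) and shows that n^(d/(d+2)) already approaches the trivial factor n for moderate d.

```python
def single_message_mse(n, d, eps):
    """Order of the single-message MSE d * n^(d/(d+2)) * eps^(-4/(d+2)),
    constants suppressed."""
    return d * n ** (d / (d + 2)) * eps ** (-4 / (d + 2))

# Ratio of the single-message bound to the trivial d * n level.
# Algebraically this is n^(-2/(d+2)), which tends to 1 as d grows:
# with one message per user, the error is nearly as bad as sending
# no useful information at all, unlike the d/eps^2 rate achievable
# with many messages.
n, d, eps = 10**6, 100, 1.0
ratio = single_message_mse(n, d, eps) / (d * n)
```

This gap is exactly what the paper's lower bound formalizes: shaving the n^(d/(d+2)) factor down to the central-model rate forcibly requires many messages per user.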

Theoretical and Practical Implications

The paper's findings have notable implications both theoretically and practically. Theoretically, the research elaborates on the complex interactions between message complexity, privacy guarantees, and estimation accuracy in distributed systems under the shuffle model. The derivation of lower bounds effectively delineates the frontier of what is achievable given the current protocols and conceptual frameworks in DP.

Practically, understanding these trade-offs informs the design of more efficient privacy-preserving data analytics systems. Particularly in data-sensitive applications like federated learning for mobile devices, healthcare, or finance, optimizing the balance between communication costs and privacy levels will be crucial. As devices typically operate under constrained resources, minimizing communication while maintaining accuracy is essential for the widespread application of DP systems.

The robustness analysis against potential manipulations by malicious users provides a pragmatic perspective on deploying these protocols in real-world settings, where adversarial behaviors might exploit protocol vulnerabilities.

Conclusion

The paper presents a comprehensive examination of message complexity requirements for private vector mean estimation in the shuffle model, advancing both theoretical understanding and practical considerations of differential privacy applications. Future research directions may explore novel means to further reduce message complexity while retaining or even enhancing accuracy and robustness, thereby broadening the applicability of differential privacy in diverse, large-scale distributed systems.
