Mean Estimation with User-Level Privacy for Spatio-Temporal IoT Datasets (2401.15906v7)

Published 29 Jan 2024 in cs.CR, cs.IT, math.IT, and stat.AP

Abstract: This paper considers the problem of privately releasing sample means of speed values from traffic datasets. Our key contribution is the development of user-level differentially private algorithms that incorporate carefully chosen parameter values to ensure low estimation errors on real-world datasets while preserving privacy. We test our algorithms on ITMS (Intelligent Traffic Management System) data from an Indian city, where the speeds of different buses are drawn in a potentially non-i.i.d. manner from an unknown distribution, and where the number of speed samples contributed by each bus may differ. We then apply our algorithms to large synthetic datasets generated from the ITMS data. Here, we provide theoretical justification for the observed performance trends, and recommend choices of algorithm subroutines that yield low estimation errors. Finally, we characterize the best performance of pseudo-user creation-based algorithms on worst-case datasets via a minimax approach; this gives rise to a novel procedure for creating pseudo-users that optimizes the worst-case total estimation error. The algorithms discussed in the paper are readily applicable to general spatio-temporal IoT datasets for releasing a differentially private mean of a desired value.
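
To make the abstract's recipe concrete, here is a minimal Python sketch of the generic approach it alludes to: bound each user's influence on the released statistic, then add noise calibrated to the resulting user-level sensitivity. This is not the paper's algorithm. The function name, the public clipping range [lo, hi], and the simplest possible grouping (each real user collapsed into a single pseudo-user holding its clipped average) are illustrative assumptions; the paper's pseudo-user construction and parameter choices are more refined.

```python
import numpy as np

def user_level_dp_mean(user_samples, lo, hi, epsilon, rng=None):
    """Release an epsilon-DP estimate of a mean value (e.g., bus speed).

    user_samples: list of 1-D arrays, one per user (e.g., one per bus);
                  users may contribute different numbers of samples.
    lo, hi:       assumed publicly known clipping range for the values.
    epsilon:      privacy budget; neighbouring datasets differ in the
                  *entire* contribution of one user (replacement model).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Collapse each user into a single clipped per-user average, so a
    # user's influence on the output is bounded no matter how many
    # samples that user contributed.
    per_user = np.array([np.clip(np.asarray(s), lo, hi).mean()
                         for s in user_samples])
    n = len(per_user)
    # Replacing one user's data moves the average of per-user means by
    # at most (hi - lo) / n, so Laplace noise with scale
    # sensitivity / epsilon yields epsilon user-level DP.
    sensitivity = (hi - lo) / n
    return per_user.mean() + rng.laplace(scale=sensitivity / epsilon)

# Example: three buses with unequal numbers of speed samples (km/h).
speeds = [np.array([32.0, 41.5, 28.3]),
          np.array([55.1]),
          np.array([47.0, 39.2])]
print(user_level_dp_mean(speeds, lo=0.0, hi=120.0, epsilon=1.0))
```

Note the trade-off this sketch makes visible: tighter contribution bounding (here, one value per user) shrinks the sensitivity and hence the noise, but biases the estimate by down-weighting heavy contributors. Choosing how to form pseudo-users and how many samples each may hold so as to balance this bias against the noise variance, including on worst-case datasets via a minimax criterion, is exactly the tuning problem the paper addresses.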
