
A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off (2404.03324v1)

Published 4 Apr 2024 in cs.CL

Abstract: The application of Differential Privacy to Natural Language Processing techniques has grown in relevance in recent years, with an increasing number of studies published in established NLP outlets. In particular, the adaptation of Differential Privacy for use in NLP tasks first focused on the $\textit{word level}$, where calibrated noise is added to word embedding vectors to achieve "noisy" representations. To this end, several implementations have appeared in the literature, each presenting an alternative method of achieving word-level Differential Privacy. Although each of these includes its own evaluation, no comparative analysis has been performed to investigate the performance of such methods relative to each other. In this work, we conduct such an analysis, comparing seven different algorithms on two NLP tasks with varying hyperparameters, including the $\textit{epsilon}$ ($\varepsilon$) parameter, or privacy budget. In addition, we provide an in-depth analysis of the results with a focus on the privacy-utility trade-off, and we open-source our implementation code to support reproduction. As a result of our analysis, we give insight into the benefits and challenges of word-level Differential Privacy, and accordingly, we suggest concrete steps forward for the research field.
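The word-level mechanism the abstract describes, adding calibrated noise to a word's embedding vector and mapping the result back to a word, can be sketched as below. This is an illustrative sketch of a multivariate Laplace-style metric-DP mechanism (one family of the compared algorithms), not the paper's benchmark code; the function name, the toy vocabulary, and the nearest-neighbor decoding step are my own assumptions.

```python
import numpy as np

def mdp_perturb_word(word, embeddings, epsilon, rng=None):
    """Noisily replace `word` via a multivariate Laplace-style mechanism.

    embeddings: dict mapping word -> 1-D numpy embedding vector.
    epsilon: privacy budget; larger values mean less noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = embeddings[word]
    d = v.shape[0]

    # Sample noise with density proportional to exp(-epsilon * ||z||):
    # a uniformly random direction on the unit sphere...
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    # ...scaled by a Gamma(d, 1/epsilon)-distributed magnitude.
    r = rng.gamma(shape=d, scale=1.0 / epsilon)
    noisy = v + r * u

    # Decode the noisy vector to its nearest word in embedding space.
    words = list(embeddings)
    mat = np.stack([embeddings[w] for w in words])
    dists = np.linalg.norm(mat - noisy, axis=1)
    return words[int(np.argmin(dists))]
```

With a small `epsilon` the noisy vector often lands nearer a different word (privacy); with a large `epsilon` the original word is usually returned (utility), which is the privacy-utility trade-off the paper benchmarks across seven such mechanisms.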

