
1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy (2405.01678v1)

Published 2 May 2024 in cs.CL

Abstract: The study of privacy-preserving NLP has gained increasing attention in recent years. One promising avenue studies the integration of Differential Privacy in NLP, which has brought about innovative methods in a variety of application settings. Of particular note are $\textit{word-level Metric Local Differential Privacy (MLDP)}$ mechanisms, which work to obfuscate potentially sensitive input text by performing word-by-word $\textit{perturbations}$. Although these methods have shown promising results in empirical tests, there are two major drawbacks: (1) the inevitable loss of utility due to the addition of noise, and (2) the computational expense of running these mechanisms on high-dimensional word embeddings. In this work, we aim to address these challenges by proposing $\texttt{1-Diffractor}$, a new mechanism that boasts high speedups in comparison to previous mechanisms, while still demonstrating strong utility- and privacy-preserving capabilities. We evaluate $\texttt{1-Diffractor}$ for utility on several NLP tasks, for theoretical and task-based privacy, and for efficiency in terms of speed and memory. $\texttt{1-Diffractor}$ shows significant improvements in efficiency, while still maintaining competitive utility and privacy scores across all conducted comparative tests against previous MLDP mechanisms. Our code is made available at: https://github.com/sjmeis/Diffractor.
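
To make the notion of a word-by-word perturbation under metric differential privacy more concrete, the following is a minimal sketch of one generic way such a mechanism can be built. It is not the 1-Diffractor algorithm and is not taken from the authors' repository. It assumes a hypothetical toy vocabulary with made-up one-dimensional coordinates (`TOY_POSITIONS`) and samples each replacement word with probability proportional to $\exp(-\varepsilon \, d(x, y) / 2)$, a standard exponential-mechanism-style construction that satisfies $\Pr[M(x) = y] \le e^{\varepsilon \, d(x, x')} \Pr[M(x') = y]$ for every pair of inputs $x, x'$. All function names are illustrative.

```python
import math
import random

# Hypothetical toy vocabulary with made-up 1-D coordinates.
# These are NOT real embeddings and NOT the word lists used by 1-Diffractor.
TOY_POSITIONS = {"good": 0.0, "great": 0.5, "fine": 1.0, "bad": 3.0, "awful": 3.5}


def distance(a, b):
    """Metric d on the toy vocabulary: absolute difference of coordinates."""
    return abs(TOY_POSITIONS[a] - TOY_POSITIONS[b])


def output_distribution(word, epsilon):
    """Return Pr[M(word) = y] for every candidate y in the vocabulary.

    Weights are exp(-epsilon * d(word, y) / 2); the factor 1/2 accounts for
    the normalizer also changing when the input word changes.
    """
    weights = {
        cand: math.exp(-epsilon * distance(word, cand) / 2.0)
        for cand in TOY_POSITIONS
    }
    total = sum(weights.values())
    return {cand: w / total for cand, w in weights.items()}


def perturb_word(word, epsilon, rng):
    """Sample a replacement for a single in-vocabulary word."""
    dist = output_distribution(word, epsilon)
    candidates = list(dist)
    probs = [dist[c] for c in candidates]
    return rng.choices(candidates, weights=probs, k=1)[0]


def perturb_text(text, epsilon, rng=None):
    """Perturb a whitespace-tokenized text word by word.

    Out-of-vocabulary tokens are passed through unchanged here purely to
    keep the sketch short; a real mechanism needs a policy for them.
    """
    rng = rng or random.Random()
    tokens = text.split()
    return " ".join(
        perturb_word(tok, epsilon, rng) if tok in TOY_POSITIONS else tok
        for tok in tokens
    )


if __name__ == "__main__":
    print(perturb_text("good movie bad ending", epsilon=1.0))
```

A smaller `epsilon` flattens the output distribution, so replacements land farther from the original word (more privacy, less utility). A larger `epsilon` concentrates probability on the original word and its nearest neighbors. This is the utility trade-off the abstract names as the first drawback.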
