
A Neighbourhood-Aware Differential Privacy Mechanism for Static Word Embeddings (2309.10551v1)

Published 19 Sep 2023 in cs.LG, cs.AI, cs.CL, and cs.CR

Abstract: We propose a Neighbourhood-Aware Differential Privacy (NADP) mechanism considering the neighbourhood of a word in a pretrained static word embedding space to determine the minimal amount of noise required to guarantee a specified privacy level. We first construct a nearest neighbour graph over the words using their embeddings, and factorise it into a set of connected components (i.e. neighbourhoods). We then separately apply different levels of Gaussian noise to the words in each neighbourhood, determined by the set of words in that neighbourhood. Experiments show that our proposed NADP mechanism consistently outperforms multiple previously proposed DP mechanisms such as Laplacian, Gaussian, and Mahalanobis in multiple downstream tasks, while guaranteeing higher levels of privacy.
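The mechanism described in the abstract has three steps: build a nearest-neighbour graph over the embeddings, split it into connected components (neighbourhoods), and add Gaussian noise calibrated separately per neighbourhood. The sketch below illustrates those steps under stated assumptions: the function name `nadp_perturb`, the choice of Euclidean distance for the k-NN graph, and the use of the maximum pairwise distance within a neighbourhood as its sensitivity are illustrative choices, not the paper's exact calibration.

```python
import numpy as np

def connected_components(n, edges):
    """Label each of n nodes with the root of its connected component (union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return [find(i) for i in range(n)]

def nadp_perturb(embeddings, k=2, epsilon=1.0, delta=1e-5, seed=0):
    """Sketch of a neighbourhood-aware Gaussian perturbation of word embeddings."""
    rng = np.random.default_rng(seed)
    emb = np.asarray(embeddings, dtype=float)
    n = len(emb)

    # 1) k-nearest-neighbour graph (Euclidean distance; an assumption here).
    dists = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
    edges = []
    for i in range(n):
        for j in np.argsort(dists[i])[1:k + 1]:  # skip self at position 0
            edges.append((i, int(j)))

    # 2) Factorise the graph into connected components, i.e. neighbourhoods.
    labels = connected_components(n, edges)

    # 3) Per-neighbourhood Gaussian noise. Sensitivity is taken as the maximum
    #    pairwise distance inside the neighbourhood (illustrative heuristic);
    #    sigma follows the standard Gaussian-mechanism calibration.
    noisy = emb.copy()
    for root in set(labels):
        idx = [i for i, r in enumerate(labels) if r == root]
        sens = dists[np.ix_(idx, idx)].max() if len(idx) > 1 else 0.0
        sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
        noisy[idx] += rng.normal(0.0, sigma, size=(len(idx), emb.shape[1]))
    return noisy, labels
```

Because sensitivity is computed per neighbourhood, tight clusters of words receive less noise than a single global bound would require, which is the intuition behind the paper's utility gains.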

