
Privacy Regularization: Joint Privacy-Utility Optimization in Language Models (2103.07567v2)

Published 12 Mar 2021 in cs.LG, cs.CL, and cs.CR

Abstract: Neural language models are known to have a high capacity for memorizing training samples. This can have serious privacy implications when models are trained on user content such as email correspondence. Differential privacy (DP), a popular choice for training models with privacy guarantees, comes with significant costs in utility degradation and disparate impact on subgroups of users. In this work, we introduce two privacy-preserving regularization methods for training language models that enable joint optimization of utility and privacy through (1) the use of a discriminator and (2) the inclusion of a triplet-loss term. We compare our methods with DP through extensive evaluation. We show that our regularizers offer a favorable utility-privacy trade-off, faster training with the ability to tap into existing optimization approaches, and more uniform treatment of under-represented subgroups.
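The second regularizer mentioned in the abstract, the triplet-loss term, can be illustrated with a minimal sketch. The code below shows a standard margin-based triplet loss over embedding vectors, combined with a language-modeling loss through a regularization weight. Note that the choice of distance metric, margin, the weight `lam`, and the `joint_loss` helper are illustrative assumptions for exposition, not the paper's exact formulation (which also specifies how anchor, positive, and negative samples are drawn with respect to user identity).

```python
import math

def l2_distance(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss over embeddings.

    Pulls `anchor` toward `positive` and pushes it away from
    `negative` until the distance gap exceeds `margin`.
    """
    return max(0.0,
               l2_distance(anchor, positive)
               - l2_distance(anchor, negative)
               + margin)

def joint_loss(lm_loss, anchor, positive, negative, lam=0.1, margin=1.0):
    """Joint objective: LM loss plus a weighted privacy regularizer.

    `lam` is a hypothetical regularization weight; the paper's exact
    weighting of the two objectives may differ.
    """
    return lm_loss + lam * triplet_loss(anchor, positive, negative, margin)
```

For example, with anchor `[0, 0]`, positive `[0, 2]`, negative `[1, 0]`, and the default margin of 1.0, the triplet term is `max(0, 2 - 1 + 1) = 2.0`; adding it to an LM loss of 1.5 with `lam=0.1` gives a joint loss of 1.7. This "tap into existing optimization approaches" property noted in the abstract follows because the combined objective is an ordinary differentiable loss, trainable with standard optimizers rather than DP-specific machinery such as gradient clipping and noise injection.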

Authors (6)
  1. Fatemehsadat Mireshghallah (26 papers)
  2. Huseyin A. Inan (23 papers)
  3. Marcello Hasegawa (4 papers)
  4. Victor Rühle (18 papers)
  5. Taylor Berg-Kirkpatrick (106 papers)
  6. Robert Sim (25 papers)
Citations (37)
