
Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning (2406.14322v3)

Published 20 Jun 2024 in cs.CL, cs.CR, and cs.LG

Abstract: LLMs have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit, current evaluations on LLMs mostly treat each example (text record) as the privacy unit. This leads to uneven user privacy guarantees when contributions per user vary. We therefore study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users. We present a systematic evaluation of user-level DP for LLM fine-tuning on natural language generation tasks. Focusing on two mechanisms for achieving user-level DP guarantees, Group Privacy and User-wise DP-SGD, we investigate design choices like data selection strategies and parameter tuning for the best privacy-utility tradeoff.
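
The two mechanisms named in the abstract can be summarized concretely. Group Privacy converts an example-level DP guarantee into a user-level one when each user contributes at most k examples, while User-wise DP-SGD clips one aggregated gradient per user (rather than per example) before adding noise. Below is a minimal Python sketch of both ideas; the function names, the NumPy gradient representation, and the particular group-privacy bound used (the standard (k·eps, k·e^((k-1)·eps)·delta) conversion) are illustrative assumptions, not the paper's exact implementation.

    import math
    import numpy as np

    def group_privacy_guarantee(eps, delta, k):
        # Standard group-privacy conversion (illustrative): if a mechanism
        # is (eps, delta)-DP at the example level and each user contributes
        # at most k examples, it is (k*eps, k*exp((k-1)*eps)*delta)-DP at
        # the user level.
        return k * eps, k * math.exp((k - 1) * eps) * delta

    def user_wise_dp_sgd_step(params, per_user_grads, clip_norm=1.0,
                              noise_multiplier=1.0, lr=0.1, rng=None):
        # One step of user-wise DP-SGD (sketch): each sampled user
        # contributes a single averaged gradient, which is clipped so that
        # any one user's influence on the summed update is at most clip_norm.
        rng = np.random.default_rng() if rng is None else rng
        clipped = []
        for user_grad in per_user_grads:  # 2D array: examples x params
            g = np.mean(user_grad, axis=0)          # aggregate within user
            norm = np.linalg.norm(g)
            clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
        total = np.sum(clipped, axis=0)
        # Gaussian noise calibrated to the per-user sensitivity (clip_norm).
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
        return params - lr * (total + noise) / len(per_user_grads)

In this sketch the privacy unit is the user: sampling, clipping, and noise calibration all operate on one contribution per user, which is what makes the resulting guarantee uniform across users regardless of how many examples each contributes.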

Authors (9)
  1. Lynn Chua (16 papers)
  2. Badih Ghazi (78 papers)
  3. Yangsibo Huang (40 papers)
  4. Pritish Kamath (48 papers)
  5. Daogao Liu (34 papers)
  6. Pasin Manurangsi (127 papers)
  7. Amer Sinha (11 papers)
  8. Chiyuan Zhang (57 papers)
  9. Ravi Kumar (146 papers)
Citations (5)