Differentially Private Language Models Benefit from Public Pre-training
Published 13 Sep 2020 in cs.LG, cs.CL, and cs.CR | arXiv:2009.05886v2
Abstract: Language modeling is a keystone task in natural language processing. When training a language model on sensitive information, differential privacy (DP) allows us to quantify the degree to which our private data is protected. However, training algorithms which enforce differential privacy often lead to degradation in model quality. We study the feasibility of learning a language model which is simultaneously high-quality and privacy preserving by tuning a public base model on a private corpus. We find that DP fine-tuning boosts the performance of language models in the private domain, making the training of such models possible.
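The setup the abstract describes, tuning a publicly pre-trained base model on a private corpus under differential privacy, is typically realized with DP-SGD (Abadi et al., 2016): clip each example's gradient to a fixed norm, add Gaussian noise calibrated to that norm, and take an averaged update. The following is a minimal sketch of those mechanics, not the authors' code; the toy next-token model, hyperparameters, and random data are all assumptions, with a randomly initialized model standing in for the public pre-trained base.

```python
# Illustrative DP-SGD step: per-example gradient clipping + Gaussian noise.
# Toy model and data are placeholders, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embed_dim, seq_len = 100, 32, 8
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Flatten(),
    nn.Linear(embed_dim * seq_len, vocab_size),  # predicts the next token
)
# In the paper's setting these weights would come from public pre-training;
# here they are randomly initialized for brevity.

clip_norm = 1.0          # per-example gradient clipping bound C (assumed)
noise_multiplier = 1.0   # sigma; chosen to meet a target (epsilon, delta) budget
lr = 1e-3
batch = torch.randint(0, vocab_size, (16, seq_len + 1))  # stand-in private corpus

params = [p for p in model.parameters() if p.requires_grad]
summed_grads = [torch.zeros_like(p) for p in params]

# Accumulate clipped per-example gradients.
for example in batch:
    inputs, target = example[:-1].unsqueeze(0), example[-1:]
    loss = F.cross_entropy(model(inputs), target)
    grads = torch.autograd.grad(loss, params)
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = (clip_norm / (total_norm + 1e-6)).clamp(max=1.0)
    for acc, g in zip(summed_grads, grads):
        acc += g * scale

# Add Gaussian noise scaled to the clipping bound, then take one SGD step.
with torch.no_grad():
    for p, acc in zip(params, summed_grads):
        noise = torch.randn_like(acc) * noise_multiplier * clip_norm
        p -= lr * (acc + noise) / batch.shape[0]
```

In practice a library such as Opacus automates the per-example gradient computation and privacy accounting; the loop above only spells out the clipping and noising that make the update differentially private.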