DP-MLM: Differentially Private Text Rewriting Using Masked Language Models (2407.00637v1)

Published 30 Jun 2024 in cs.CL

Abstract: The task of text privatization using Differential Privacy has recently taken the form of $\textit{text rewriting}$, in which an input text is obfuscated via the use of generative (large) language models. While these methods have shown promising results in preserving privacy, they rely on autoregressive models, which lack a mechanism to contextualize the private rewriting process. In response, we propose $\textbf{DP-MLM}$, a new method for differentially private text rewriting that leverages masked language models (MLMs) to rewrite text in a semantically similar $\textit{and}$ obfuscated manner. We accomplish this with a simple contextualization technique, whereby we rewrite a text one token at a time. We find that encoder-only MLMs provide better utility preservation at lower $\varepsilon$ levels than previous methods relying on larger models with a decoder. In addition, MLMs allow for greater customization of the rewriting mechanism than generative approaches do. We make the code for $\textbf{DP-MLM}$ public and reusable at https://github.com/sjmeis/DPMLM .
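To make the "one token at a time" idea concrete, here is a minimal sketch of token-level private rewriting. It is not taken from the DPMLM repository. The abstract does not say which selection mechanism is used, so the sketch assumes the exponential mechanism over clipped candidate scores, and it replaces the masked language model with a hypothetical toy scorer (`toy_scores`). All function names, the clipping bounds, and the vocabulary are illustrative assumptions.

```python
import math
import random


def exponential_mechanism(scores, epsilon, lo, hi, rng):
    """Sample an index with probability proportional to
    exp(epsilon * score / (2 * sensitivity)), after clipping scores to [lo, hi]."""
    clipped = [min(max(s, lo), hi) for s in scores]
    sensitivity = hi - lo  # clipping bounds the score range
    exponents = [epsilon * c / (2 * sensitivity) for c in clipped]
    top = max(exponents)  # subtract the max for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def toy_scores(tokens, position, vocab):
    """Hypothetical stand-in for an MLM: a real model would score each
    candidate for the masked position from the surrounding context."""
    return [4.0 if w == tokens[position] else 1.0 for w in vocab]


def rewrite(tokens, vocab, epsilon, score_fn, seed=0):
    """Replace every token in turn with a privately sampled candidate."""
    rng = random.Random(seed)
    out = []
    for i in range(len(tokens)):
        scores = score_fn(tokens, i, vocab)
        j = exponential_mechanism(scores, epsilon, lo=0.0, hi=5.0, rng=rng)
        out.append(vocab[j])
    return out


if __name__ == "__main__":
    vocab = ["the", "cat", "sat", "dog", "ran"]
    print(rewrite(["the", "cat", "sat"], vocab, epsilon=1.0, score_fn=toy_scores))
```

In this sketch, `epsilon` is a per-token budget: under basic composition, rewriting $n$ tokens spends at most $n\varepsilon$ in total. Smaller values flatten the sampling distribution and obfuscate more, while larger values concentrate it on the highest-scoring candidate.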



GitHub

GitHub - sjmeis/DPMLM: Repository containing the code for the ACL Findings paper "DP-MLM: Differentially Private Text Rewriting Using Masked Language Models" (2 stars)
