NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human (2406.03749v1)

Published 6 Jun 2024 in cs.CL

Abstract: Concerns about privacy leakage are growing in academia and industry as NLP models from third-party providers are used to process sensitive texts. To protect privacy before sending sensitive data to those models, we suggest sanitizing sensitive text using two strategies commonly used by humans: i) deleting sensitive expressions, and ii) obscuring sensitive details by abstracting them. To explore these issues and develop a tool for text rewriting, we curate the first such corpus, named NAP^2, through both crowdsourcing and the use of LLMs. Compared with prior work based on differential privacy, which sharply reduces information utility and produces unnatural text, the human-inspired approaches yield more natural rewrites and offer a better balance between privacy protection and data utility, as demonstrated by our extensive experiments.
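
To make the two rewriting strategies in the abstract concrete, here is a minimal Python sketch. It is a toy illustration only, not the paper's method: the function names, the example sentence, and the sensitive spans and their abstractions are all hypothetical, and the sensitive spans are assumed to be already identified.

```python
import re


def rewrite_by_deletion(text, sensitive_spans):
    """Strategy (i): delete each sensitive expression, then tidy spacing."""
    for span in sensitive_spans:
        text = text.replace(span, "")
    # Collapse whitespace runs left behind by the deletions.
    text = re.sub(r"\s+", " ", text)
    # Remove any space stranded before punctuation.
    return re.sub(r"\s+([.,;!?])", r"\1", text).strip()


def rewrite_by_abstraction(text, abstractions):
    """Strategy (ii): replace each sensitive detail with a more general phrase."""
    for span, general in abstractions.items():
        text = text.replace(span, general)
    return text


# Hypothetical input; not taken from the NAP^2 corpus.
sentence = "I work as a nurse in Sydney."
deleted = rewrite_by_deletion(sentence, ["in Sydney"])
abstracted = rewrite_by_abstraction(sentence, {"Sydney": "a large city"})
print(deleted)
print(abstracted)
```

Plain string replacement like this needs the sensitive spans supplied by hand and can easily leave ungrammatical text, which is one reason a corpus of human rewrites is useful for learning more natural ones.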
