Keep It Private: Unsupervised Privatization of Online Text (2405.10260v1)

Published 16 May 2024 in cs.CL and cs.AI

Abstract: Authorship obfuscation techniques hold the promise of helping people protect their privacy in online communications by automatically rewriting text to hide the identity of the original author. However, obfuscation has been evaluated in narrow settings in the NLP literature and has primarily been addressed with superficial edit operations that can lead to unnatural outputs. In this work, we introduce an automatic text privatization framework that fine-tunes an LLM via reinforcement learning to produce rewrites that balance soundness, sense, and privacy. We evaluate it extensively on a large-scale test set of English Reddit posts by 68k authors, composed of short- to medium-length texts. We study how performance varies across evaluation conditions, including authorial profile length and authorship detection strategy. Our method maintains high text quality according to both automated metrics and human evaluation, and successfully evades several automated authorship attacks.

Keeping It Private: Authorship Obfuscation with LLMs

Introduction

So, you're browsing Reddit, maybe posting some insightful comments, and all of a sudden you start to get a little paranoid: what if someone figures out who you are? This isn't just a worry for whistleblowers or people with a high-profile online presence; it can affect anyone. Authorship obfuscation is all about automatically rewriting your text in a way that keeps your identity hidden. This paper introduces a new framework called "Keep It Private" that uses LLMs to do exactly that.

The Need for Authorship Obfuscation

Online privacy is critical. Even if you're using a pseudonym, stylistic markers in your writing can still give away your identity. Think Sherlock Holmes, but instead of solving crimes, he's just piecing together your internet history. Previous attempts at authorship obfuscation have been fairly basic: rule-based edit operations, round-trip machine translation, and the like. These approaches often end up making your text sound unnatural. This new method aims to keep things natural while providing privacy. To see why the threat is real, consider the toy example below.
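
As a toy illustration (not from the paper), here's how little machinery an adversary needs: a character n-gram classifier over a handful of posts can already fingerprint a writing style. Everything in this sketch (the example posts, the classifier choice) is made up for the demo.

```python
# Toy stylometric attribution demo: character n-grams + logistic regression.
# All data and model choices here are illustrative, not from the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "honestly i think its fine lol",         # author a: lowercase, slangy
    "One must consider the alternative.",    # author b: formal, punctuated
    "idk maybe? sounds sus tho",             # author a
    "I would argue the premise is flawed.",  # author b
]
authors = ["a", "b", "a", "b"]

attributor = make_pipeline(
    # Character n-grams capture style (casing, punctuation) more than topic.
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(),
)
attributor.fit(posts, authors)

print(attributor.predict(["tbh thats kinda fine lol"]))  # likely ['a']
```

This is exactly the kind of signal an obfuscation system has to erase without wrecking the message.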

How It Works

Reinforcement Learning for Text Privatization

At the core of this new method is reinforcement learning (RL). The idea is to fine-tune a pre-trained LLM to generate text that balances keeping your identity private with staying faithful and fluent. Here's a simplified look at the process:

  • Input Text: Your original post or comment.
  • Output Text: A modified version that hides your identity but retains the meaning.
  • Training Mechanism: The system uses Self-Critical Sequence Training (SCST), a REINFORCE-style policy-gradient method. The model samples a rewrite, scores it against a baseline rewrite (its own greedy decode), and nudges its parameters toward whichever scores higher on the reward; see the sketch after this list.
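
To make the training loop concrete, here is a minimal sketch of one SCST update step, assuming a Hugging Face-style causal LM. The names `model`, `tokenizer`, `reward_fn`, and `optimizer` are placeholders; the paper's actual architecture and decoding settings may differ.

```python
# Sketch of one Self-Critical Sequence Training (SCST) step, under the
# assumption of a Hugging Face causal LM. `reward_fn(original, rewrite)`
# is a placeholder for the combined privacy/meaning/soundness reward.
import torch

def scst_step(model, tokenizer, source_text, reward_fn, optimizer):
    inputs = tokenizer(source_text, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]

    # Baseline: the model's own greedy decode (no gradients needed).
    with torch.no_grad():
        greedy_ids = model.generate(**inputs, do_sample=False, max_new_tokens=128)
    greedy_text = tokenizer.decode(greedy_ids[0, prompt_len:], skip_special_tokens=True)
    baseline_reward = reward_fn(source_text, greedy_text)

    # Candidate: a sampled rewrite from the same model.
    sample_ids = model.generate(**inputs, do_sample=True, top_p=0.95, max_new_tokens=128)
    sample_text = tokenizer.decode(sample_ids[0, prompt_len:], skip_special_tokens=True)
    sample_reward = reward_fn(source_text, sample_text)

    # Re-run a differentiable forward pass to get per-token log-probs.
    logits = model(input_ids=sample_ids).logits[:, :-1, :]
    log_probs = torch.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, sample_ids[:, 1:].unsqueeze(-1)).squeeze(-1)

    # REINFORCE with the greedy reward as the baseline: if the sample beats
    # the greedy decode, push probability toward it; otherwise push away.
    advantage = sample_reward - baseline_reward
    loss = -advantage * token_log_probs[:, prompt_len - 1 :].sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design choice in SCST is the baseline: using the model's own greedy output means no separate value network is needed, and the gradient only moves the model when sampling finds a genuinely better rewrite.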

Reward Components

These rewards cover three main areas; a sketch of how they might be combined follows the list:

  1. Privacy: Measures how well the output text hides your identity.
  2. Meaning Preservation: Ensures that your original message is not lost.
  3. Soundness: Keeps the output text grammatically acceptable and natural-sounding.
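
Here is a hedged sketch of how the three terms might be combined into a single scalar. The scorer choices (sentence-transformers for meaning, a CoLA-style acceptability classifier for soundness) and the `privacy_score` callable, standing in for an authorship verifier's confidence that the rewrite matches the original author, are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative combined reward: privacy + meaning preservation + soundness.
# Model names and weights are assumptions for the sketch, not from the paper.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

sbert = SentenceTransformer("all-MiniLM-L6-v2")
cola = pipeline("text-classification", model="textattack/roberta-base-CoLA")

def combined_reward(original, rewrite, privacy_score, weights=(1.0, 1.0, 1.0)):
    w_priv, w_sense, w_sound = weights

    # 1. Privacy: reward evading the (hypothetical) authorship verifier,
    #    where privacy_score(original, rewrite) = P(same author) in [0, 1].
    privacy = 1.0 - privacy_score(original, rewrite)

    # 2. Meaning preservation: cosine similarity of sentence embeddings.
    emb = sbert.encode([original, rewrite], convert_to_tensor=True)
    sense = util.cos_sim(emb[0], emb[1]).item()

    # 3. Soundness: probability the rewrite is linguistically acceptable
    #    (LABEL_1 = "acceptable" in this CoLA-tuned classifier).
    scores = cola(rewrite, top_k=None)
    sound = next(s["score"] for s in scores if s["label"] == "LABEL_1")

    return w_priv * privacy + w_sense * sense + w_sound * sound
```

Note the tension the weights have to manage: rewriting a post into gibberish maximizes privacy but zeroes out sense and soundness, which is exactly why all three terms appear in the reward.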

Results

So, does it actually work? The researchers tested this on a large-scale test set of English Reddit posts from 68,000 authors. Here's a snapshot of what they found:

  • Privacy: The new method reliably fooled a range of automated authorship attribution and verification models, outperforming earlier approaches such as rule-based systems and round-trip machine translation.
  • Meaning Preservation: The output text stayed close in meaning to the original, with high scores across both automated metrics and human evaluations.
  • Soundness: The generated text was well-formed and coherent according to both automated metrics and human evaluators.

Implications

This new framework is practical and highly relevant for anyone concerned about maintaining online privacy. For researchers, it opens up new avenues to explore how advanced LLMs can be fine-tuned for specific tasks like this. On the practical side, it could be integrated into online platforms to help users remain anonymous while sharing content.

Future Developments

Looking ahead, this research can be expanded to:

  • Different Languages: Applying the method to languages other than English.
  • Diverse Text Lengths and Types: Testing on longer articles or different forms of writing.
  • Robustness Against Various Adversaries: Improving the model to counter a broad range of authorship detection techniques.

Conclusion

In a nutshell, this "Keep it Private" framework is a promising step forward in authorship obfuscation. It's like having a smart, undercover writer tweaking your content to keep your secrets safe. Whether you're a journalist, activist, or just someone wanting to keep a low profile online, this new approach offers a practical solution that keeps your words—yours.

And that's a wrap! This new method may not make you invisible, but it certainly makes you a lot harder to find. Happy posting!

Authors

  1. Calvin Bao
  2. Marine Carpuat