
The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation (1707.03736v2)

Published 12 Jul 2017 in cs.CL

Abstract: Users posting online expect to remain anonymous unless they have logged in, and this anonymity is often what allows them to discuss various topics freely. Preserving the anonymity of a text's writer can also be important in other contexts, e.g., in witness protection or anonymity programs. However, each person has his/her own style of writing, which can be analyzed using stylometry, and as a result, the true identity of the author of a piece of text can be revealed even if s/he has tried to hide it. Thus, it could be helpful to design automatic tools that help a person obfuscate his/her identity when writing text. In particular, here we propose an approach that changes the text so that it is pushed towards average values for some general stylometric characteristics, thus making these characteristics less discriminative. The approach consists of three main steps: first, we calculate the values for some popular stylometric metrics that can indicate authorship; then we apply various transformations to the text, so that these metrics are adjusted towards the average level, while preserving the semantics and the soundness of the text; and finally, we add random noise. This approach turned out to be very efficient, and yielded the best performance on the Author Obfuscation task at the PAN-2016 competition.
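
The three steps named in the abstract (measure, push towards the average, add noise) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two metrics, the `TARGET` values, the split/merge rules on ", and", and the `SYNONYMS` table are all hypothetical choices made only to show the shape of the idea.

```python
import random
import re

# Hypothetical "average" profile; illustrative values, not taken from the paper.
TARGET = {"avg_sentence_len": 20.0, "avg_word_len": 4.5}

# Hypothetical substitution table used only by the noise step below.
SYNONYMS = {"big": "large", "help": "assist", "show": "demonstrate"}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def metrics(text):
    # Step 1: compute a few simple stylometric measurements.
    sentences = split_sentences(text)
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
    }


def split_long_sentence(sentence):
    # Break a sentence in two at the first ", and ", if there is one.
    head, sep, tail = sentence.partition(", and ")
    if not sep or not tail:
        return [sentence]
    return [head + ".", tail[0].upper() + tail[1:]]


def merge_sentences(first, second):
    # Join two sentences with ", and" (naively lowercases the second one).
    return first.rstrip(".!?") + ", and " + second[0].lower() + second[1:]


def push_towards_average(text, target=TARGET):
    # Step 2: move the average sentence length towards the target value.
    sentences = split_sentences(text)
    current = metrics(text)["avg_sentence_len"]
    if current > target["avg_sentence_len"]:
        out = []
        for s in sentences:
            out.extend(split_long_sentence(s))
    elif current < target["avg_sentence_len"] and len(sentences) > 1:
        out = []
        for i in range(0, len(sentences) - 1, 2):
            out.append(merge_sentences(sentences[i], sentences[i + 1]))
        if len(sentences) % 2:
            out.append(sentences[-1])
    else:
        out = sentences
    return " ".join(out)


def add_noise(text, rate=0.3, seed=0):
    # Step 3: randomly swap some words for an entry in the hypothetical table.
    rng = random.Random(seed)

    def swap(match):
        word = match.group(0)
        replacement = SYNONYMS.get(word.lower())
        if replacement is not None and rng.random() < rate:
            return replacement
        return word

    return re.sub(r"[A-Za-z']+", swap, text)


if __name__ == "__main__":
    sample = "I went home. It was late. I slept."
    print(metrics(sample))
    print(add_noise(push_towards_average(sample)))
```

A practical system would need a much richer metric set, an average profile estimated from a reference corpus, and transformations that are checked for meaning preservation; the sketch only moves one metric and makes no attempt at the latter.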

Authors (6)
  1. Georgi Karadjov (2 papers)
  2. Tsvetomila Mihaylova (11 papers)
  3. Yasen Kiprov (5 papers)
  4. Georgi Georgiev (28 papers)
  5. Ivan Koychev (33 papers)
  6. Preslav Nakov (253 papers)
Citations (32)
