
Are You Robert or RoBERTa? Deceiving Online Authorship Attribution Models Using Neural Text Generators (2203.09813v1)

Published 18 Mar 2022 in cs.CL, cs.AI, and cs.LG

Abstract: Recently, there has been a rise in the development of powerful pre-trained natural language models, including GPT-2, Grover, and XLM. These models have shown state-of-the-art capabilities on a variety of NLP tasks, including question answering, content summarisation, and text generation. Alongside this, there have been many studies focused on online authorship attribution (AA), that is, the use of models to identify the authors of online texts. Given the power of natural language models in generating convincing texts, this paper examines the degree to which these language models can generate texts capable of deceiving online AA models. Experimenting with both blog and Twitter data, we utilise GPT-2 language models to generate texts using the existing posts of online users. We then examine whether these AI-based text generators are capable of mimicking authorial style to such a degree that they can deceive typical AA models. From this, we find that current AI-based text generators are able to successfully mimic authorship, showing this capability on both datasets. Our findings, in turn, highlight the current capacity of powerful natural language models to generate original online posts that mimic authorial style well enough to deceive popular AA methods; a key finding given the proposed role of AA in real-world applications such as spam detection and forensic investigation.

Authors (3)
  1. Keenan Jones
  2. Jason R. C. Nurse
  3. Shujun Li
Citations (17)