Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text (2405.12689v2)

Published 21 May 2024 in cs.CL and cs.AI

Abstract: AI-generated text detection has attracted increasing attention as powerful LLMs approach human-level generation. However, limited work has been devoted to detecting (partially) AI-paraphrased texts, even though AI paraphrasing is commonly employed in many application scenarios for text refinement and diversity. To this end, we propose a novel detection framework, paraphrased text span detection (PTD), which aims to identify paraphrased text spans within a text. Unlike text-level detection, PTD takes in the full text and assigns each sentence a score indicating its degree of paraphrasing. We construct a dedicated dataset, PASTED, for paraphrased text span detection. Both in-distribution and out-of-distribution results demonstrate the effectiveness of PTD models in identifying AI-paraphrased text spans. Statistical and model analysis explains the crucial role of the surrounding context of the paraphrased text spans. Extensive experiments show that PTD models generalize to versatile paraphrasing prompts and multiple paraphrased text spans. We release our resources at https://github.com/Linzwcs/PASTED.
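
Below is a minimal sketch of the sentence-level scoring workflow the abstract describes: segment the full text into sentences, score each one for paraphrasing degree, and merge consecutive above-threshold sentences into spans. The naive sentence splitter, the `score_fn` interface, the threshold, and the dummy scorer are illustrative assumptions, not the released PASTED implementation.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive sentence segmentation on terminal punctuation (stand-in for a real segmenter)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def detect_paraphrased_spans(
    text: str,
    score_fn: Callable[[str, List[str]], float],
    threshold: float = 0.5,
) -> List[Tuple[int, int, float]]:
    """Assign each sentence a paraphrasing score, then merge consecutive
    above-threshold sentences into (start_idx, end_idx, mean_score) spans."""
    sentences = split_sentences(text)
    # The scorer also receives the full sentence list, reflecting the paper's
    # observation that surrounding context matters for span detection.
    scores = [score_fn(s, sentences) for s in sentences]

    spans: List[Tuple[int, int, float]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            window = scores[start:i]
            spans.append((start, i - 1, sum(window) / len(window)))
            start = None
    if start is not None:
        window = scores[start:]
        spans.append((start, len(scores) - 1, sum(window) / len(window)))
    return spans


def dummy_scorer(sentence: str, context: List[str]) -> float:
    """Hypothetical scorer for demonstration only: a trained PTD model would
    replace this with learned per-sentence paraphrasing scores."""
    return 0.9 if "[PARA]" in sentence else 0.1


if __name__ == "__main__":
    sample = (
        "Original sentence one. [PARA] Rewritten sentence two. "
        "[PARA] Rewritten sentence three. Original sentence four."
    )
    # Expected output with the dummy scorer: [(1, 2, 0.9)]
    print(detect_paraphrased_spans(sample, dummy_scorer))
```

Returning spans as sentence-index ranges with a mean score mirrors the framework's per-sentence scoring while still exposing contiguous paraphrased regions; a real deployment would swap in the trained model and a proper sentence segmenter.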

Authors (6)
  1. Yafu Li (26 papers)
  2. Zhilin Wang (38 papers)
  3. Leyang Cui (50 papers)
  4. Wei Bi (62 papers)
  5. Shuming Shi (126 papers)
  6. Yue Zhang (620 papers)
Citations (1)
GitHub: https://github.com/Linzwcs/PASTED