On the Generalization of Training-based ChatGPT Detection Methods (2310.01307v2)

Published 2 Oct 2023 in cs.CL, cs.AI, and cs.LG

Abstract: ChatGPT is one of the most popular LLMs, achieving impressive performance on various natural language tasks. Consequently, there is an urgent need to distinguish texts generated by ChatGPT from human-written texts. One extensively studied approach trains classification models to distinguish between the two. However, existing studies also demonstrate that the trained models may suffer from distribution shift at test time, i.e., they are ineffective at detecting generated texts from unseen language tasks or topics. In this work, we conduct a comprehensive investigation of these methods' generalization behaviors under distribution shifts caused by a wide range of factors, including prompts, text lengths, topics, and language tasks. To achieve this goal, we first collect a new dataset of human and ChatGPT texts, and then we conduct extensive studies on the collected dataset. Our studies reveal insightful findings that provide guidance for developing future methodologies and data collection strategies for ChatGPT detection.
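The training-based detection setting studied in the paper can be illustrated with a minimal sketch: fit a binary classifier on human vs. ChatGPT texts drawn from one distribution, then evaluate it on texts from an unseen topic or language task to probe generalization under distribution shift. The sketch below uses scikit-learn with TF-IDF features and logistic regression; the feature choice, model, and placeholder data are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch (not the paper's exact pipeline): train a binary
# human-vs-ChatGPT text detector on one distribution and evaluate it
# on a shifted one (e.g., a different topic or language task).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Hypothetical in-distribution training data:
# label 1 = ChatGPT-generated, 0 = human-written.
train_texts = ["...human essay on history...", "...ChatGPT essay on history..."]
train_labels = [0, 1]

# Hypothetical shifted test data from an unseen task/topic,
# e.g., question answering instead of essay writing.
test_texts = ["...human answer to a science question...",
              "...ChatGPT answer to a science question..."]
test_labels = [0, 1]

# TF-IDF + logistic regression stands in for any training-based detector.
detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                         LogisticRegression(max_iter=1000))
detector.fit(train_texts, train_labels)

# The gap between in-distribution and shifted-test accuracy is the
# generalization behavior the paper investigates.
print("shifted-test accuracy:",
      accuracy_score(test_labels, detector.predict(test_texts)))
```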

Authors (8)
  1. Han Xu (92 papers)
  2. Jie Ren (329 papers)
  3. Pengfei He (36 papers)
  4. Shenglai Zeng (19 papers)
  5. Yingqian Cui (14 papers)
  6. Amy Liu (3 papers)
  7. Hui Liu (481 papers)
  8. Jiliang Tang (204 papers)
Citations (9)

GitHub