
Is this Snippet Written by ChatGPT? An Empirical Study with a CodeBERT-Based Classifier (2307.09381v2)

Published 18 Jul 2023 in cs.SE

Abstract: Since its launch in November 2022, ChatGPT has gained popularity among users, especially programmers who use it as a tool to solve development problems. However, while offering a practical solution to programming problems, ChatGPT should be mainly used as a supporting tool (e.g., in software education) rather than as a replacement for the human being. Thus, automatically detecting source code generated by ChatGPT is necessary, and tools for identifying AI-generated content may need to be adapted to work effectively with source code. This paper presents an empirical study to investigate the feasibility of automated identification of AI-generated code snippets, and the factors that influence this ability. To this end, we propose a novel approach called GPTSniffer, which builds on top of CodeBERT to detect source code written by AI. The results show that GPTSniffer can accurately classify whether code is human-written or AI-generated, and outperforms two baselines, GPTZero and OpenAI Text Classifier. Also, the study shows how similar training data or a classification context with paired snippets helps to boost classification performance.

Authors (6)
  1. Phuong T. Nguyen (22 papers)
  2. Juri Di Rocco (18 papers)
  3. Claudio Di Sipio (21 papers)
  4. Riccardo Rubei (10 papers)
  5. Davide Di Ruscio (30 papers)
  6. Massimiliano Di Penta (31 papers)
Citations (4)