Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Zero-Shot Detection of Machine-Generated Codes (2310.05103v1)

Published 8 Oct 2023 in cs.CL, cs.AI, cs.CR, and cs.LG

Abstract: This work proposes a training-free approach for the detection of LLMs-generated codes, mitigating the risks associated with their indiscriminate usage. To the best of our knowledge, our research is the first to investigate zero-shot detection techniques applied to code generated by advanced black-box LLMs like ChatGPT. Firstly, we find that existing training-based or zero-shot text detectors are ineffective in detecting code, likely due to the unique statistical properties found in code structures. We then modify the previous zero-shot text detection method, DetectGPT (Mitchell et al., 2023) by utilizing a surrogate white-box model to estimate the probability of the rightmost tokens, allowing us to identify code snippets generated by LLMs. Through extensive experiments conducted on the python codes of the CodeContest and APPS dataset, our approach demonstrates its effectiveness by achieving state-of-the-art detection results on text-davinci-003, GPT-3.5, and GPT-4 models. Moreover, our method exhibits robustness against revision attacks and generalizes well to Java codes. We also find that the smaller code LLM like PolyCoder-160M performs as a universal code detector, outperforming the billion-scale counterpart. The codes will be available at https://github.com/ Xianjun-Yang/Code_detection.git

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Xianjun Yang (37 papers)
  2. Kexun Zhang (21 papers)
  3. Haifeng Chen (99 papers)
  4. Linda Petzold (45 papers)
  5. William Yang Wang (254 papers)
  6. Wei Cheng (175 papers)
Citations (10)
Github Logo Streamline Icon: https://streamlinehq.com