ChatGPT Code Detection: Techniques for Uncovering the Source of Code (2405.15512v2)
Abstract: In recent times, large language models (LLMs) have made significant strides in generating computer code, blurring the lines between code written by humans and code produced by AI. As these technologies evolve rapidly, it is crucial to understand how they influence code generation, especially given the risk of misuse in areas such as higher education. This paper addresses the issue by applying advanced classification techniques to differentiate between code written by humans and code generated by ChatGPT, an LLM. We employ a new approach that combines powerful embedding features (black-box) with supervised learning algorithms, including Deep Neural Networks, Random Forests, and Extreme Gradient Boosting, achieving this differentiation with an impressive accuracy of 98%. For the successful combinations, we also examine model calibration, showing that some of the models are extremely well calibrated. Additionally, we present white-box features and an interpretable Bayes classifier to elucidate critical differences between the code sources, enhancing the explainability and transparency of our approach. Both of these approaches work well but reach at most 85-88% accuracy. We also show that untrained humans perform the same task no better than random guessing. This study is crucial for understanding and mitigating the potential risks of using AI in code generation, particularly in the context of higher education, software development, and competitive programming.
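The black-box pipeline described above (embedding features fed into a supervised classifier, followed by a calibration check) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the synthetic feature matrix stands in for the code-embedding vectors the authors use, and the Random Forest hyperparameters are placeholders.

```python
# Hedged sketch: synthetic vectors replace real code embeddings, since those
# require embedding actual human/ChatGPT code samples with an external model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.calibration import calibration_curve

# Stand-in for embedding vectors of labeled code samples (label 1 = ChatGPT).
X, y = make_classification(n_samples=2000, n_features=256, n_informative=32,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# One of the supervised learners named in the abstract (Random Forest).
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))

# Reliability diagram data: mean predicted probability vs. observed frequency
# per bin -- the standard diagnostic behind a model-calibration analysis.
prob_true, prob_pred = calibration_curve(
    y_test, clf.predict_proba(X_test)[:, 1], n_bins=10)
print(f"accuracy: {acc:.3f}")
```

Deep Neural Networks or XGBoost would slot into the same pipeline in place of the Random Forest; only the classifier object changes.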
Authors: Marc Oedingen, Raphael C. Engelhardt, Robin Denz, Maximilian Hammer, Wolfgang Konen