Assessing AI Detectors in Identifying AI-Generated Code: Implications for Education (2401.03676v1)
Abstract: Educators are increasingly concerned about the use of LLMs such as ChatGPT in programming education, particularly the possibility that imperfections in Artificial Intelligence Generated Content (AIGC) detectors could be exploited for academic misconduct. In this paper, we present an empirical study that examines how well an LLM can bypass AIGC detectors when it generates code in response to different variants of a given question. We collected a dataset of 5,069 samples, each consisting of a textual description of a coding problem and its corresponding human-written Python solution code. The samples come from three sources: 80 from Quescol, 3,264 from Kaggle, and 1,725 from LeetCode. From the dataset, we created 13 sets of code problem variant prompts, which we used to instruct ChatGPT to generate outputs. We then assessed the performance of five AIGC detectors. Our results demonstrate that existing AIGC detectors perform poorly at distinguishing human-written code from AI-generated code.
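The evaluation described in the abstract can be pictured with a minimal Python sketch. It is illustrative only and is not the authors' pipeline: the `Sample` type, the `evaluate_detector` function, the toy detector, and the choice of metrics (accuracy, true positive rate, true negative rate) are all assumptions introduced here. Only the per-source sample counts are taken from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

# Sample counts per source, as stated in the abstract.
SOURCE_COUNTS = {"Quescol": 80, "Kaggle": 3264, "LeetCode": 1725}
TOTAL_SAMPLES = sum(SOURCE_COUNTS.values())


@dataclass(frozen=True)
class Sample:
    """Hypothetical record: a code snippet and its true origin."""
    code: str
    is_ai: bool  # True if LLM-generated, False if human-written


def evaluate_detector(
    detector: Callable[[str], bool], samples: Iterable[Sample]
) -> Dict[str, float]:
    """Score a binary detector (True means 'flagged as AI-generated')."""
    tp = tn = fp = fn = 0
    for sample in samples:
        predicted_ai = detector(sample.code)
        if predicted_ai and sample.is_ai:
            tp += 1
        elif not predicted_ai and not sample.is_ai:
            tn += 1
        elif predicted_ai and not sample.is_ai:
            fp += 1
        else:
            fn += 1
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of AI-generated samples that were flagged.
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of human-written samples that were not flagged.
        "tnr": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def toy_detector(code: str) -> bool:
    """Deliberately naive stand-in: flags any snippet containing a comment."""
    return "#" in code


# Tiny made-up data set, used only to exercise the function.
toy_samples = [
    Sample("def add(a, b):\n    return a + b", is_ai=False),
    Sample("# Add two numbers\ndef add(a, b):\n    return a + b", is_ai=True),
    Sample("def sq(x): return x * x  # square", is_ai=False),
    Sample("def sq(x):\n    return x ** 2", is_ai=True),
]

if __name__ == "__main__":
    print(TOTAL_SAMPLES)
    print(evaluate_detector(toy_detector, toy_samples))
```

Reporting the true positive and true negative rates separately, rather than accuracy alone, shows whether a detector fails by missing AI-generated code or by flagging human-written code, which matter differently in an academic misconduct setting.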
Authors: Wei Hung Pan, Ming Jie Chok, Jonathan Leong Shan Wong, Yung Xin Shin, Yeong Shian Poon, Zhou Yang, Chun Yong Chong, David Lo, Mei Kuan Lim