A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course
Abstract: This study evaluates the performance of two ChatGPT variants, GPT-3.5 and GPT-4, each with and without prompt engineering, against work produced solely by students and against a mixed category containing both student and GPT-4 contributions, in university-level physics coding assignments written in Python. We compared 50 student submissions with 50 AI-generated submissions across different categories; each was marked blindly by three independent markers, giving $n = 300$ data points. Students averaged 91.9% (SE: 0.4), surpassing the highest-performing AI submission category, GPT-4 with prompt engineering, which scored 81.1% (SE: 0.8), a statistically significant difference ($p = 2.482 \times 10^{-10}$). Prompt engineering significantly improved scores for both GPT-4 ($p = 1.661 \times 10^{-4}$) and GPT-3.5 ($p = 4.967 \times 10^{-9}$). Additionally, the blinded markers were asked to guess the authorship of each submission on a four-point Likert scale from 'Definitely AI' to 'Definitely Human'. Their guesses were accurate: 92.1% of the work categorized as 'Definitely Human' was human-authored. Simplifying the scale to a binary 'AI' or 'Human' categorization gave an average accuracy of 85.3%. These findings suggest that while AI-generated work closely approaches the quality of university students' work, it often remains detectable by human evaluators.
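The summary statistics quoted in the abstract (a mean mark with its standard error, and an accuracy after collapsing the four-point Likert scale to a binary label) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names, the two middle Likert labels ('Probably AI', 'Probably Human'), and all numbers below are hypothetical toy values chosen only to show the arithmetic.

```python
import math
from statistics import mean, stdev

# Mapping from a four-point Likert guess to a binary label.
# Only the two endpoint labels appear in the abstract; the two
# middle labels are assumed wording for illustration.
LIKERT_TO_BINARY = {
    "Definitely AI": "AI",
    "Probably AI": "AI",
    "Probably Human": "Human",
    "Definitely Human": "Human",
}


def mean_and_se(scores):
    """Return (mean, standard error of the mean) for a list of marks."""
    return mean(scores), stdev(scores) / math.sqrt(len(scores))


def binary_accuracy(guesses, truths):
    """Collapse Likert guesses to AI/Human and return the fraction correct."""
    collapsed = [LIKERT_TO_BINARY[g] for g in guesses]
    correct = sum(c == t for c, t in zip(collapsed, truths))
    return correct / len(truths)


# Toy data, NOT taken from the study.
toy_marks = [90.0, 94.0, 88.0, 96.0]
toy_guesses = ["Definitely Human", "Probably AI", "Probably Human", "Definitely AI"]
toy_truths = ["Human", "AI", "AI", "AI"]

toy_mean, toy_se = mean_and_se(toy_marks)
toy_accuracy = binary_accuracy(toy_guesses, toy_truths)
print(f"mean = {toy_mean:.1f}, SE = {toy_se:.2f}, binary accuracy = {toy_accuracy:.2f}")
```

The standard error here is the sample standard deviation divided by the square root of the number of marks. The abstract does not say which significance test produced the reported p-values, so none is sketched.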