LLMs Still Can't Avoid Instanceof: An Investigation Into GPT-3.5, GPT-4 and Bard's Capacity to Handle Object-Oriented Programming Assignments (2403.06254v1)
Abstract: LLMs have emerged as promising tools to assist students in solving programming assignments. However, these tools have not yet mastered object-oriented programming (OOP), with its inherent complexity involving the identification of entities, relationships, and responsibilities. While LLM behavior on introductory programming exercises has been studied, there is a research gap regarding their behavior in OOP contexts. In this study, we used three prominent LLMs - GPT-3.5, GPT-4, and Bard - to solve real-world OOP exercises used in educational settings, and then validated their solutions using an Automatic Assessment Tool (AAT). The findings revealed that while the models frequently produced mostly working solutions to the exercises, they often overlooked OOP best practices. GPT-4 was the most proficient, followed by GPT-3.5 and then Bard. We advocate for a renewed emphasis on code quality when employing these models and explore the potential of pairing LLMs with AATs in pedagogical settings. In conclusion, while GPT-4 shows promise, the deployment of these models in OOP education still requires supervision.
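To make the "overlooked best practices" finding concrete, the following is a minimal sketch of the kind of pattern the title alludes to: dispatching on `instanceof` instead of relying on polymorphism. It assumes Java, and the `Shape`, `Circle`, `Rectangle`, and `AreaCalculator` names are hypothetical illustrations; they are not taken from the paper's assignment or from the models' outputs.

```java
import java.util.List;

// Hypothetical domain used only for illustration.
interface Shape {
    double area();
}

final class Circle implements Shape {
    private final double radius;

    Circle(double radius) { this.radius = radius; }

    double radius() { return radius; }

    @Override
    public double area() { return Math.PI * radius * radius; }
}

final class Rectangle implements Shape {
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    double width() { return width; }

    double height() { return height; }

    @Override
    public double area() { return width * height; }
}

final class AreaCalculator {
    // Works, but every new Shape subtype forces a change here:
    // the caller re-implements behavior that belongs to the objects.
    static double totalWithInstanceof(List<Shape> shapes) {
        double total = 0.0;
        for (Shape s : shapes) {
            if (s instanceof Circle) {
                Circle c = (Circle) s;
                total += Math.PI * c.radius() * c.radius();
            } else if (s instanceof Rectangle) {
                Rectangle r = (Rectangle) s;
                total += r.width() * r.height();
            }
        }
        return total;
    }

    // Polymorphic version: each object is responsible for its own area.
    static double totalPolymorphic(List<Shape> shapes) {
        double total = 0.0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }
}
```

Both methods return the same total for the same list, which is why a purely functional test can pass either version. Telling them apart requires a code-quality check in addition to input/output tests.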
Authors: Bruno Pereira Cipriano, Pedro Alves