Code Generation Based Grading: Evaluating an Auto-grading Mechanism for "Explain-in-Plain-English" Questions (2311.14903v1)
Abstract: Comprehending and elucidating the purpose of code is often cited as a key learning objective in introductory programming courses. To address this objective, "Explain-in-Plain-English" (EiPE) questions, in which students are shown a segment of code and asked to provide an abstract description of the code's purpose, have been adopted. However, because EiPE questions require a natural language response, they typically must be graded manually, which is time-consuming for course staff and delays feedback for students. With the advent of LLMs capable of generating code, responses to EiPE questions can be used to generate code segments whose correctness can then be easily verified using test cases. We refer to this approach as "Code Generation Based Grading" (CGBG), and in this paper we explore its agreement with human graders using EiPE responses from past exams in an introductory programming course taught in Python. Overall, we find that CGBG achieves moderate agreement with human graders, with the primary area of disagreement being its leniency toward low-level and line-by-line descriptions of code.
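The pipeline the abstract describes can be summarized in a short sketch: prompt an LLM with the student's plain-English response, execute the generated code, and mark the response correct if it passes the instructor's test cases. The sketch below is a minimal illustration assuming an OpenAI-style chat API; the model name, prompt wording, fence-stripping helper, and example tests are illustrative assumptions, not the paper's exact setup.

```python
# Minimal CGBG sketch. Assumptions (not from the paper): an OpenAI-style chat
# API, the model name, the prompt wording, and the example test cases.
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Write a Python function named `func` that does the following:\n"
    "{description}\n"
    "Respond with only the code."
)

def strip_fences(text: str) -> str:
    """Remove a Markdown code fence if the model wrapped its answer in one."""
    match = re.search(r"```(?:python)?\n(.*?)```", text, re.DOTALL)
    return match.group(1) if match else text

def generate_code(description: str) -> str:
    """Turn a student's plain-English description into candidate code."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model; the paper's choice may differ
        messages=[{"role": "user", "content": PROMPT.format(description=description)}],
    )
    return strip_fences(response.choices[0].message.content)

def cgbg_grade(description: str, tests: list[tuple[tuple, object]]) -> bool:
    """Mark a response correct iff the generated code passes every test case."""
    namespace: dict = {}
    try:
        # Caution: a real deployment should run untrusted code in a sandbox.
        exec(generate_code(description), namespace)
        func = namespace["func"]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False

# Example: the hidden code's purpose is "return the largest value in a list".
tests = [(([3, 1, 2],), 3), (([7],), 7), (([-5, -2],), -2)]
print(cgbg_grade("It returns the largest value in the list.", tests))
```

Note the design choice this illustrates: grading never inspects the student's wording directly; correctness is judged entirely by whether the regenerated code passes the tests, which is what makes the approach lenient toward line-by-line descriptions that still pin down the code's behavior.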
- David H. Smith IV
- Craig Zilles