- The paper shows that 75% of exercises are sensible and 81.8% are novel, although only 30.9% pass test cases.
- The paper employs targeted prompts with OpenAI Codex to generate both programming exercises and step-by-step code explanations.
- The paper indicates that integrating automated content can reduce educators' workload and enable tailored learning experiences.
Analyzing Automatic Generation of Programming Exercises and Code Explanations with LLMs
The paper "Automatic Generation of Programming Exercises and Code Explanations with LLMs" investigates the utilization of OpenAI Codex, a fine-tuned derivative of the GPT-3 model, for generating programming exercises and code explanations, components central to introductory programming curricula. This automated generation is positioned as a potential asset for educators, assisting in addressing the challenges of content creation and feedback provision.
Methodology and Approach
The authors employ OpenAI Codex to create programming exercises by providing the model with targeted prompts that include keywords steering the contextual theme and the programming concepts to be practiced. They also examine its capacity to produce step-by-step explanations of code. This two-pronged approach covers both problem generation and educational scaffolding. The outputs are evaluated for sensibility, novelty, and applicability, and the correctness of generated exercises is checked by running their sample solutions against the generated test cases.
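To make the keyword-driven prompting concrete, the following is a minimal Python sketch of how a contextual theme and a programming concept might be combined into a single exercise-generation prompt. The prompt template, the wording, and the commented-out call to the legacy OpenAI completions API are illustrative assumptions and do not reproduce the paper's exact prompt format.

```python
def build_exercise_prompt(theme: str, concept: str) -> str:
    """Combine a contextual theme and a programming concept (the two kinds
    of keywords used to steer generation) into a single prompt."""
    return (
        f"Theme: {theme}\n"
        f"Programming concept: {concept}\n"
        "Write a programming exercise that includes:\n"
        "1. A problem statement set in the theme above.\n"
        "2. A sample solution in Python.\n"
        "3. Test cases that check the sample solution.\n"
    )


prompt = build_exercise_prompt(theme="ice hockey", concept="for loops")
print(prompt)

# With the legacy (pre-1.0) OpenAI Python SDK, the prompt could be sent to a
# Codex model roughly as follows; model name and parameters are assumptions:
# import openai
# completion = openai.Completion.create(
#     model="code-davinci-002", prompt=prompt, max_tokens=512, temperature=0.7
# )
```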
Major Findings
The findings indicate substantial promise in using LLMs for educational content creation. A significant portion of the automatically generated exercises, 75%, were deemed sensible, and 81.8% were novel, as judged by searches for pre-existing content. Readiness for immediate use was lower: only 30.9% of exercises passed their test cases, which suggests the generated content is better suited to adaptation by educators than to wholesale adoption.
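As a rough illustration of the "passes test cases" criterion, the sketch below runs a hypothetical generated sample solution against its generated tests. The exercise content, function names, and checking logic are invented for illustration and are not taken from the paper.

```python
# Hypothetical generated artifacts: a sample solution and its test cases.
sample_solution = """
def count_goals(goals):
    total = 0
    for g in goals:
        total += g
    return total
"""

test_cases = [
    ("count_goals([1, 2, 3])", 6),
    ("count_goals([])", 0),
]


def solution_passes_tests(solution_src: str, tests) -> bool:
    """Return True if the sample solution satisfies every generated test."""
    namespace = {}
    exec(solution_src, namespace)  # define the solution function
    return all(eval(expr, namespace) == expected for expr, expected in tests)


print(solution_passes_tests(sample_solution, test_cases))  # True
```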
The keywords for contextual themes and programming concepts had a notable influence, with generated exercises aligning closely with the provided prompts. In practical terms, this means instructors can generate tailored problems with a specific thematic or conceptual focus, enhancing engagement through relevance to students' interests.
The code explanations generated by Codex covered the source material in 90% of cases, though accuracy was a limiting factor: only 67.2% of explanation lines were correct. Despite this, the results point toward Codex's use as a preliminary aid for understanding or debugging code, possibly within guided learning environments or alongside human assistance.
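The explanation use case can be illustrated in the same spirit: a code snippet is wrapped in a prompt requesting a line-by-line explanation, and each returned line is then assessed for coverage and correctness. The prompt wording below is an assumption, not the paper's exact phrasing.

```python
code_snippet = """
def average(numbers):
    return sum(numbers) / len(numbers)
"""

explanation_prompt = (
    "Explain, step by step and line by line, what the following Python code does:\n"
    + code_snippet
)

# The prompt would be sent to the model; the returned explanation lines can
# then be graded by hand for coverage (is every part of the code mentioned?)
# and correctness, mirroring the evaluation described above.
print(explanation_prompt)
```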
Implications and Future Directions
The implications of this work are both practical and pedagogical. Practically, models like Codex can significantly reduce the overhead of content creation, provide diversified and personalized learning resources, and serve as an innovative tool for computer science pedagogy. Pedagogically, the research advances the discussion of machine learning's role in education, positioning such models as collaborative tools rather than autonomous systems.
A compelling avenue for future exploration lies in adaptive learning systems where student-sourced keywords generate individual learning paths. Additionally, expanding the scope of exercises to encompass complex programming concepts and larger project-based learning could further harness Codex's capabilities.
Given the rapid progress in natural language and code generation, continued refinement of models like Codex will likely close many of the existing gaps. This progression calls for deliberate integration of such technologies, balancing automation with sound instructional design principles to support and enhance teaching and learning.