Learning Program Embeddings to Propagate Feedback on Student Code
The paper presents a methodology for encoding student-submitted programs so that instructional feedback can be propagated efficiently in large-scale online computer science courses, such as those offered by Code.org and Stanford University. The core of the approach is a neural network architecture that jointly learns a feature-space representation of program memory states and an embedding of each program as a linear transformation from the embedded precondition state to the embedded postcondition state. Combined with a small set of human annotations, these embeddings enable rapid feedback on large numbers of submissions.
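A minimal sketch of this idea is shown below. It assumes a hypothetical featurization of memory states and illustrative names and dimensions (StateEncoder, ProgramAsMatrix, d); it is not the authors' implementation, only an illustration of embedding each program as a matrix that maps precondition embeddings to postcondition embeddings.

```python
# Sketch only (not the paper's code): a program is embedded as a d x d matrix that
# maps an embedded precondition state to an embedded postcondition state.
# The raw state featurization and all dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    """Embeds a raw memory state (e.g. a grid/world snapshot) into a d-dim feature space."""
    def __init__(self, raw_dim: int, d: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(raw_dim, d), nn.Tanh())

    def forward(self, raw_state: torch.Tensor) -> torch.Tensor:
        return self.net(raw_state)

class ProgramAsMatrix(nn.Module):
    """Represents each program p by a d x d matrix M_p acting linearly on state embeddings."""
    def __init__(self, num_programs: int, d: int):
        super().__init__()
        self.matrices = nn.Parameter(torch.randn(num_programs, d, d) * 0.01)

    def forward(self, program_ids: torch.Tensor, pre_embed: torch.Tensor) -> torch.Tensor:
        M = self.matrices[program_ids]                             # (batch, d, d)
        return torch.bmm(M, pre_embed.unsqueeze(-1)).squeeze(-1)   # predicted postcondition embedding

def loss_fn(encoder: StateEncoder, programs: ProgramAsMatrix,
            program_ids: torch.Tensor, pre_raw: torch.Tensor, post_raw: torch.Tensor):
    """Training signal (sketch): the matrix applied to the encoded precondition should
    match the encoding of the postcondition actually observed when the program runs."""
    pre = encoder(pre_raw)
    post = encoder(post_raw)
    pred_post = programs(program_ids, pre)
    return nn.functional.mse_loss(pred_post, post)
```

The intuition is that training forces each program's matrix to reproduce, in embedding space, what the program actually does to memory, so programs with similar behavior end up with similar embeddings regardless of surface differences in their code.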
Core Contributions and Methodology
The authors present three main contributions:
- Program Feature Embeddings: The paper designs a framework to automatically derive program feature embeddings that capture both functional and stylistic attributes of student submissions. Each program is embedded as a matrix, i.e., a linear map from the embedding of its precondition (the memory state before execution) to the embedding of its postcondition (the memory state after execution), with both state embeddings learned jointly from program executions.
- Feedback Propagation: Using the learned embeddings, the paper implements a feedback propagation algorithm that automatically extends teacher comments from a small graded subset to a much larger number of student submissions (see the sketch after this list). This addresses the central challenge of providing personalized feedback in large-scale educational environments.
- Empirical Evaluation: The methodology's effectiveness is demonstrated using datasets from Code.org's Hour of Code and Stanford University's CS1 programming course. The results indicate a substantial amplification of teacher feedback propagation with high precision, showcasing its viability in educational applications.
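The propagation step can be illustrated with the sketch below. This is not the published algorithm but a simple nearest-neighbour rule under assumed names and an assumed distance threshold: a teacher annotation attached to a graded submission is copied to any ungraded submission whose embedding (for example, the flattened program matrix) lies close enough in the learned feature space.

```python
# Sketch only (assumed nearest-neighbour rule, not the paper's exact procedure):
# propagate teacher feedback from a small graded set to ungraded submissions by
# distance in the learned program-embedding space. max_dist is an illustrative knob.
import numpy as np

def propagate_feedback(graded_embeds: np.ndarray,   # (n_graded, d) embeddings of graded programs
                       graded_labels: list,          # teacher annotations for the graded programs
                       ungraded_embeds: np.ndarray,  # (n_ungraded, d) embeddings to annotate
                       max_dist: float = 0.5):
    """Return a propagated annotation (or None) for each ungraded submission."""
    propagated = []
    for e in ungraded_embeds:
        dists = np.linalg.norm(graded_embeds - e, axis=1)
        j = int(np.argmin(dists))
        # Only propagate when the nearest graded program is close enough,
        # trading coverage for precision.
        propagated.append(graded_labels[j] if dists[j] <= max_dist else None)
    return propagated
```

Tightening the distance threshold trades coverage for precision, which mirrors the precision-versus-reach trade-off the evaluation reports.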
Findings and Implications
This paper provides insights into the automatic featurization of student code submissions, enabling efficient feedback dissemination. Feedback propagation experiments reveal substantial force multiplication, with evidence that the learned embeddings can predict student program functionality and propagate nuanced comments to ungraded submissions. Applied at scale, these methods could markedly reduce the grading burden on instructors and enable scalable feedback mechanisms in massive open online courses (MOOCs).
Theoretical and Practical Implications
- Theoretical: This research contributes to machine learning literature by illuminating ways to represent code using feature embeddings aligned with domain-specific semantics. It advances understanding of how neural networks can encapsulate semantic patterns in executable code, similar to embeddings utilized in NLP tasks.
- Practical: In the education sector, the proposed system could transform how feedback is administered in programming classes, ensuring students receive critical feedback even in resource-constrained environments. The methodology holds potential for broader applications in other domains that involve program analysis and automatic feedback systems.
Future Directions
The findings invite future explorations into more complex, variable-rich programming tasks and scalability across varied educational contexts. Integrating adaptive learning systems that dynamically adjust to student needs based on feedback could further harness the potential of these program embeddings. Moreover, expanding compatibility with diverse programming languages and problem-solving domains will be vital for practical deployment.
Overall, the paper contributes valuable advancements in educational technology and computational learning strategies, offering promising insights into large-scale feedback systems through the lens of program embeddings.