Learning Syntactic Program Transformations from Examples (1608.09000v1)

Published 31 Aug 2016 in cs.SE, cs.LG, and cs.PL

Abstract: IDEs, such as Visual Studio, automate common transformations, such as Rename and Extract Method refactorings. However, extending these catalogs of transformations is complex and time-consuming. A similar phenomenon appears in intelligent tutoring systems where instructors have to write cumbersome code transformations that describe "common faults" to fix similar student submissions to programming assignments. We present REFAZER, a technique for automatically generating program transformations. REFAZER builds on the observation that code edits performed by developers can be used as examples for learning transformations. Example edits may share the same structure but involve different variables and subexpressions, which must be generalized in a transformation at the right level of abstraction. To learn transformations, REFAZER leverages state-of-the-art programming-by-example methodology using the following key components: (a) a novel domain-specific language (DSL) for describing program transformations, (b) domain-specific deductive algorithms for synthesizing transformations in the DSL, and (c) functions for ranking the synthesized transformations. We instantiate and evaluate REFAZER in two domains. First, given examples of edits used by students to fix incorrect programming assignment submissions, we learn transformations that can fix other students' submissions with similar faults. In our evaluation conducted on 4 programming tasks performed by 720 students, our technique helped to fix incorrect submissions for 87% of the students. In the second domain, we use repetitive edits applied by developers to the same project to synthesize a program transformation that applies these edits to other locations in the code. In our evaluation conducted on 59 scenarios of repetitive edits taken from 3 C# open-source projects, REFAZER learns the intended program transformation in 83% of the cases.

Summary

The paper presents Refazer, which learns generalized program transformations from code edit examples using a novel DSL built on PROSE.
In evaluation, it helped fix incorrect submissions for 87% of students and learned the intended transformation in 83% of repetitive-edit scenarios from open-source C# projects, using only a few examples.
The approach highlights significant potential for automating code refactoring and feedback generation to enhance software productivity.

Learning Syntactic Program Transformations from Examples

The paper introduces Refazer, a technique that synthesizes program transformations from examples. It builds on the observation that the edits developers and students make to code are often repetitive, so they can serve as input-output examples for learning generalized transformations. Refazer is built on top of PROSE, a program synthesis framework based on Programming-by-Examples (PBE) methodology.

Key Components and Methodology

Refazer introduces a novel Domain-Specific Language (DSL) for describing program transformations. The DSL expresses transformations as abstract patterns over program structure rather than over specific variable names or concrete syntax. This abstraction is critical: example edits may share the same structure but involve different variables and subexpressions, and a learned transformation must generalize over those differences to apply to similar code elsewhere.
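To make the idea of abstracting over variable names concrete, here is a minimal sketch in Python. It is not Refazer's DSL: the tuple-based tree encoding, the `Var` placeholder class, and the functions `match`, `build`, and `rewrite` are all hypothetical names chosen for illustration. The sketch shows one pattern and one template rewriting `e == None` into `e is None` for any subexpression `e`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A placeholder that matches any subtree (hypothetical helper)."""
    name: str


def match(pattern, tree, bindings):
    """Return extended bindings if pattern matches tree, else None."""
    if isinstance(pattern, Var):
        if pattern.name in bindings:
            # A repeated placeholder must bind to the same subtree.
            return bindings if bindings[pattern.name] == tree else None
        return {**bindings, pattern.name: tree}
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        if len(pattern) != len(tree):
            return None
        for p, t in zip(pattern, tree):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == tree else None


def build(template, bindings):
    """Instantiate a template by substituting bound subtrees."""
    if isinstance(template, Var):
        return bindings[template.name]
    if isinstance(template, tuple):
        return tuple(build(t, bindings) for t in template)
    return template


def rewrite(tree, pattern, template):
    """Apply the rule bottom-up at every node where the pattern matches."""
    if isinstance(tree, tuple):
        tree = tuple(rewrite(child, pattern, template) for child in tree)
    bindings = match(pattern, tree, {})
    return build(template, bindings) if bindings is not None else tree


# Rule (assumed for illustration): `e == None`  ->  `e is None`
PATTERN = ("Compare", Var("e"), "==", ("Name", "None"))
TEMPLATE = ("Compare", Var("e"), "is", ("Name", "None"))

if __name__ == "__main__":
    a = ("If", ("Compare", ("Name", "x"), "==", ("Name", "None")), ("Pass",))
    b = ("Compare", ("Attr", ("Name", "node"), "next"), "==", ("Name", "None"))
    print(rewrite(a, PATTERN, TEMPLATE))
    print(rewrite(b, PATTERN, TEMPLATE))
```

The same rule applies to both inputs even though one compares a plain variable and the other an attribute access, which is the kind of generalization over variables and subexpressions the section describes.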

Transformations are learned by domain-specific deductive synthesis algorithms paired with ranking functions. Because many transformations in the DSL can be consistent with the same examples, ranking determines which one is selected: it must avoid transformations that are too general (and would change code that should be left alone) as well as ones that are too specific (and would miss other locations that need the same edit).
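The trade-off between over-generalization and over-specialization can be sketched with a toy ranking function. This is an assumption-laden illustration, not the ranking functions from the paper: the `"?"` wildcard encoding, the `specificity` score, and the heuristic "prefer the most specific pattern that still matches every example" are all hypothetical choices made here for clarity.

```python
WILDCARD = "?"  # hypothetical marker meaning "any subtree"


def matches(pattern, tree):
    """True if the pattern (with wildcards) matches the tree."""
    if pattern == WILDCARD:
        return True
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        return len(pattern) == len(tree) and all(
            matches(p, t) for p, t in zip(pattern, tree)
        )
    return pattern == tree


def specificity(pattern):
    """Count concrete (non-wildcard) leaves; higher means more specific."""
    if pattern == WILDCARD:
        return 0
    if isinstance(pattern, tuple):
        return sum(specificity(p) for p in pattern)
    return 1


def rank(candidates, examples):
    """Keep candidates consistent with all examples, most specific first."""
    consistent = [c for c in candidates if all(matches(c, e) for e in examples)]
    return sorted(consistent, key=specificity, reverse=True)


if __name__ == "__main__":
    examples = [
        ("Compare", ("Name", "x"), "==", ("Name", "None")),
        ("Compare", ("Name", "node"), "==", ("Name", "None")),
    ]
    candidates = [
        WILDCARD,                                            # over-general
        ("Compare", WILDCARD, WILDCARD, WILDCARD),           # still too general
        ("Compare", WILDCARD, "==", ("Name", "None")),       # intended level
        ("Compare", ("Name", "x"), "==", ("Name", "None")),  # over-specific
    ]
    for candidate in rank(candidates, examples):
        print(specificity(candidate), candidate)
```

In this sketch the over-specific candidate is discarded because it fails the second example, while the over-general ones survive but rank below the pattern that keeps the `== None` structure and abstracts only the left-hand side.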

Experimental Evaluation

The paper evaluates Refazer in two contexts: introductory programming courses and open-source C# codebases. In the educational setting, Refazer learned transformations from edits students made to fix their own submissions, and these helped fix incorrect submissions for 87% of students in a study covering four programming tasks performed by 720 students. This result suggests the technique is effective at correcting faults that recur across novice programmers' submissions.

In the second domain, Refazer learned the intended transformation in 83% of 59 repetitive-edit scenarios drawn from three open-source C# projects, requiring only 2.8 examples on average. These results highlight Refazer's potential to automate the tedious and error-prone manual edits that developers repeat across a codebase as software evolves.

Implications and Future Directions

Refazer’s results suggest several implications and avenues for future research. Practically, tools like Refazer can reduce the burden on educators by automatically generating feedback or hints for students, thus scaling educational support in courses with large enrollments. In software development, Refazer offers a path toward automating refactoring and code maintenance tasks, which could improve both developer productivity and code quality.

Theoretically, the work advances the field of automatic program transformation by illustrating the practical integration of PBE methodologies with a new DSL tailored for program transformations. This suggests a broader applicability of induction-based synthesis techniques beyond traditional domains such as string manipulation and data wrangling, potentially stimulating further research into expanding the expressiveness of transformation DSLs and synthesizing context-aware transformations that include data-flow analyses.

As future research avenues, enhancing the current DSL’s capability to handle more complex transformations through integration with semantic analyses could further refine the system's applicability and reliability. Additionally, exploring interactive refinement mechanisms to involve developers and educators in the transformation synthesis process could bridge the gap between fully automated systems and human-assisted coding environments.

In summary, the paper makes a substantial contribution to automated software engineering, providing empirical evidence of the efficacy of automated program transformations in syntactic domains and setting a precedent for future research in learning-based code synthesis. Refazer exemplifies the integration of theoretical advancements with practical tool development, offering tangible improvements to both educational and industrial programming workflows.
