- The paper presents a machine learning-based approach that leverages 78 features and a CNN model to recommend Extract Method refactorings with an F-measure of 0.82.
- The methodology integrates data mining from 13 Apache projects to enable just-in-time extraction of duplicate code within developers’ workflows.
- The implemented IntelliJ IDEA plugin, AntiCopyPaster, achieved a 93% acceptance rate among 72 developers, underscoring its practicality and usability.
Overview of "Just-in-Time Code Duplicates Extraction"
The paper "Just-in-Time Code Duplicates Extraction" presents a methodology and tool aimed at enhancing software maintenance through automated refactoring. The focus is on the Extract Method refactoring, a common technique used to consolidate duplicate code into a single method, thus improving code quality and reducing maintenance effort.
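To make the refactoring concrete, here is a minimal before/after sketch. It is written in Python for brevity, although the paper targets Java. The function names (`report_before`, `sum_positive`, `report_after`) are hypothetical. The duplicated loop is moved into one extracted method, and the observable behavior stays the same.

```python
# Before: the same "sum the positive amounts" fragment is copy-pasted twice.
def report_before(orders, refunds):
    order_sum = 0
    for amount in orders:          # duplicated fragment #1
        if amount > 0:
            order_sum += amount
    refund_sum = 0
    for amount in refunds:         # duplicated fragment #2 (pasted, renamed)
        if amount > 0:
            refund_sum += amount
    return order_sum - refund_sum


# After Extract Method: the shared logic lives in a single place.
def sum_positive(values):
    """Extracted method: sum only the strictly positive values."""
    total = 0
    for v in values:
        if v > 0:
            total += v
    return total


def report_after(orders, refunds):
    return sum_positive(orders) - sum_positive(refunds)
```

A later fix to the summing rule now has to be made only once, in `sum_positive`, instead of in every pasted copy.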
Context and Objective
The paper addresses the challenges posed by code duplication, a prevalent issue in software projects. Duplication complicates maintenance: a bug fixed in one copy may survive in the others, and the codebase grows more complex. Traditional refactoring approaches, while effective, often disrupt a developer’s workflow because they require reviewing refactoring opportunities across the entire codebase.
The primary objective of this paper is to facilitate the adoption of Extract Method refactoring through a machine learning-based approach that operates seamlessly within the developer's workflow. The proposed solution detects and recommends refactoring opportunities in real time, specifically targeting duplicate code as it is introduced.
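A just-in-time trigger can be pictured as a check that runs when code is pasted. The sketch below is illustrative only and rests on assumptions: the helper names (`normalize`, `count_occurrences`, `should_suggest_extraction`) and the threshold `min_copies=2` are hypothetical, and matching is plain text after collapsing whitespace. The paper's plugin works inside IntelliJ IDEA and also consults its trained model before recommending; this sketch stops at detecting the duplicate.

```python
import re


def normalize(code: str) -> str:
    """Collapse every whitespace run so indentation and line breaks are ignored."""
    return re.sub(r"\s+", " ", code).strip()


def count_occurrences(file_text: str, fragment: str) -> int:
    """Count non-overlapping textual occurrences of fragment after normalization."""
    needle = normalize(fragment)
    if not needle:
        return 0
    return normalize(file_text).count(needle)


def should_suggest_extraction(file_text_after_paste: str, pasted: str,
                              min_copies: int = 2) -> bool:
    """Hypothetical trigger: the pasted fragment now exists at least min_copies times."""
    return count_occurrences(file_text_after_paste, pasted) >= min_copies


source = "int a = x + 1;\nfoo();\nint a = x +  1;\n"
print(should_suggest_extraction(source, "int a = x + 1;"))  # original + pasted copy
print(should_suggest_extraction(source, "foo();"))          # only one copy
```

Pure text matching is deliberately simple here; it ignores renamed variables and can match across token boundaries, which a real tool would handle with a syntax-aware comparison.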
Methodology
The approach combines mining of past refactorings with machine learning to automate the process. Key steps include:
- Data Collection: The authors collected data from 13 mature Open Source Apache projects, using RefactoringMiner to extract instances of the Extract Method refactoring.
- Feature Extraction: A comprehensive set of 78 structural and semantic metrics is computed for each code fragment, serving as input features for a machine learning classifier.
- Model Training: A Convolutional Neural Network (CNN) was identified as the most effective model, outperforming others such as Random Forests, Support Vector Machines, and Naive Bayes. The CNN achieved an F-measure of 0.82 in recommending appropriate refactoring opportunities.
- Tool Implementation: The methodology was implemented as an IntelliJ IDEA plugin, AntiCopyPaster, which alerts developers to duplicate code snippets and suggests extracting them just-in-time.
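The feature-extraction and classification steps can be sketched as follows. This is a toy illustration under explicit assumptions, not the authors' model. `extract_features` computes four hypothetical metrics (non-blank lines, semicolons, loop/branch keywords, maximum brace nesting) in place of the paper's 78. `cnn_score` is a minimal 1D CNN forward pass (convolution, ReLU, global max pooling, logistic output) with hand-set weights where the real model would use learned ones. The 0.5 decision threshold is also an assumption.

```python
import math
import re


def extract_features(fragment: str) -> list[float]:
    """Hypothetical metric vector for a Java fragment (not the paper's 78 metrics)."""
    lines = [ln for ln in fragment.splitlines() if ln.strip()]
    semicolons = fragment.count(";")  # rough statement count
    branches = len(re.findall(r"\b(?:if|for|while|switch)\b", fragment))
    depth = max_depth = 0
    for ch in fragment:               # maximum brace nesting depth
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return [float(len(lines)), float(semicolons), float(branches), float(max_depth)]


def conv1d(x: list[float], kernel: list[float], bias: float) -> list[float]:
    """'Valid' 1D convolution with stride 1 (cross-correlation, as in most DL libraries)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def cnn_score(features, kernels, biases, out_weights, out_bias) -> float:
    """Conv -> ReLU -> global max pool per filter -> logistic output in (0, 1)."""
    pooled = []
    for kernel, bias in zip(kernels, biases):
        feature_map = [max(0.0, v) for v in conv1d(features, kernel, bias)]
        pooled.append(max(feature_map))
    z = sum(w * p for w, p in zip(out_weights, pooled)) + out_bias
    return 1.0 / (1.0 + math.exp(-z))


fragment = """
int total = 0;
for (int i = 0; i < n; i++) {
    if (items[i] > 0) {
        total += items[i];
    }
}
"""
features = extract_features(fragment)
score = cnn_score(features,
                  kernels=[[0.1, -0.2], [0.05, 0.05]],  # hand-set, untrained
                  biases=[0.0, -0.1],
                  out_weights=[1.0, -0.5],
                  out_bias=0.0)
recommend_extraction = score >= 0.5  # assumed threshold
```

In the paper, the network is trained on fragments mined from the 13 Apache projects, so both the metric set and the weights differ from this sketch; what carries over is the overall shape: metric vector in, probability-like score out.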
Results and Evaluation
The empirical evaluation covered two dimensions: correctness and usefulness. Statistical tests showed that the CNN outperformed the other models in recommending Extract Method refactorings. In addition, a qualitative study involving 72 developers showed a positive reception, with the majority finding the plugin beneficial and user-friendly.
The results underscore the potential of integrating just-in-time recommendations within an IDE, facilitating more intuitive and timely refactoring decisions. Notably, 93% of the suggested refactorings were accepted by participants, indicating a high level of trust and practicality.
Implications and Future Directions
The research has significant implications for both theory and practice. By embedding refactoring recommendations into the development process, the tool improves efficiency and code quality without disrupting the developer. This approach could encourage further IDE enhancements, potentially extending to other refactoring types and programming languages.
For future work, addressing participant feedback, such as finer control over notifications and better names for extracted methods, could improve adoption. Extending the model to analyze project-wide code duplication could support more comprehensive refactoring strategies.
In conclusion, the paper contributes an effective tool and methodology for improving software quality through intelligent, context-aware refactoring, aligning with modern development practices.