
Just-in-Time Code Duplicates Extraction (2302.03416v1)

Published 7 Feb 2023 in cs.SE

Abstract: Refactoring is a critical task in software maintenance, and is usually performed to enforce better design and coding practices, while coping with design defects. The Extract Method refactoring is widely used for merging duplicate code fragments into a single new method. Several studies attempted to recommend Extract Method refactoring opportunities using different techniques, including program slicing, program dependency graph analysis, change history analysis, structural similarity, and feature extraction. However, irrespective of the method, most of the existing approaches interfere with the developer's workflow: they require the developer to stop coding and analyze the suggested opportunities, and also consider all refactoring suggestions in the entire project without focusing on the development context. To increase the adoption of the Extract Method refactoring, in this paper, we aim to investigate the effectiveness of machine learning and deep learning algorithms for its recommendation while maintaining the workflow of the developer. The proposed approach relies on mining prior applied Extract Method refactorings and extracting their features to train a deep learning classifier that detects them in the user's code. We implemented our approach as a plugin for IntelliJ IDEA called AntiCopyPaster. To develop our approach, we trained and evaluated various popular models on a dataset of 18,942 code fragments from 13 Open Source Apache projects. The results show that the best model is the Convolutional Neural Network (CNN), which recommends appropriate Extract Method refactorings with an F-measure of 0.82. We also conducted a qualitative study with 72 developers to evaluate the usefulness of the developed plugin. The results show that developers tend to appreciate the idea of the approach and are satisfied with various aspects of the plugin's operation.


Summary

  • The paper presents a machine learning-based approach that leverages 78 features and a CNN model to recommend Extract Method refactorings with an F-measure of 0.82.
  • The classifier is trained on Extract Method refactorings mined from 13 Apache projects and recommends extracting duplicate code just-in-time, within the developer’s workflow.
  • The implemented IntelliJ IDEA plugin, AntiCopyPaster, achieved a 93% acceptance rate among 72 developers, underscoring its practicality and usability.

Overview of "Just-in-Time Code Duplicates Extraction"

The paper "Just-in-Time Code Duplicates Extraction" presents a methodology and a tool that support software maintenance by automatically recommending refactorings. The focus is on the Extract Method refactoring, a common technique for consolidating duplicate code into a single method, thereby improving code quality and reducing maintenance effort.

Context and Objective

The work addresses code duplication, a prevalent issue in software projects. Duplication complicates maintenance: a bug in one copy is likely present in the others, and the codebase grows more complex. Existing approaches to recommending Extract Method refactorings often disrupt the developer’s workflow: they require the developer to stop coding to analyze the suggestions, and they consider refactoring opportunities across the entire codebase rather than in the current development context.

The primary objective of this paper is to increase the adoption of the Extract Method refactoring through a machine learning-based approach that operates within the developer's workflow. The proposed solution is designed to detect duplicate code as it is introduced and to recommend extracting it in real time.
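
To make "as it is introduced" concrete, the following is a minimal sketch of a paste-time duplicate check. It assumes that a duplicate means the pasted fragment, after whitespace normalization, now occurs at least twice in the file. The function names and the `min_lines` threshold are hypothetical and not taken from AntiCopyPaster, which additionally scores candidate fragments with its trained classifier before suggesting an extraction.

```python
import re


def normalize(code: str) -> str:
    """Collapse whitespace runs so formatting differences do not hide a duplicate."""
    return re.sub(r"\s+", " ", code).strip()


def count_duplicates(file_text: str, pasted: str) -> int:
    """Count non-overlapping occurrences of the normalized fragment in the normalized file."""
    needle = normalize(pasted)
    if not needle:
        return 0
    return normalize(file_text).count(needle)


def should_suggest_extraction(file_text: str, pasted: str, min_lines: int = 2) -> bool:
    """Hypothetical trigger: the fragment is non-trivial and now appears more than once."""
    non_blank = [line for line in pasted.splitlines() if line.strip()]
    return len(non_blank) >= min_lines and count_duplicates(file_text, pasted) >= 2


# Usage: the file contents right after the developer pasted a second copy.
file_after_paste = """
int a = compute(x);
total += a * 2;
log(total);
int a = compute(x);
total += a * 2;
"""
pasted = "int a = compute(x);\ntotal += a * 2;"
print(should_suggest_extraction(file_after_paste, pasted))  # True
```

Because the match is purely textual, this sketch misses duplicates whose identifiers were renamed after pasting; catching those would require a token- or AST-level comparison.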

Methodology

The approach combines mining of previously applied Extract Method refactorings with machine learning. Key steps include:

  1. Data Collection: The authors built a dataset of 18,942 code fragments from 13 mature Open Source Apache projects, using RefactoringMiner to extract instances of the Extract Method refactoring.
  2. Feature Extraction: A comprehensive set of 78 structural and semantic metrics is computed for each code fragment, serving as input features for a machine learning classifier.
  3. Model Training: A Convolutional Neural Network (CNN) was identified as the most effective model, outperforming others such as Random Forests, Support Vector Machines, and Naive Bayes. The CNN achieved an F-measure of 0.82 in recommending appropriate refactoring opportunities.
  4. Tool Implementation: The methodology was implemented as an IntelliJ IDEA plugin, AntiCopyPaster, which alerts developers to duplicate code snippets and suggests extracting them just-in-time.
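
The feature-extraction step can be pictured as turning each code fragment into a fixed-order numeric vector. The sketch below assumes a handful of hypothetical size and keyword-count metrics standing in for the paper's 78; it does not reproduce the actual metric set, and the CNN that consumes the vector is not shown.

```python
import re
from typing import Dict, List

# Hypothetical stand-ins for the paper's 78 metrics: size counts plus keyword counts.
JAVA_KEYWORDS = ("if", "else", "for", "while", "return", "new", "try", "catch")
FEATURE_ORDER = ["lines", "chars", "tokens"] + [f"kw_{kw}" for kw in JAVA_KEYWORDS]


def extract_features(fragment: str) -> Dict[str, float]:
    """Compute simple size and keyword-count metrics for one code fragment."""
    non_blank_lines = [line for line in fragment.splitlines() if line.strip()]
    # Identifier-like tokens only; numeric literals and operators are ignored.
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", fragment)
    features = {
        "lines": float(len(non_blank_lines)),
        "chars": float(len(fragment)),
        "tokens": float(len(tokens)),
    }
    for kw in JAVA_KEYWORDS:
        features[f"kw_{kw}"] = float(tokens.count(kw))
    return features


def to_vector(features: Dict[str, float]) -> List[float]:
    """Lay features out in a fixed order so every fragment maps to the same input positions."""
    return [features[name] for name in FEATURE_ORDER]


fragment = """for (int i = 0; i < n; i++) {
    if (a[i] > max) max = a[i];
}
return max;"""
vector = to_vector(extract_features(fragment))
print(len(vector))  # 11
```

The fixed ordering matters because a trained model expects each input position to carry the same metric for every fragment it sees.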

Results and Evaluation

The empirical evaluation was conducted along two dimensions: correctness and usefulness. Statistical tests demonstrated the superiority of the CNN over the other models in recommending Extract Method refactorings. Additionally, a qualitative study involving 72 developers showed a positive reception, with the majority finding the plugin beneficial and user-friendly.
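
Correctness is reported as an F-measure, the harmonic mean of precision and recall. The sketch below shows that computation from confusion-matrix counts; the counts in the example are invented for illustration and are not the paper's data.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1: harmonic mean of precision tp/(tp+fp) and recall tp/(tp+fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts giving precision 0.90 and recall 0.75.
print(round(f_measure(tp=9, fp=1, fn=3), 2))  # 0.82
```

Because the harmonic mean is pulled toward the smaller of the two values, a single F-measure can hide an imbalance between precision and recall, as in this example.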

The results underscore the potential of integrating just-in-time recommendations into an IDE, supporting more timely refactoring decisions. Notably, 93% of the suggested refactorings were accepted by participants, suggesting that they found the recommendations trustworthy and practical.

Implications and Future Directions

The work has implications for both researchers and practitioners. By embedding refactoring recommendations into the development process, the tool aims to improve code quality without interrupting the developer. The approach could also inform further IDE features, potentially extending to other refactoring types and programming languages.

For future work, addressing participants' feedback on user-interface customization, such as control over notifications and better naming of extracted methods, could improve adoption. Extending the model to analyze duplication across an entire project could also support more comprehensive refactoring strategies.

In conclusion, the paper contributes an effective tool and methodology for improving software quality through intelligent, context-aware refactoring, aligning with modern development practices.